You are on page 1of 66

Building ML-Driven Streaming

Applications with Apache Kafka®


Joseph Morais
AWS Evangelist/Cloud Partner Solutions Architect, Confluent
theejosephmorais

Kanchan Waikar
Senior Partner Solutions Architect at AWS
kanchanwaikar
Joseph Morais
AWS Evangelist/Cloud Partner Solutions Architect, Confluent
theejosephmorais
Event Streaming
And why you might care

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Every Industry is Moving from Batch/Manual to
Software-Defined
Software-using Software-defined

Auto / Transport Spreadsheet-driven driver schedule Real-time ETA

Banking Nightly credit-card fraud checks Real-time credit card fraud prevention

Retail Batch inventory updates Real-time inventory management

Healthcare Batch claims processing Real-time claims processing

Oil and Gas Batch analytics Real-time analytics

Manufacturing Scheduled equipment maintenance Automated, predictive maintenance

Defense Reactive cyber-security forensics Automated SIEM and Anomaly Detection

U.S. Defense
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
5 Agencies
Becoming Software-Defined is a Competitive
Requirement

“By2020 event-sourced, real-time situational awareness


will be a required characteristic for 80% of digital
business solutions. And 80% of new business ecosystems
will require support for event processing.”
- Gartner

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Platform Requirements for Becoming Software-
Defined
Software- Built for Scalable for Transient Raw data
using Historical Data Transactional Data

Software- 1 Built for Real- 2 Scalable for 3 Persistent + 4 Enriched


defined Time Events ALL data Durable data

● State vs. change ● Non-transactional ● Mission critical apps ● Stream processing


● Historical analysis vs. data is 10x require zero data loss (SQL on RT events)
real-time operations transactional data ● Mission critical ● Context & situational
● IoT, logs, security systems require awareness (ex. ETA)
events... replay

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Only Event Streaming Has All 4 Requirements

BUILT FOR REAL- SCALABLE FOR PERSISTENT & CAPABLE OF


TIME EVENTS ALL DATA DURABLE ENRICHMENT

Databases Good for transactional applications

Messaging Good for ultra low-latency, fire-and-forget use cases

ETL Good for batch data integration

Data Warehouse Good for historical analytics and reporting

Event Streaming The Essential Data Platform for Becoming Software-Defined


(Scalable Messaging + Real-Time Data Integration + Stream Processing)

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Rise of Event Streaming

60%
Fortune 100 Companies
Using Apache Kafka
What or who is Kafka?
The Append-only log

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Both Kafkas like to write

Time

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kafka
Writers Readers
cluster

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Scalability of a filesystem

• Hundreds of MB/s throughput


• Many TB per server
• Commodity hardware

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Guarantees of a Database

• Strict ordering
• Persistence

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rewind & Replay
Reset to any point in the shared narrative

Rewind & Replay


© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Distributed by design

• Replication
• Fault Tolerance
• Partitioning
• Elastic Scaling

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Producing to Kafka
Time

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kafka Topics
my-topic

broker-1
my-topic-partition-0

broker-2
my-topic-partition-1

broker-3
my-topic-partition-2

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Partition Leadership and Replication
Topic1 Topic1 Topic1
partition1 partition1 partition1

Topic1 Topic1 Topic1


partition2 partition2 partition2

Topic1 Topic1 Topic1


partition3 partition3 partition3

Topic1 Topic1 Topic1


partition4 partition4 partition4

Broker 1 Broker 2 Broker 3 Broker 4

Leader Follower

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Partition Leadership and Replication - node failure
Topic1 Topic1 Topic1
partition1 partition1 partition1

Topic1 Topic1 Topic1


partition2 partition2 partition2

Topic1 Topic1 Topic1


partition3 partition3 partition3

Topic1 Topic1 Topic1


partition4 partition4 partition4

Broker 1 Broker 2 Broker 3 Broker 4

Leader Follower

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Producing to
Kafka

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Serializer
Kafka doesn’t care about what you send to it as long as it’s been
converted to a byte stream beforehand.

JSON 01001010 01010011 01001111 01001110

CSV 01000011 01010011 01010110


SERIALIZERS
Avro 01001010 01010011 01001111 01001110

Protobufs 01010000 01110010 01101111 01110100 ...

XML 01011000 01001101 01001100


(if you must)

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Record Keys and why they’re important - Ordering

Record keys determine the partition with the default Kafka partitioner

Producer Record
Topic
[Partition]
[Key] partitioner
Value

If a key isn’t provided, messages will be produced


in a round robin fashion

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Record Keys and why they’re important - Ordering

Record keys determine the partition with the default Kafka partitioner,
and therefore guarantee order for a key

Producer Record
Topic
[Partition]
AAAA partitioner
Value

Keys are used in the default partitioning algorithm:


partition = hash(key) % numPartitions

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Record Keys and why they’re important - Ordering

Record keys determine the partition with the default Kafka


partitioner, and therefore guarantee order for a key

Producer Record
Topic
[Partition]
BBBB
partitioner
Value

Keys are used in the default partitioning algorithm:


partition = hash(key) % numPartitions

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Record Keys and why they’re important - Ordering

Record keys determine the partition with the default Kafka


partitioner, and therefore guarantee order for a key

Producer Record
Topic
[Partition]
CCCC
partitioner
Value

Keys are used in the default partitioning algorithm:


partition = hash(key) % numPartitions

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Record Keys and why they’re important - Ordering
Record keys determine the partition with the default Kafka partitioner,
and therefore guarantee order for a key

Producer Record
Topic
[Partition]
DDDD
partitioner
Value

Keys are used in the default partitioning algorithm:


partition = hash(key) % numPartitions

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Producer Guarantees

Topic1 Topic1 Topic1


partition1 partition1 partition1

Producer Properties

acks=0 Broker 1 Broker 2 Broker 3

Leader Follower
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Producer Guarantees

Topic1 Topic1 Topic1


partition1 partition1 partition1

P
ack

Producer Properties

acks=1 Broker 1 Broker 2 Broker 3

Leader Follower
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Producer Guarantees

Topic1 Topic1 Topic1


partition1 partition1 partition1

Producer Properties

acks=all Broker 1 Broker 2 Broker 3


min.insync.replica=2

Leader Follower
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming
from Kafka
Consuming together is better

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming From Kafka - Single Consumer

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming From Kafka - Grouped Consumer

CC
C1

CC
C2
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming From Kafka - Grouped Consumer

C C

C C

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming From Kafka - Grouped Consumer

0 1

2 3

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming From Kafka - Grouped Consumer

0 1

2 3

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Consuming From Kafka - Grouped Consumer

0,
1
3

2 3

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kafka APIs
Making Kafka an event-streaming platform

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kafka Connect and Kafka Streams

Your App

KAFKA
STREAMS

Source Sink

KAFKA KAFKA
CONNECT CONNECT

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kafka Connect: Reliable and scalable integration
of Kafka with other systems
• Centralized management and configuration • Fault tolerant and automatically load balanced
• Support for hundreds of technologies including • Extensible API
RDBMS, Elasticsearch, HDFS, S3
• Single Message Transforms
• Supports CDC ingest of events from RDBMS
• Part of Apache Kafka
• Preserves data schema

{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
"table.whitelist": "sales,orders,customers"
}

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kafka Streams: Write standard Java apps and
microservices to process your data in real-time
• No separate processing cluster required • Perfect for small, medium, large use cases
• Develop on Mac, Linux, Windows • Fully integrated with Kafka security
• Deploy to containers, VMs, bare metal, cloud • Exactly-once processing semantics

• Powered by Kafka: elastic, scalable, distributed, • Part of Apache Kafka


battle-tested

KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");


KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
.groupByKey()
.count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views");

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stream processing with Kafka

Main
Logic

Example: Using Kafka’s Streams API for writing elastic, scalable, fault-tolerant Java and
Scala applications

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ksqlDB to the rescue

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kanchan Waikar
Senior Partner Solutions Architect at AWS
kanchanwaikar
A real-world streaming use-case
Use-case:
A non-compliance identification and notification system

Goal: Construction
work-site
Reduce accidents by identifying and correcting non-compliance

Infrastructure-scale:
Tens of thousands of surveillance cameras, hundreds of construction sites, thousands of construction
workers

Project Requirement:
Design an automated PPE non-compliance reporting system that reports incidences in matter of minutes.

Technical requirements:
Must scale up-and-down automatically based on loads (Cameras turned ON and OFF based on shift hours)

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Example video: construction site surveillance

Courtesy - https://pixabay.com/videos/construction-road-excavator-worker-26239/

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Summary Logs
(Start) HH:mm:SSS-(End) HH:mm:SSS : Alarm/No alarm : Status Details

00:00:000-00:00:015 : No Alarm : 1 truck(s), 1 excavator(s), 1 workers


Snapshots found.

00:00:015-00:00:045 : No Alarm : 2 truck(s), 1 excavator(s), 1 workers


found.
0.0 1.5
00:00:045-00:00:060 : No Alarm : 1 truck(s), 1 excavator(s), 1 workers
found.

00:00:060-00:00:075 : No Alarm : 1 truck(s), 1 excavator(s), no workers


found.

3.0 4.5 6.0 00:00:075-00:00:090 : ALARM : 1 worker(s) wearing PPE but 0


wearing hard hats, 1 truck(s), 1 excavator(s) found.

00:00:090-00:00:105 : No Alarm : 1 truck(s), 1 excavator(s), no workers


found.

00:00:105-End : ALARM : 1 worker(s) wearing PPE but 0 wearing hard


hats, 1 truck(s), 1 excavator(s) found.
7.5 9.0 10.5

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building an ML-driven streaming application

Video/data storage ML Model ML Model deployment Serverless compute Notification mechanism


mechanism

Amazon Simple Model AWS Lambda Amazon Simple


Amazon SageMaker
Storage Service (S3) Notification Service

Event Streaming platform KSQLDB Data Visualization tool Interactive Query service Metadata storage

ksqlDB Amazon QuickSight Amazon Athena Amazon DynamoDB

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Demo

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pre-trained
ML Models
Mobile
client
Safety
Administrator

Amazon Simple
Amazon Notification
SageMaker Service
Email notification
AWS Lambda

AWS Lambda Topic Alarms

ksqlDB
ksqlDB

Construction
work-site

S3 Bucket
S3 Sink S3 Bucket Amazon Athena

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ML Models

AKTE - Forklift Detector GluonCV SSD Object Detector Construction Worker Detection Social Distancing Detector
Pre-trained Model Pre-trained Model Pre-trained Model Pre-trained Model

Helmet & Vest Detector for Worker Safety PPE Detector for Laboratory Safety Construction Machines Detector
Pre-trained Model Pre-trained Model Pre-trained Model

TensorIoT CV PPE Mask Detection Hard Hat Detector for Worker Safety
Pre-trained Model Pre-trained Model

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Marketplace for Machine Learning Amazon SageMaker

Curated and trusted catalog of over 400 ML model Computer


vision
packages and algorithms
Structured NLP

14 industry segments

61 sellers
Speech
Video recognition

AWS Marketplace for


Machine Learning
Simplified provisioning

Consolidated into AWS billing Audio Text

Free | Free trial | Paid subscriptions Image

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deploying ML models from AWS Marketplace
SECURE YOUR DATA BY DEPLOYING MODELS IN YOUR VPC, IN
NETWORK ISOLATION MODE

Model from
AWS
Real-time
Amazon
Marketplace
inference
SageMaker

1 2 3 4

Find and try a Subscribe Deploy models in Secure REST


model from Amazon SageMaker with: API Access Batch
AWS Marketplace transform job

• Network isolation mode (container has


no internet access)

• Endpoint configured in your VPC

• IAM policies and encryption

https://aws.amazon.com/marketplace/solutions/machine-learning/pre-trained-models

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How to deploy ML
models with
Amazon Sagemaker

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Marketplace

8,000+
listings
Ÿ 1,600+ Ÿ ISVs
24
regions
Ÿ 290,000+ Ÿ
customers
1.5M+
subscriptions

Flexible consumption Quick and Helpful humans


and contract models easy deployment to support you

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Confluent Cloud
DEVELOPER OPERATOR/INFOSEC ARCHITECT/LOB

Unrestricted Developer Reliable Operations Providing Next-gen


Productivity at Scale Infrastructure
Serverless Development Reliable Operations Business Value Creation
Start for $0 | Scale Elastically | Upgrade In-place Uptime SLA | Critical Response SLA | DLP Time to market | Reduce TCO | Maximize ROI

Rich Pre-built Ecosystem Enterprise-grade Security & Compliance Enterprise-grade Security & Compliance
Connectors | Schema Registry | ksqlDB Authorization | Authentication | Encryption Authorization | Authentication | Encryption

Multi-language Development Monitoring & Visibility Event-streaming Vision


Non-Java Clients | REST Proxy Cloud Data Flow | Metrics API | Consumer Lag Leading Kafka Vision & Expertise |
App Modernization | Hybrid Strategy

Apache Kafka

Self-managed Software Freedom of Choice Fully Managed Cloud Software

Enterprise Professional
Committer-driven Expertise Training Partners
Support Services

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Real-time analysis
Store & Analyze
Ingest & Process Stream data into your AWS data lake or data warehouse to execute
queries on streaming data for real-time and batch analytics

ANALYZE VISUALIZE
Schema
Registry
Mobile
Amazon AWS Lake Amazon Amazon
Redshift Sink Redshift Formation Athena Elasticsearch
Web
Events
TRANSFORM
S3 Sink
IoT Amazon
Source S3
connectors
Data store
AWS & On-prem Amazon AWS Data AWS
EMR Pipeline Glue
ksqlDB

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Serverless integrations

Serverless integration AWS serverless platform

COMPUTE DATA STORES

Schema
Registry

AWS Amazon Amazon


Lambda DynamoDB Aurora
Apps Lambda
REST Proxy Sink
& Clients
STORAGE ANALYTICS
Microservices

S3 Sink
Data stores Source
Amazon Amazon Amazon
Connectors
S3 Athena Redshift
ksqlDB

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Modernization

Bridge
Hybrid cloud streaming to enable event-driven architecture

On-prem AWS Cloud

LEGACY EDW Apps Data Streams AWS Glue Amazon Athena

S3 Sink Lake Formation SageMaker

MAINFRAME Replicator
Redshift Sink

Lambda Sink
LEGACY DB
ksqlDB Amazon Amazon
JDBC / CDC AWS Direct DynamoDB Aurora
connectors Connect

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Integrations via connectors available in

Amazon Kinesis
Amazon S3

Kafka Connect Kafka Connect

Source Sink
Kafka Cluster
syslog

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Disney+hotstar powers explosive growth
with real-time event streaming on Confluent Platform

5 times higher real-time event


streaming (10 to 15 billion messages)

Increased number of users on the


platform by 20 times

Increased number of simultaneous


users to over 25 million

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Confluent on the AWS Marketplace (free trial)

https://aws.amazon.com/marketplace

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Marketplace can help you get started
Find Buy Deploy

A breadth Through flexible With multiple


of development solutions: pricing options: deployment options:

Free trial AWS Control Tower


Pay-as-you-go AWS Service Catalog
Hourly | Monthly | Annual AWS CloudFormation
| Multi-Year (Infrastructure as Code)
Bring Your Own License (BYOL) Software as a Service (SaaS)
Seller Private Offers Amazon Machine Image (AMI)
Channel Partner Private Offers Amazon Elastic Container Service
(ECS)
Amazon Elastic Kubernetes Service
(EKS)

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Webinar summary

Think about Kafka for your event-based architectures and consider pre-build ML
models with Amazon Sagemaker to get started quickly

Look to AWS partners like Confluent to help you scale and provide support

Use AWS Marketplace to help you discovery, experiment and innovate with new tools

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

You might also like