The opportunity: The shift to streams & digital transformation
“By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.”
— Gartner, November 2016
More Facts & Figures
Vision of a Streaming Enterprise
[Diagram: a central Streaming Platform connecting the enterprise’s applications and data systems]
What Can You Do with a Streaming Platform?
The typical architecture
[Diagram: point-to-point pipelines connecting databases, storage systems, and interfaces to applications such as a fraud detection application]
Challenges abound
• Difficult to handle massive amounts of data
• Diverse data sets, arriving at an increasing rate
• Many complex data pipelines
Modernized architecture using Apache Kafka
[Diagram: Kafka at the center, connecting apps, a data warehouse, search, security, and monitoring systems via the Streams API]
Modernized architecture using Apache Kafka
• Handle any volume of data with ease
• Scale to meet the demands of diverse streams
[Diagram: Kafka at the center, connecting apps, a data warehouse, search, security, and monitoring systems via the Streams API]
Our vision: from big data to stream data
• Big Data was: The More the Better (value grows with the volume of data)
• Stream Data is: The Faster the Better (value grows as the age of data shrinks)
• Stream Data can be: Big or Fast (Lambda architecture)
• Stream Data will be: Big AND Fast (Kappa architecture)
[Charts: value of data plotted against volume and against age; Lambda (speed table + batch table) and Kappa architecture diagrams]

Use cases by industry:
• Consumer Tech: streaming video, personalized customer experience, device telemetry and analytics
• Healthcare: patient monitoring, pharma substance control, patient relapse, lab results alerts
Kafka Adoption Across Key Companies
[Logos of adopters in Financial Services, Enterprise Tech, and Consumer Tech]
Confluent Enterprise
The only enterprise streaming platform based entirely on Apache Kafka™
Confluent Platform: Enterprise Streaming based on Apache Kafka™
[Diagram: sources such as database changes, log events, IoT data, and web events flow through clients and connectors into the Confluent Platform, which provides data integration, real-time applications, monitoring & administration, operations, and data compatibility, and feeds destinations such as Hadoop, transformations, analytics, and data warehouses]
Confluent Completes Kafka
Feature | Benefit | First available in
Kafka Connect API | Advanced API for connecting external sources/destinations into Kafka | Apache Kafka
Kafka Streams API | Simple library that enables streaming application development within the Kafka framework | Apache Kafka
REST Proxy | Provides universal access to Kafka from any network-connected device via HTTP | Confluent Open Source
Schema Registry | Central registry for the format of Kafka data – guarantees all data is always consumable | Confluent Open Source
Pre-Built Connectors | HDFS, JDBC, Elasticsearch, and other connectors, fully certified and fully supported by Confluent | Confluent Open Source
Confluent Control Center | Enables easy connector management and stream monitoring | Confluent Enterprise
How do I get streams of data into and out of my apps?
Apache Kafka™ Connect – Streaming Data Capture
• Fault tolerant
• Preserves data schema
• Integrated within Confluent Platform’s Control Center
[Diagram: source connectors (e.g. IRC/Twitter, CDC) feed the Kafka pipeline through the Kafka Connect API; sink connectors (e.g. NoSQL, HDFS) deliver data out]
Kafka Connect API, Part of the Apache Kafka™ Project
Connect any source to any target system with Apache Kafka

Flexible
• 40+ open source connectors available
• Easy to develop additional connectors
• Flexible support for data types and formats

Reliable
• Automated failover
• At-least-once delivery guaranteed
• Balances workload between nodes

Integrated
• 100% compatible with Kafka v0.9 and higher
• Integrated with Confluent’s Schema Registry
• Easy to manage with Confluent Control Center

Compatible
• Maintains critical metadata
• Preserves schema information
• Supports schema evolution
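As a concrete illustration, a minimal source connector can be described in a properties file. The sketch below uses the FileStreamSource connector that ships with Apache Kafka; the file path and topic name are placeholders:

```properties
# Minimal standalone file-source connector: tails a file into a Kafka topic.
# (Illustrative; the file path and topic name are placeholders.)
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/var/log/app.log
topic=app-logs
```

It would be run with the standalone Connect worker, e.g. `connect-standalone worker.properties file-source.properties`.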
Kafka Connect API Library of Connectors
[Connector logos, grouped by category: Databases, Datastore/File Store, …]
* Denotes connectors developed at Confluent and distributed by Confluent; extensive validation and testing have been performed.
New in Kafka 0.10.2: Single Message Transforms for Kafka Connect
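Single Message Transforms are declared in the connector configuration itself. A minimal sketch, using the built-in InsertField transform to stamp each record’s value with its timestamp (the transform alias and field name are placeholders):

```properties
# Illustrative SMT configuration, appended to a connector's properties:
# apply InsertField to each record's value, adding an "ingest_ts" field
# populated from the record timestamp.
transforms=addTs
transforms.addTs.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addTs.timestamp.field=ingest_ts
```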
Kafka Clients
[Client library logos; plus stdin/stdout via the console producer and consumer]
REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
• Simplifies message creation and consumption
• Simplifies administrative actions
[Diagram: non-native applications reach Kafka through the REST Proxy, backed by the Schema Registry, alongside native Kafka Java applications]
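The shape of a produce request through the REST Proxy looks roughly like the following sketch (host, port, and topic are placeholders; the content type is the proxy’s v1 JSON embedded format):

```http
POST /topics/test HTTP/1.1
Host: restproxy:8082
Content-Type: application/vnd.kafka.json.v1+json

{"records": [{"value": {"name": "alice"}}]}
```

The proxy responds with the partition and offset assigned to each record, so any HTTP-capable client can produce to Kafka without a native driver.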
How do I maintain my data formats and ensure compatibility?
The Challenge of Data Compatibility at Scale
Schema Registry
[Diagram: producing apps (App 1, App 2) serialize records using schemas from the Schema Registry; example consumers such as Elastic and HDFS deserialize with the same schemas]
• Define the expected fields for each Kafka topic
• Prevent backwards-incompatible changes
• Automatically handle schema changes (e.g. new fields)
• Supports multi-datacenter environments
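The compatibility rule the Schema Registry enforces can be pictured with a toy sketch, not the real implementation: backward compatibility means a reader with the new schema can still read data written with the old one, so newly added fields must carry defaults.

```python
# Toy model of a backward-compatibility check, in the spirit of what
# Schema Registry enforces for Avro schemas (not the real implementation).
# A "schema" here is a dict of field name -> {"type": ..., "default": ...?}.

def is_backward_compatible(old_schema, new_schema):
    """New readers must be able to read data written with the old schema."""
    for name, spec in new_schema.items():
        if name not in old_schema:
            # A field added without a default cannot be filled in
            # when reading old records.
            if "default" not in spec:
                return False
        elif old_schema[name]["type"] != spec["type"]:
            # Toy rule: changing a field's type breaks compatibility.
            return False
    return True

v1 = {"user_id": {"type": "long"}}
v2_ok = {"user_id": {"type": "long"},
         "region": {"type": "string", "default": "unknown"}}
v2_bad = {"user_id": {"type": "long"},
          "region": {"type": "string"}}  # new field, no default

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False
```

With a check like this run on every schema registration, a producer can never publish data that existing consumers cannot read.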
How do I build stream processing apps?
Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™
Architecture Example
With Kafka Streams: an app-centric architecture that blends well into your existing infrastructure
1. Capture business events in Kafka
2. Process events fast, reliably, and securely with standard Java applications
3a. Write results back to Kafka
3b. Query the latest results directly from external apps
[Diagram: App 1 and App 2 write events to Kafka; “Your App” processes them with the Kafka Streams API]
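Kafka Streams itself is a Java library; purely to illustrate the shape of a streams program (consume, transform, aggregate per key, emit), here is a language-neutral sketch of the canonical word-count topology in plain Python, with a list standing in for the input topic and a dict for the state store:

```python
# Toy stand-in for a Kafka Streams word-count topology: the input list
# plays the role of a topic, the dict plays the role of a state store.
from collections import defaultdict

def word_count(records):
    """flatMap each record into words, group by word, count per key."""
    counts = defaultdict(int)
    for value in records:                    # consume from the "topic"
        for word in value.lower().split():   # flatMapValues: split into words
            counts[word] += 1                # groupBy + count in the "store"
    return dict(counts)                      # emit (key, count) pairs

topic = ["all streams lead to Kafka", "hello Kafka streams"]
print(word_count(topic))
# {'all': 1, 'streams': 2, 'lead': 1, 'to': 1, 'kafka': 2, 'hello': 1}
```

In the real Streams API the same logic is a few lines of `KStream`/`KTable` operations running inside an ordinary Java application, with state stores and fault tolerance handled by the library.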
New in Kafka 0.10.2: Session windows in the Kafka Streams API
[Diagram: user activity for Bob and Dave plotted against event-time and processing-time; events are grouped into per-user session windows, with a session terminated by a gap of inactivity]
How do I synchronize and migrate data to and from the cloud?
Before: Hybrid Cloud Environments Today
[Diagram: DC1 and AWS, with many point-to-point links as individual apps, databases, key-value stores, and data warehouses are migrated or mirrored piecemeal]
Challenges
• Each team/department must execute their own cloud migration
• May be moving the same data multiple times
• Each box represented here requires development, testing, deployment, monitoring, and maintenance
After: Cloud Synchronization and Migrations with Confluent Platform
[Diagram: Kafka clusters in DC1 and AWS synchronize the apps, databases, and key-value stores between the two environments]
Benefits
• Continuous low-latency synchronization
• Centralized manageability and monitoring
  – Track data produced in all data centers at the event level
• Security and governance
  – Track and control where data comes from and who is accessing it
How do I manage and monitor my streaming platform at scale?
What Does End-to-End Mean?
Confluent Control Center: Cluster Health & Administration
Cluster administration
• Monitor topic configurations
Confluent Control Center: End-to-end Monitoring
See exactly where your messages are going in your Kafka cluster
Confluent Control Center: Connector Management
Confluent Control Center: Alerting
Alerts
• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
• Manage alerts for different users and applications from a web UI
User authentication
• Control access to Confluent Control Center
Auto Data Balancing
Dynamically move partitions to optimize resource utilization and reliability
• Easily add and remove nodes from your Kafka cluster
• Rack-aware algorithm rebalances partitions across a cluster
• Traffic from the balancer is throttled when data transfer occurs
[Diagram: partition distribution across brokers before and after a rebalance]
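The effect of the balancer can be pictured with a minimal sketch: a greedy reassignment by partition count. This is illustrative only; the real balancer is also rack-aware, considers leadership and data volume, and throttles the replica movement it triggers.

```python
# Toy partition rebalance: repeatedly move one partition from the most
# loaded broker to the least loaded one until the spread is minimal.
# (Illustrative only; Confluent's balancer also considers racks, leaders,
# and throttles replica movement.)

def rebalance(assignment):
    """assignment: broker -> list of partition ids. Mutated toward balance."""
    moves = []
    while True:
        most = max(assignment, key=lambda b: len(assignment[b]))
        least = min(assignment, key=lambda b: len(assignment[b]))
        if len(assignment[most]) - len(assignment[least]) <= 1:
            return moves   # spread of at most one partition: done
        p = assignment[most].pop()
        assignment[least].append(p)
        moves.append((p, most, least))

# A newly added broker3 starts empty; the balancer evens things out.
cluster = {"broker1": [0, 1, 2, 3, 4], "broker2": [5], "broker3": []}
rebalance(cluster)
print({b: sorted(ps) for b, ps in cluster.items()})
# {'broker1': [0, 1], 'broker2': [3, 5], 'broker3': [2, 4]}
```

The returned move list corresponds to the reassignments the balancer would execute (and, in the real system, throttle).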
Multi-Datacenter Replication
An easy, reliable way to run Kafka across datacenters

Improve reliability
• Easily configure & maintain cross-cluster replication

Simplify management
• Centralized configuration and monitoring
• Replicate an entire cluster or a subset of topics
• Automatic replication of topic configuration
• Use Kafka’s SASL for Kerberos / Active Directory
• SSL encryption between datacenters
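An abridged sketch of what a Confluent Replicator connector configuration can look like (broker addresses, topic names, and the rename pattern are placeholders; a real deployment needs additional converter and worker settings):

```properties
# Hedged sketch of a Replicator configuration copying two topics
# from the DC1 cluster to the AWS cluster.
name=dc1-to-aws
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
src.kafka.bootstrap.servers=dc1-broker:9092
dest.kafka.bootstrap.servers=aws-broker:9092
topic.whitelist=orders,payments
topic.rename.format=${topic}.replica
```

Because Replicator runs as a Kafka Connect connector, it inherits Connect’s failover, scaling, and Control Center integration described earlier.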
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/

Thank You