
Robust Kafka Streams

Streams Technical Deep Dive and Making Your Kafka Streams Applications More Resilient

1
Brief Introduction

• Worked at Confluent (Streams Team) for 1.5 years
• Author of Kafka Streams in Action

2
What We’ll Cover

• Concept of Topology and Sub-Topology
• Tasks, threads, and state stores
• Getting Notifications
• Error Handling

3
Sample Streams Application

...
stream = streamBuilder.stream(Arrays.asList("A", "B"));

stream.groupByKey()
      .count(Materialized.as("count-store"))
      .toStream()
      .to("output-topic",
          Produced.with(Serdes.String(), Serdes.Long()));
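Not on the original slide: a minimal sketch of actually running this topology, assuming String keys and values, a broker at localhost:9092, and a made-up application ID.

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "robust-streams-app"); // hypothetical ID
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

KafkaStreams kafkaStreams = new KafkaStreams(streamBuilder.build(), props);
kafkaStreams.start();
// Close cleanly on shutdown so offsets and state stores are flushed.
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));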

4
View the Topology

Topology topology = streamBuilder.build();

System.out.println(topology.describe().toString());

5
View the Topology
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [A, B])
      --> KSTREAM-AGGREGATE-0000000001
    Processor: KSTREAM-AGGREGATE-0000000001 (stores: [count-store])
      --> KTABLE-TOSTREAM-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KTABLE-TOSTREAM-0000000002 (stores: [])
      --> KSTREAM-SINK-0000000003
      <-- KSTREAM-AGGREGATE-0000000001
    Sink: KSTREAM-SINK-0000000003 (topic: output-topic)
      <-- KTABLE-TOSTREAM-0000000002

6
View Graphic Topology
https://github.com/zz85/kafka-streams-viz

7
View the Topology

stream = streamBuilder.stream(Arrays.asList("A", "B"));

stream.groupByKey()
      .count(Materialized.as("count-store"))
      .toStream()
      .to("output-topic",
          Produced.with(Serdes.String(), Serdes.Long()));

8
Kafka Streams Tasks

12
Updated Sample Streams Application

stream = streamBuilder.stream("A");
secondStream = streamBuilder.stream("B");

stream.groupByKey()
      .count(Materialized.as("count-store"))
      .toStream()
      .to("output-topic",
          Produced.with(Serdes.String(), Serdes.Long()));

secondStream.filter((k, v) -> k.equals("Foo"))
            .mapValues(v -> v.substring(0, 3))
            .to("filtered-output-topic");

13
Updated Streams Example Topology
(diagram: the two sub-topologies rendered as a graph)

14
Kafka Streams Task Assignments

First number: sub-topology (group) ID
Second number: partition ID (e.g., task 1_2 processes partition 2 of sub-topology 1's input)

15
Tasks and Threads

(Slides 16-20: diagrams stepping through how tasks are distributed across the available stream threads)

20
So what does this mean?

• With N tasks, at most N threads can do useful work
• If threads > tasks, the extra threads sit idle
• Also account for the number of available cores (see the config sketch below)
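A minimal sketch of sizing the thread pool (NUM_STREAM_THREADS_CONFIG is the real property; two threads is only an example value):

// At most one running task per thread does useful work; extra threads sit idle.
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);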
21
Tasks and State Stores

• Stores are assigned per task
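Not on the slide: with the default persistent (RocksDB) stores, each task keeps its state in its own directory, so count-store for task 1_0 lives at a path roughly like this (layout assumed from the default store implementation):

<state.dir>/<application.id>/1_0/rocksdb/count-store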

22
Task Migration with State
Standby Tasks

24
Standby Tasks

25
Standby Task Deployment Considerations
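A minimal sketch of enabling standby tasks (NUM_STANDBY_REPLICAS_CONFIG is the real property; one replica is only an example value):

// Keep one warm copy of each task's state on another instance for fast failover.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);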

26
Standby Task Pros & Cons

27
View Task Assignments

for (ThreadMetadata threadMetadata : kafkaStreams.localThreadsMetadata()) {
    LOG.info("Thread {} active {}",
             threadMetadata.threadName(),
             threadMetadata.activeTasks());
    LOG.info("Thread {} standby {}",
             threadMetadata.threadName(),
             threadMetadata.standbyTasks());
}

28
The General Problem Areas
• Deserialization
• Producing
• Processing

29
Error Handling - Deserializing

// Default: log the error and stop the application on a bad record
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
          LogAndFailExceptionHandler.class.getName());

// Alternative: log the bad record and keep processing
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
          LogAndContinueExceptionHandler.class.getName());
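You can also plug in your own handler. A minimal sketch against the DeserializationExceptionHandler interface from this era of Streams (the class name is made up):

public class MyDeserializationHandler implements DeserializationExceptionHandler {

    private static final Logger LOG = LoggerFactory.getLogger(MyDeserializationHandler.class);

    @Override
    public DeserializationHandlerResponse handle(ProcessorContext context,
                                                 ConsumerRecord<byte[], byte[]> record,
                                                 Exception exception) {
        // Log enough to locate the poison pill later, then keep processing.
        LOG.warn("Could not deserialize record from {}-{} at offset {}",
                 record.topic(), record.partition(), record.offset(), exception);
        return DeserializationHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}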

30
Error Handling - Producing

// Default: fail the application when a record cannot be produced
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
          DefaultProductionExceptionHandler.class.getName());
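A custom handler lets you fail selectively. A minimal sketch (the class name is made up; RecordTooLargeException is a real producer error):

public class ContinueOnTooLargeHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                     Exception exception) {
        // Skip oversized records rather than killing the application; fail on anything else.
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}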

31
Processing Exceptions DSL
ValueMapper<String, ValueContainer> errorProneMapper = v -> {
    ValueContainer valueContainer = new ValueContainer();
    try {
        valueContainer.addValue(v.substring(0, 5));
    } catch (Exception e) {
        valueContainer.setException(e);
    }
    return valueContainer;
};
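ValueContainer is not defined in the deck; a minimal sketch of what such a helper might look like (the field and method names are assumptions):

// Hypothetical helper: carries either a successfully computed value
// or the exception captured while computing it.
public class ValueContainer {
    private String value;
    private Exception exception;

    public void addValue(String value) { this.value = value; }
    public void setException(Exception exception) { this.exception = exception; }

    public Exception exception() { return exception; }
    public String formatValue() { return value == null ? "" : value.trim(); }
}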

32
Processing Exceptions DSL
Predicate<String, ValueContainer> errorPredicate =
    (k, v) -> v.exception() != null;

Predicate<String, ValueContainer> successPredicate =
    (k, v) -> !errorPredicate.test(k, v);

KStream<String, ValueContainer>[] branched =
    sourceStream.mapValues(errorProneMapper)
                .branch(errorPredicate, successPredicate);

int errorIndex = 0;
int successIndex = 1;

branched[errorIndex].to(errorReportTopic);
branched[successIndex].mapValues(ValueContainer::formatValue).to(outputTopic);
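Note: branch() routes each record to the stream of the first predicate that matches and silently drops records that match none, which is why the success predicate is written as the negation of the error predicate.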

33
Processing Exceptions PAPI
class ErrorProneProcessor extends AbstractProcessor<String, String> {

    @Override
    public void process(String key, String value) {
        ValueContainer valueContainer = new ValueContainer();
        try {
            valueContainer.addValue(value.substring(0, 5));
            context().forward(key,
                              valueContainer,
                              To.child(successProcessor));
        } catch (Exception e) {
            context().forward(key,
                              e.getMessage(),
                              To.child(errorSink));
        }
    }
}
36
Processing Exceptions PAPI
Topology topology = new Topology();

topology.addSource(sourceTopicNode, sourceTopic);

topology.addProcessor(errorProneProcessor,
                      ErrorProneProcessor::new,
                      sourceTopicNode);

topology.addProcessor(successProcessor,
                      SuccessProcessor::new,
                      errorProneProcessor);

topology.addSink(errorSink,
                 errorReportTopic,
                 errorProneProcessor);

topology.addSink(outputTopicNode, outputTopic, successProcessor);
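SuccessProcessor is not shown in the deck; a minimal sketch of what it might do, assuming it unwraps the ValueContainer and forwards the formatted value downstream:

// Hypothetical success-path processor.
class SuccessProcessor extends AbstractProcessor<String, ValueContainer> {

    @Override
    public void process(String key, ValueContainer valueContainer) {
        // Forward the formatted value to all child nodes (here, the output sink).
        context().forward(key, valueContainer.formatValue());
    }
}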

37
Uncaught Exceptions

kafkaStreams.setUncaughtExceptionHandler(
    (thread, exception) -> {
        LOG.info("It ran on my box fine!");
    });
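In the Streams versions current at the time of this talk, this handler is purely informational: the stream thread has already died by the time it fires, so the usual pattern is to log, alert, and restart the instance externally.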

38
StateRestoreListener
kafkaStreams.setGlobalStateRestoreListener(new StateRestoreListener() {
    @Override
    public void onRestoreStart(TopicPartition topicPartition, String storeName,
                               long startingOffset, long endingOffset) {
    }

    @Override
    public void onBatchRestored(TopicPartition topicPartition, String storeName,
                                long batchEndOffset, long numRestored) {
    }

    @Override
    public void onRestoreEnd(TopicPartition topicPartition, String storeName,
                             long totalRestored) {
    }
});
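One caveat: this listener is shared by every stream thread in the instance, so an implementation that keeps state (counters, timers) must be thread-safe.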

39
Kafka Streams StateListener
kafkaStreams.setStateListener((newState, oldState) -> {
    if (newState == State.RUNNING && oldState == State.REBALANCING) {
        // Do something here
    } else if (newState == State.REBALANCING && oldState == State.RUNNING) {
        // Do something else
    }
});
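A common use of these transitions is gating interactive queries: local state stores are only safely queryable once the instance is back in RUNNING, so the REBALANCING -> RUNNING hook is a natural place to flip a "ready" flag.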

40
Wrapping Up

• Sub-topologies, tasks and threads
• Tasks and state
• Failover

41
Wrapping Up

• General error handling
• Per-record error handling
• Events

42
Thanks!
Kafka Streams in Action book signing @4:15 at the Confluent booth.
Stay in Touch!

• https://slackpass.io/confluentcommunity
• https://www.confluent.io/blog/

43
