25/1123, 11:08 Bridging the HTTP protocol to Apache Katka
Bridging the HTTP protocol to Apache Kafka
July 19, 2019 by Paolo Patierno
Apache Kafka uses a custom protocol on top of
TCP/IP for communication between applications
and the cluster. There are many client
implementations for different programming
languages, from Java to Golang, from Python to
C# and many more. These libraries simplify
development because they abstract the Apache
Kafka protocol on the wire: every client
implementation provides a simple API for sending
and receiving messages, or performing
administrative operations on the cluster.
However, there are scenarios where it is not
possible to use the clients, or indeed the native
protocol. Communicating with an Apache Kafka
cluster using a standard protocol like HTTP/1.1
eases development in these scenarios. This blog
post will introduce the new HTTP - Apache Kafka
bridge that's available as part of the Strimzi 0.12
release.
Apache Kafka uses a custom binary protocol, which
is described in the Apache Kafka documentation.
Clients are available for many different programming
languages, but there are many scenarios where a
standard protocol like HTTP/1.1 is more
appropriate.
For example, when using the native Apache Kafka
clients, all the brokers in the cluster need to be
accessible to the clients. This is because topics
are partitioned and the partitions are distributed
among different brokers to spread the load across
the cluster. In order to read from and write to all
the partitions, clients need to connect to every
broker hosting a partition “leader”.
Imagine a scenario where, for security reasons, we
don't want to expose an internal, perhaps company-wide,
Apache Kafka infrastructure to the outside world,
but prefer to make it accessible through a “single”,
more controlled entry point, reducing the
attack surface.
Consider an IoT solution where, due to resource
constraints (CPU and memory), the embedded
device cannot open a lot of TCP/IP connections; or
think about a use case where an “always on”
connection, as the Apache Kafka protocol needs,
isn't feasible due to limited network availability.
In such cases it is common for a device to open a
connection, send/receive data and then close the
connection.
Finally, if you have an application written in a
programming language without a client
implementation, then most of the time using
HTTP is much simpler.
In all the above use cases, accessing the Apache
Kafka cluster through the HTTP protocol enables
more application scenarios. This is where the
Strimzi HTTP - Apache Kafka bridge comes into
play. It is possible to run a single bridge instance
or multiple ones, depending on the scale required,
and use it as an “entry” point to the Apache Kafka
cluster.
Introduction
The HTTP - Kafka bridge allows clients to
communicate with an Apache Kafka cluster over
the HTTP/1.1 protocol. HTTP clients and native
Apache Kafka clients can be mixed, talking to the
same cluster. It is quite
common to have mobile or embedded devices
sending data through HTTP to an Apache Kafka
cluster running in the Cloud, alongside backend
applications that gather and process this data
using the native Apache Kafka protocol.
The bridge provides a REST API, described by an
OpenAPI specification, which exposes multiple
endpoints to allow typical Apache Kafka
operations:
* sending messages to topics (including to a specific partition)
* subscribing to one or more topics (even using a pattern) as part of a consumer group, or asking for a specific partition assignment
* receiving messages from the subscribed topics
* committing offsets related to the received messages
* seeking to a specific position (or to the beginning/end) in a topic partition
The client behaviour and the interaction with the
Apache Kafka cluster, through the bridge, are the
same as with a native Kafka client,
but with HTTP/1.1 protocol semantics.
Each endpoint allows specific HTTP methods
(GET, POST, DELETE) to execute the above
operations.
Producing endpoints
The bridge exposes two main REST endpoints in
order to send messages:
* /topics/{topicname}
* /topics/{topicname}/partitions/{partitionid}
The first one is used to send messages to the topic
topicname, while the second one allows the
user to specify the partition via partitionid.
Actually, even when using the first endpoint, the user
can specify the destination partition in the body of
the message.
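To make the two URL shapes concrete, here is a minimal Python sketch that builds both producer endpoint URLs. The bridge address `http://localhost:8080` and the helper names are hypothetical; only the path layout comes from the endpoints listed above.

```python
from urllib.parse import quote

# Hypothetical bridge address; point this at your own bridge instance.
BRIDGE_URL = "http://localhost:8080"


def topic_url(topic: str) -> str:
    """URL of the /topics/{topicname} endpoint."""
    # quote() keeps the topic name safe to embed as a single path segment.
    return f"{BRIDGE_URL}/topics/{quote(topic, safe='')}"


def partition_url(topic: str, partition: int) -> str:
    """URL of the /topics/{topicname}/partitions/{partitionid} endpoint."""
    return f"{topic_url(topic)}/partitions/{partition}"
```

For example, `topic_url("my-topic")` yields `http://localhost:8080/topics/my-topic`, and `partition_url("my-topic", 3)` appends `/partitions/3`.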
In order to send messages, a producer has to
connect to the bridge and send an HTTP POST
request to the specific endpoint with a JSON
payload containing the value and, optionally, the
key and partition.
The following JSON payload defines three
messages (aka records):
* the first one has a key and a value, so the bridge will send it to the partition based on the hash of the key
* the second one has the destination partition specified along with the value
* the third one just has the value, so the bridge will apply a round robin mechanism to determine the partition
{
  "records": [
    {
      "key": "my-key",
      "value": "my-first-value"
    },
    {
      "value": "my-second-value",
      "partition": 3
    },
    {
      "value": "my-third-value"
    }
  ]
}
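The same three records can be assembled and wrapped in a POST request with the Python standard library, as in the sketch below. It only prepares the request so it runs without a bridge. The bridge address, the topic `my-topic` and the helper names are hypothetical, and the content type `application/vnd.kafka.json.v2+json` is our assumption for JSON-valued records; verify it against the bridge's OpenAPI specification.

```python
import json
from urllib.request import Request

# Hypothetical bridge address.
BRIDGE_URL = "http://localhost:8080"
# Assumed content type for records with JSON values; verify against the
# bridge's OpenAPI specification.
JSON_CONTENT_TYPE = "application/vnd.kafka.json.v2+json"


def make_record(value, key=None, partition=None) -> dict:
    """One entry of the "records" array; key and partition are optional."""
    record = {"value": value}
    if key is not None:
        record["key"] = key
    if partition is not None:
        record["partition"] = partition
    return record


def build_send_request(topic: str, records: list) -> Request:
    """Prepare (without sending) a POST to /topics/{topicname}."""
    body = json.dumps({"records": records}).encode("utf-8")
    return Request(
        f"{BRIDGE_URL}/topics/{topic}",
        data=body,
        headers={"Content-Type": JSON_CONTENT_TYPE},
        method="POST",
    )


# The three records from the payload above.
records = [
    make_record("my-first-value", key="my-key"),
    make_record("my-second-value", partition=3),
    make_record("my-third-value"),
]
request = build_send_request("my-topic", records)
```

Passing `request` to `urllib.request.urlopen()` would actually send it to a running bridge.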
The HTTP request payload is always JSON, but
the message values can be JSON or binary
(encoded in base64, because binary data sent inside
a JSON payload must be encoded as a string).
After sending the messages to the cluster, the bridge
replies to the producer with an HTTP response containing an
appropriate response code (i.e. 200 OK if the
messages were sent successfully) and a JSON
payload with the destination partition and offset for
each message:
{
  "offsets": [
    {
      "partition": 1,
      "offset": 7
    },
    {
      "partition": 3,
      "offset": 45
    },
    {
      "partition": 2,
      "offset": 24
    }
  ]
}
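A small Python sketch of the two data-handling steps just described: base64-encoding a binary value so it fits in a JSON string, and reading the partition/offset pairs back from the response. We assume the offsets come back in the same order as the records were sent; the example is consistent with that (the second record, sent explicitly to partition 3, gets the second entry), but the text does not state it as a rule. Sending base64 values presumably also requires the bridge's binary content type rather than the JSON one (we believe `application/vnd.kafka.binary.v2+json`; check the OpenAPI specification).

```python
import base64
import json


def encode_binary_value(data: bytes) -> str:
    """Base64-encode raw bytes so they can travel as a JSON string value."""
    return base64.b64encode(data).decode("ascii")


def parse_offsets(response_body: str) -> list:
    """Extract (partition, offset) pairs from the bridge's send response."""
    offsets = json.loads(response_body)["offsets"]
    return [(entry["partition"], entry["offset"]) for entry in offsets]


# The response body shown above.
sample_response = (
    '{"offsets": [{"partition": 1, "offset": 7}, '
    '{"partition": 3, "offset": 45}, {"partition": 2, "offset": 24}]}'
)
print(parse_offsets(sample_response))  # [(1, 7), (3, 45), (2, 24)]
```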
Consuming endpoints
From a consumer perspective, the bridge is much
more complex due to the way
consuming messages from Apache Kafka works
with consumer groups and partition
rebalancing. For this reason, before subscribing to
topics and starting to receive messages, an HTTP
client has to “create” a corresponding consumer
on the bridge, which also means joining a
consumer group. This happens through an HTTP
POST request to the following endpoint, providing a
consumer configuration in the JSON payload:
/consumers/{groupid}
{
  "name": "my-consumer",
  "enable.auto.commit": "false"
}
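As a sketch under stated assumptions, the consumer-creation call can be prepared like this. The bridge address, group `my-group` and helper name are hypothetical, and the content type `application/vnd.kafka.v2+json` is our assumption for consumer-management requests; the configuration mirrors the payload above, with the value sent as a string.

```python
import json
from urllib.request import Request

# Hypothetical bridge address.
BRIDGE_URL = "http://localhost:8080"
# Assumed content type for consumer-management requests; check the OpenAPI spec.
V2_CONTENT_TYPE = "application/vnd.kafka.v2+json"


def build_create_consumer_request(group_id: str, name: str,
                                  auto_commit: bool) -> Request:
    """Prepare a POST to /consumers/{groupid} that creates a named consumer."""
    config = {
        "name": name,
        # As in the payload above, the value is sent as a string.
        "enable.auto.commit": "true" if auto_commit else "false",
    }
    return Request(
        f"{BRIDGE_URL}/consumers/{group_id}",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": V2_CONTENT_TYPE},
        method="POST",
    )


request = build_create_consumer_request("my-group", "my-consumer", auto_commit=False)
```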
The bridge creates a new consumer in the group
groupid and returns to the client the so-called
base_uri, which is the URL that the client has to
use for sending the subsequent requests (e.g.
subscribing, polling, ...):
{
  "instance_id": "my-consumer",
  "base_uri": "http://my-bridge-bridge-s..."
}
From now on, the HTTP consumer will interact
with the following endpoints for subscribing to
topics, getting messages, committing offsets and
finally deleting the consumer:
* /consumers/{groupid}/instances/{name}/subscription
* /consumers/{groupid}/instances/{name}/records
* /consumers/{groupid}/instances/{name}/offsets
* /consumers/{groupid}/instances/{name}
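Since the endpoints above all share the prefix /consumers/{groupid}/instances/{name}, a client can derive them from the returned base_uri instead of assembling them by hand. This minimal sketch assumes base_uri ends with exactly that prefix; the sample response and its URL are hypothetical.

```python
import json


def consumer_endpoints(create_response_body: str) -> dict:
    """Map each consumer operation to its URL, derived from base_uri."""
    base_uri = json.loads(create_response_body)["base_uri"].rstrip("/")
    return {
        "subscription": f"{base_uri}/subscription",  # subscribe to topics
        "records": f"{base_uri}/records",            # poll for messages
        "offsets": f"{base_uri}/offsets",            # commit offsets
        "consumer": base_uri,                        # delete the consumer
    }


# Hypothetical creation response; the real base_uri depends on your deployment.
sample = json.dumps({
    "instance_id": "my-consumer",
    "base_uri": "http://localhost:8080/consumers/my-group/instances/my-consumer",
})
endpoints = consumer_endpoints(sample)
```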
Subscribing to topics is done through an HTTP
POST request containing a list of topics or a topic
pattern in the JSON payload.
{
  "topics": [
    "topic1",
    "topic2"
  ]
}
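A subscription request can carry either a topic list, as above, or a pattern. The sketch below prepares both variants. The field name `topic_pattern`, the content type and the base_uri are assumptions for illustration; check the bridge's OpenAPI specification for the exact schema.

```python
import json
from typing import Optional, Sequence
from urllib.request import Request

# Assumed content type; check the bridge's OpenAPI specification.
V2_CONTENT_TYPE = "application/vnd.kafka.v2+json"


def build_subscribe_request(base_uri: str,
                            topics: Optional[Sequence[str]] = None,
                            topic_pattern: Optional[str] = None) -> Request:
    """Prepare a POST to {base_uri}/subscription with a topic list or a pattern."""
    if (topics is None) == (topic_pattern is None):
        raise ValueError("pass exactly one of topics or topic_pattern")
    if topics is not None:
        payload = {"topics": list(topics)}
    else:
        # Assumed field name for pattern subscriptions.
        payload = {"topic_pattern": topic_pattern}
    return Request(
        f"{base_uri}/subscription",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": V2_CONTENT_TYPE},
        method="POST",
    )


# Hypothetical base_uri, as returned when the consumer was created.
BASE_URI = "http://localhost:8080/consumers/my-group/instances/my-consumer"
by_list = build_subscribe_request(BASE_URI, topics=["topic1", "topic2"])
by_pattern = build_subscribe_request(BASE_URI, topic_pattern="topic[0-9]+")
```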
As with a native Apache Kafka client, getting
messages means doing a “poll” operation, which
in terms of the HTTP protocol means doing HTTP
GET requests on the relevant endpoint (/records); the
bridge will return an array of records with topic,
key, value, partition and offset.
[
  {
    "topic": "topic1",
    "key": "key1",
    "value": "value1",
    "partition": 0,
    "offset": 2
  },
  {
    "topic": "topic2",
    "key": "key2",
    "value": "value2",
    "partition": 1,
    "offset": 3
  }
]
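A poll is a plain GET on the records endpoint. The sketch below prepares that request and unpacks a response like the one above into tuples. The `Accept` value and the base_uri are assumptions; a real consumer would repeat the GET in a loop, as a native client calls poll repeatedly.

```python
import json
from urllib.request import Request

# Assumed Accept header for JSON-valued records; check the OpenAPI spec.
JSON_ACCEPT = "application/vnd.kafka.json.v2+json"


def build_poll_request(base_uri: str) -> Request:
    """Prepare a GET on {base_uri}/records, the HTTP equivalent of poll()."""
    return Request(f"{base_uri}/records", headers={"Accept": JSON_ACCEPT}, method="GET")


def parse_records(body: str) -> list:
    """Turn the returned JSON array into (topic, partition, offset, key, value) tuples."""
    return [
        (r["topic"], r["partition"], r["offset"], r.get("key"), r["value"])
        for r in json.loads(body)
    ]


BASE_URI = "http://localhost:8080/consumers/my-group/instances/my-consumer"  # hypothetical
poll = build_poll_request(BASE_URI)

# The array shown above.
sample_body = json.dumps([
    {"topic": "topic1", "key": "key1", "value": "value1", "partition": 0, "offset": 2},
    {"topic": "topic2", "key": "key2", "value": "value2", "partition": 1, "offset": 3},
])
for topic, partition, offset, key, value in parse_records(sample_body):
    print(f"{topic}[{partition}]@{offset}: {key}={value}")
```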
After consuming messages, if the auto commit
feature is not enabled on consumer creation, it is
necessary to commit the offsets via an HTTP
POST request specifying an offsets collection
with topic, partition and required offset to commit.
{
  "offsets": [
    {
      "topic": "topic1",
      "partition": 0,
      "offset": 3
    },
    {
      "topic": "topic2",
      "partition": 1,
      "offset": 4
    }
  ]
}
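The commit payload follows the usual Kafka convention: the committed offset is the position of the next record to read, which is why the records consumed at offsets 2 and 3 are committed as 3 and 4. This plain Python sketch derives such a payload from polled records, keeping the highest offset seen per topic partition; the helper name is hypothetical.

```python
def offsets_to_commit(records: list) -> dict:
    """Build a commit payload from records returned by a poll.

    For each topic partition, the committed offset is the offset of the
    last consumed record plus one, i.e. the next record to read.
    """
    latest = {}
    for r in records:
        tp = (r["topic"], r["partition"])
        latest[tp] = max(latest.get(tp, -1), r["offset"])
    return {
        "offsets": [
            {"topic": topic, "partition": partition, "offset": offset + 1}
            for (topic, partition), offset in sorted(latest.items())
        ]
    }


# The records returned by the poll shown earlier.
polled = [
    {"topic": "topic1", "key": "key1", "value": "value1", "partition": 0, "offset": 2},
    {"topic": "topic2", "key": "key2", "value": "value2", "partition": 1, "offset": 3},
]
payload = offsets_to_commit(polled)
```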
When the HTTP client doesn't want to consume
messages anymore, it can remove the
corresponding consumer from the bridge by sending an
HTTP DELETE request.
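Removing the consumer needs no payload: a DELETE on the consumer's base_uri is enough, as in this sketch. The base_uri is hypothetical.

```python
from urllib.request import Request


def build_delete_consumer_request(base_uri: str) -> Request:
    """Prepare a DELETE on the consumer's base_uri, removing it from the bridge."""
    return Request(base_uri, method="DELETE")


BASE_URI = "http://localhost:8080/consumers/my-group/instances/my-consumer"  # hypothetical
delete = build_delete_consumer_request(BASE_URI)
```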
The bridge also exposes endpoints for seeking
into a topic partition at the beginning, at the end
or at a specific offset:
* /consumers/{groupid}/instances/{name}/positions
* /consumers/{groupid}/instances/{name}/positions/beginning
* /consumers/{groupid}/instances/{name}/positions/end
In order to seek to a specific position in the
partition, the consumer must provide offset
information through the JSON payload in the
HTTP POST request. The format is the same as
the one used to commit offsets.
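A sketch of the seek requests. Seeking to a specific offset reuses the commit payload shape, as stated above. For the beginning/end endpoints, the payload shape (a list of topic partitions without offsets) and the content type are our assumptions; verify them against the bridge's OpenAPI specification. The base_uri is hypothetical.

```python
import json
from urllib.request import Request

# Assumed content type; check the bridge's OpenAPI specification.
V2_CONTENT_TYPE = "application/vnd.kafka.v2+json"


def _post(url: str, payload: dict) -> Request:
    """Prepare a JSON POST request without sending it."""
    return Request(url, data=json.dumps(payload).encode("utf-8"),
                   headers={"Content-Type": V2_CONTENT_TYPE}, method="POST")


def build_seek_request(base_uri: str, topic: str, partition: int, offset: int) -> Request:
    """Seek to a specific offset: same payload shape as a commit."""
    payload = {"offsets": [{"topic": topic, "partition": partition, "offset": offset}]}
    return _post(f"{base_uri}/positions", payload)


def build_seek_to_request(base_uri: str, where: str, partitions: list) -> Request:
    """Seek to the beginning or end of the given (topic, partition) pairs."""
    if where not in ("beginning", "end"):
        raise ValueError("where must be 'beginning' or 'end'")
    # Assumed payload shape: a list of partitions, without offsets.
    payload = {"partitions": [{"topic": t, "partition": p} for t, p in partitions]}
    return _post(f"{base_uri}/positions/{where}", payload)


BASE_URI = "http://localhost:8080/consumers/my-group/instances/my-consumer"  # hypothetical
rewind = build_seek_to_request(BASE_URI, "beginning", [("topic1", 0)])
jump = build_seek_request(BASE_URI, "topic2", 1, 10)
```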
Deployment on Kubernetes
Deploying the bridge on Kubernetes is really easy
using the new KafkaBridge custom resource
provided by the Strimzi Cluster Operator. The
following sample custom resource can be used to
deploy a bridge.
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaBridge
metadata:
  name: my-bridge
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bo
  consumer:
    config:
      auto.offset.reset: earliest
      enable.auto.commit: true
  producer:
    config:
      delivery.timeout.ms: 300000
  http:
    port: 8080
The bridge has to connect to the Apache Kafka
cluster. This is specified in the
bootstrapServers property. The bridge then
uses a native Apache Kafka consumer and
producer for interacting with the cluster. It is
possible to provide default values for the
producer and consumer configuration when the
bridge is created, using the consumer.config
and producer.config blocks. The default
values can be overridden by individual producers
and consumers as required. Finally, the
replicas property defines the number of bridge
instances to run and the http.port property defines
which port the bridge will listen on for incoming
HTTP connections.
As with all the other Strimzi components such as
Kafka Mirror Maker and Kafka Connect, the bridge
can connect to the Apache Kafka cluster via TLS
and use authentication if required.
More information about the bridge configuration
is available in the documentation.
Once you have used Strimzi to deploy an Apache
Kafka cluster on Kubernetes, deploying a bridge
instance is as simple as running the following
command (using the example provided by the
latest Strimzi release):
kubectl apply -f examples/kafka-bridge/
In addition to deploying the bridge via a
Kubernetes Deployment, the Cluster Operator
will create a Service in order to make
the bridge accessible to HTTP clients running
in other pods in the same Kubernetes cluster. This
service has a name in the form “bridge_name-bridge-service”
(e.g. for the above snippet it will
be “my-bridge-bridge-service”).
It is possible to create an Ingress for exposing
the service so that the bridge is accessible outside
of the Kubernetes cluster (or a Route in case you
are using OpenShift). In a future blog
post, I will show how this is possible with a real
example.
Conclusion
Exposing the Apache Kafka cluster to clients
using HTTP enables scenarios where use of the
native clients is not desirable. Such situations
include resource-constrained devices, limited network
availability and security considerations.
Interaction with the bridge is similar to the native
Apache Kafka clients but using the semantics of
an HTTP REST API. The inclusion of the HTTP
Bridge in Strimzi enhances the options available
to developers when building applications with
Apache Kafka.
If you liked this blog post, star us on GitHub and
follow us on Twitter to make sure you don't miss
any of our future blog posts!
Strimzi
CC-BY-4.0