
Run the command java -X and you will get a list of all -X options:

C:\Users\Admin>java -X
The -X options are non-standard and subject to change without notice.

I hope this will help you understand Xms and Xmx, as well as many other things
that matter the most. :)

How is the default max Java heap size determined?

You can check the default Java heap size by:

In Windows: java -XX:+PrintFlagsFinal -version | findstr /i "HeapSize PermSize ThreadStackSize"
In Linux: java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'
What system configuration settings influence the default value?
The machine's physical memory & Java version.

- Xms (minimum heap size / InitialHeapSize) defaults to 1/64th of your
physical memory.
- Xmx (maximum heap size / MaxHeapSize) defaults to 1/4th of your
physical memory.

For example, on my Mac with 16 GB of RAM, I get:

uintx InitialHeapSize := 268435456  {product}
uintx MaxHeapSize := 4294967296  {product}

i.e. Xms is 268 MB (256 MiB) and Xmx is 4.29 GB (4 GiB).
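The ergonomic defaults described above can be sketched as a quick calculation. This is a simplified model (real JVM ergonomics also apply lower and upper bounds and vary by version), but it reproduces the numbers for a 16 GB machine:

```python
GIB = 1024 ** 3

def default_heap_sizes(physical_memory_bytes):
    """Approximate JVM ergonomic defaults: Xms ~ 1/64th, Xmx ~ 1/4th of RAM."""
    initial = physical_memory_bytes // 64   # InitialHeapSize (-Xms)
    maximum = physical_memory_bytes // 4    # MaxHeapSize (-Xmx)
    return initial, maximum

xms, xmx = default_heap_sizes(16 * GIB)    # a 16 GB machine
print(xms, xmx)  # 268435456 4294967296, matching the flags printed above
```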

How to alter memory allocation pool for Kafka and ZooKeeper ?


URL: https://sleeplessbeastie.eu/2021/12/27/how-to-alter-memory-allocation-pool-for-kafka-and-zookeeper/
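For reference, the Kafka and ZooKeeper launcher scripts (kafka-server-start.sh, zookeeper-server-start.sh) read their JVM heap flags from the KAFKA_HEAP_OPTS environment variable, which is the mechanism the article above relies on. A minimal sketch (the 512m/1g values are illustrative assumptions, not a recommendation):

```python
import os

# kafka-server-start.sh falls back to "-Xmx1G -Xms1G" when this is unset.
os.environ["KAFKA_HEAP_OPTS"] = "-Xms512m -Xmx1g"

# The variable is inherited by a broker process launched from this environment,
# e.g.: subprocess.run(["bin/kafka-server-start.sh", "config/server.properties"])
print(os.environ["KAFKA_HEAP_OPTS"])
```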

kafka + how to avoid running out of disk storage ?


In Kafka, there are two types of log retention: size-based and time-based. The
former is controlled by log.retention.bytes, while the latter by log.retention.hours.

In your case, you should pay attention to size retention, which can sometimes
be quite tricky to configure. Assuming that you want a delete cleanup policy,
you'd need to set the following parameters:

log.cleaner.enable=true
log.cleanup.policy=delete

Then you need to think about the configuration of log.retention.bytes,
log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take
into consideration the following factors:

- log.retention.bytes is a minimum guarantee for a single partition of
a topic, meaning that if you set log.retention.bytes to 512MB, you will
always have at least 512MB of data (per partition) on your disk.
- Again, if you set log.retention.bytes to 512MB and
log.retention.check.interval.ms to 5 minutes (the default value), then at any
given time you will have at least 512MB of data plus the data produced within
the 5-minute window, before the retention policy is triggered.
- A topic log on disk is made up of segments. The segment size is
determined by the log.segment.bytes parameter. For log.retention.bytes=1GB and
log.segment.bytes=512MB, you will always have up to 3 segments on disk (2
closed segments that count against retention, plus a 3rd, active segment that
data is currently written to).
Finally, you should do the math and compute the maximum size that might be
reserved by Kafka logs at any given time on your disk, and tune the aforementioned
parameters accordingly. Of course, I would also advise setting a time retention
policy as well and configuring log.retention.hours accordingly. If after 2 days you
don't need your data anymore, then set log.retention.hours=48.
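Putting the bullet points above together, the worst-case disk usage of one partition can be estimated as follows. This is a simplified model (it ignores index files, the retention-check interval and compression), but it reproduces the 3-segment example from the text:

```python
import math

def worst_case_partition_bytes(retention_bytes, segment_bytes):
    """Closed segments kept under size retention, plus the active segment."""
    closed_segments = math.ceil(retention_bytes / segment_bytes)
    return (closed_segments + 1) * segment_bytes

# The example from the text: 1 GiB retention, 512 MiB segments -> 3 segments.
size = worst_case_partition_bytes(1 * 1024**3, 512 * 1024**2)
print(size // 1024**2)  # 1536 (MiB), i.e. up to ~1.5 GiB per partition
```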

Kafka optimal retention and deletion policy ?

Apache Kafka uses a log data structure to manage its messages. The log data
structure is basically an ordered set of segments, where a segment is a collection
of messages. Apache Kafka provides retention at the segment level instead of at the
message level. Hence, Kafka keeps removing the oldest segments once they violate
the retention policies.

Apache Kafka provides us with the following retention policies:

1. Time Based Retention


Under this policy, we configure the maximum time a segment (and hence its
messages) can live for. Once a segment has exceeded the configured retention time,
it is marked for deletion or compaction depending on the configured cleanup policy.
The default retention time for segments is 7 days.

Here are the parameters (in decreasing order of priority) that
you can set in your Kafka broker properties file:

# Configures retention time in milliseconds
log.retention.ms=1680000

# Used if log.retention.ms is not set
log.retention.minutes=1680

# Used if log.retention.minutes is not set
log.retention.hours=168
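The priority order between these three settings can be sketched as follows. This is a simplified model of the broker's resolution order, not Kafka source code:

```python
def effective_retention_ms(cfg):
    """log.retention.ms wins; then .minutes; then .hours (default 168 h = 7 days)."""
    if "log.retention.ms" in cfg:
        return int(cfg["log.retention.ms"])
    if "log.retention.minutes" in cfg:
        return int(cfg["log.retention.minutes"]) * 60_000
    return int(cfg.get("log.retention.hours", 168)) * 3_600_000

# With all three set, the milliseconds value takes priority.
print(effective_retention_ms({
    "log.retention.ms": 1680000,
    "log.retention.minutes": 1680,
    "log.retention.hours": 168,
}))  # 1680000
```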

2. Size based Retention


In this policy, we configure the maximum size of the log data
structure for a topic partition. Once the log reaches this size, Kafka starts
removing the oldest segments. This policy is not popular, as it does not provide
good visibility into message expiry. However, it can come in handy in a scenario
where we need to control the size of a log due to limited disk space.

Here are the parameters that you can set in your Kafka broker
properties file:

# Configures maximum size of a log
log.retention.bytes=104857600

So, according to your use case, you should configure
log.retention.bytes so that your disk does not get full.
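Since log.retention.bytes applies per partition, a back-of-the-envelope check against your disk budget has to multiply by partition count and replication factor. A sketch (the 12-partition, 3-replica topic is an illustrative assumption; real usage is higher because of active segments and index files):

```python
def topic_disk_floor_bytes(retention_bytes, partitions, replication_factor):
    """Minimum data retained cluster-wide for one topic under size retention."""
    return retention_bytes * partitions * replication_factor

# 100 MiB retention (as above), 12 partitions, replication factor 3:
print(topic_disk_floor_bytes(104857600, 12, 3))  # 3774873600 bytes (~3.5 GiB)
```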

What is Kafka Performance Tuning ? (https://medium.com/p/fdee5b19505b)

There are a few configuration parameters to consider when we talk about
Kafka performance tuning. The most important configurations for improving
performance are the ones that control the disk flush rate.

Also, we can divide these configurations on a per-component basis. So, let's
talk about the producer first. The most important configurations to take care of
on the producer side are:

Compression
Batch size
Sync or Async

And, on the consumer side, the important configuration is:

Fetch size

It's always tricky to decide what batch size will be optimal. A large batch
size may deliver high throughput, but it comes with a latency cost: latency and
throughput trade off against each other.

It is possible to have low latency together with high throughput; we have to
choose a proper batch size, and use the queue time or refresh interval settings to
find the right balance.
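To make the trade-off concrete, here are two hypothetical producer profiles, shown as Python dicts of standard Kafka producer properties. The specific values are illustrative assumptions, not benchmarked recommendations; linger.ms is the modern name for the queue-time idea mentioned above:

```python
# Throughput-leaning: big batches, wait longer to fill them, compress.
throughput_profile = {
    "compression.type": "lz4",
    "batch.size": 131072,   # 128 KiB batches
    "linger.ms": 50,        # wait up to 50 ms for a batch to fill
}

# Latency-leaning: small batches, send immediately, skip compression.
latency_profile = {
    "compression.type": "none",
    "batch.size": 16384,    # the default 16 KiB
    "linger.ms": 0,         # send as soon as data arrives
}

print(throughput_profile["batch.size"] // latency_profile["batch.size"])  # 8
```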

BEST:
https://engineering.linkedin.com/apache-kafka/how-we_re-improving-and-advancing-kafka-linkedin
https://engineering.linkedin.com/blog/2019/apache-kafka-trillion-messages
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

References:
https://sleeplessbeastie.eu/2022/01/05/how-to-reassign-kafka-topic-partitions-and-replicas/
https://sleeplessbeastie.eu/2021/12/20/how-to-disable-gc-logging-for-kafka-and-bundled-zookeeper/
https://sleeplessbeastie.eu/2021/12/10/how-to-rotate-kafka-logs/
https://sleeplessbeastie.eu/2021/10/27/how-to-install-and-configure-a-kafka-cluster-without-zookeeper/
https://sleeplessbeastie.eu/2021/10/25/how-to-install-and-configure-a-kafka-cluster-with-zookeeper/
https://sleeplessbeastie.eu/2021/10/22/how-to-generate-kafka-cluster-id/
https://sleeplessbeastie.eu/2023/12/20/how-to-create-kubernetes-configmaps-and-secrets/
https://sleeplessbeastie.eu/2024/01/10/how-to-list-all-resources-in-namespace/
