

A Cookiecutter for Metron Sensors

ON 08/02/2019 (UPDATED 09/02/2019) / BY CONDLA / IN APACHE METRON
What is “Cookiecutter”? Cookiecutter
(https://cookiecutter.readthedocs.io/en/latest/) is a project that helps you create
boilerplate and project structures, and it is widely used in both the Python and
data science communities. But you can use Cookiecutter for virtually anything,
including Apache Metron sensors. Apache Metron
(https://datahovel.com/2018/07/26/apache-metron-as-an-
example-for-a-real-time-data-processing-pipeline/) is, … well, read some of
the earlier blog posts, or the documentation
(https://metron.apache.org/current-book/index.html).

What is the cookiecutter-metron-sensor Project?

The cookiecutter-metron-sensor project helps you create sensor
configuration files, and it generates deployment instructions and a
corresponding deployment script for the specific sensor. If you need all the
details, check out the README.md of the project on GitHub:

https://github.com/Condla/cookiecutter-metron-sensor

Usage

To use the Metron sensor cookiecutter, you only need one thing installed:
cookiecutter:

pip install cookiecutter


Then you need to clone the project mentioned above and run the template.
That’s it.

git clone https://github.com/Condla/cookiecutter-metron-sensor
cookiecutter cookiecutter-metron-sensor

Now simply fill in the prompts to configure the cookiecutter, and the lion’s
share of the work you need to do to onboard a new data source
(https://datahovel.com/2018/07/18/how-to-onboard-a-new-data-source-in-
apache-metron/) is done. In the directory created you will find a deployment
script as well as another README.md file that you can use to document
everything around your sensor as you go ahead and define your own
transformations and enrichments. The README.md comes with the
deployment instructions for its own specific parser.
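
To make this concrete, the next steps might look roughly like the following. The directory and script names here are illustrative assumptions based on your prompt answers; the actual names come from the template output:

# assuming the sensor was named "squid" in the cookiecutter prompts
cd squid/
cat README.md    # sensor-specific deployment instructions generated by the template
bash deploy.sh   # hypothetical name for the generated deployment script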

Help to fill in the Cookiecutter prompts

While the cookiecutter-metron-sensor helps you create and
complete all of the Metron sensor configuration files, it does not explain what
those prompts mean. You still need to read the documentation for that.
However, to assist you in your efforts, I’ll walk you through the configuration
prompts below and point you to the documentation, so you understand what
you need to configure and why. (A non-interactive example invocation follows
the list.)

sensor_name : This will be the name of the sensor in the Metron
Management UI and determines the name of the parser Storm topology
and the name of the Kafka consumer group.
index_name : The name Metron will use to store the result of the
Metron processing pipeline in HDFS, Elastic Search or Apache Solr.
kafka_topic_name : This is the name of the Kafka topic the sensor
parser will subscribe to.
kafka_number_partitions : The number of partitions of the
Kafka topic above. It also determines the number of “ackers” and Storm
“spouts” of the sensor parser topology. If you’re not sure, it’s good to
start with 2 and increase this number later on if you see that the parser
topology builds up lag. Check the Metron performance tuning guide
(https://metron.apache.org/current-book/metron-platform/Performance-
tuning-guide.html) for more information.
kafka_number_replicas : The number of replicas of the above
Kafka topic. For data security and service availability reasons this should
be 2 or 3 .
storm_number_of_workers : The number of Storm workers you
want to launch for the sensor parser topology. Each worker is its own
JVM Linux process with memory assigned to it. All Storm processing
units will be distributed over these workers. For availability reasons use
2 or more workers.
storm_parser_parallelism : This affects how fast the
sensor parser processes the incoming data stream. By default,
cookiecutter sets it to your choice of kafka_number_partitions ,
which, as mentioned above, determines the number of processing units
reading the stream from Apache Kafka.
batch_indexing_size : This is the batch size written to HDFS per
writer and should be determined based on the parallelism and the number
of events per second you are dealing with. Again, refer to the
performance tuning guide.
ra_indexing_size : Similar to batch_indexing_size , but
for indexing to Elastic Search or Solr.
write_to_hdfs : Select true if you want to use the batch
indexing capabilities to HDFS.
write_to_elastic_search : Select true if you want to use
the random access indexing capabilities to Elastic Search.
write_to_solr : Select true if you want to use the random
access indexing capabilities to Apache Solr.
write_to_hbase : Choose false if you want a “common”
Metron pipeline [Parsing/Transforming] –> [Enrichment] –> [Indexing]
–> [HDFS/Elastic/Solr]. Choose true if you want to onboard a stream
ingest enrichment source [Parsing/Transforming] –> [HBase].
shew_table : The HBase table name you want to write to in case you
use write_to_hbase . You can ignore this and use the defaults in
case you don’t.
shew_cf : The HBase column family name you want to write to in
case you write_to_hbase . You can ignore this and use the defaults
in case you don’t.
shew_key_columns : The name of the field you want to act as the
lookup-key for your enrichment source in case you
write_to_hbase . You can ignore this and use the defaults in case
you don’t.
shew_enrichment_type : The name used to uniquely identify this
enrichment when you want to look it up. It will be part of the
lookup-key. Only important in case you write_to_hbase . You can
ignore this and use the defaults in case you don’t.
parser_class_name : Select one of the possible parsers. Note: Like
all of these values, you can change this later in the Metron Management
UI if you are using a custom parser or can’t find your parser in this list.
grok_pattern_label : By default this is the sensor_name in
uppercase letters, but you might want to change it.
zookeeper_quorum : This is needed by the deployment script so
it can create the Kafka topic. If you deployed Metron using Ambari,
you’ll find this information in the Ambari UI.
elastic_user : Important for the deployment. If your Elastic Search
server does not use X-Pack security, you can leave this field empty.
elastic_master : The URL to the Elastic Search master server.
metron_user : An admin user that has access to the Metron REST
server.
metron_rest : The URL to the Metron REST server.
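
As a rough illustration, here is what a non-interactive run of the template could look like, with a handful of the prompts above supplied on the command line. The values (the "squid" sensor name, the host names and the URLs) are made-up placeholders, and whether a given prompt accepts a value like this depends on the template; the --no-input flag and the key=value overrides themselves are standard Cookiecutter CLI features:

# Sketch only: supply a subset of the prompts up front instead of answering interactively.
# Prompts not listed here fall back to the template's defaults.
cookiecutter cookiecutter-metron-sensor --no-input \
    sensor_name=squid \
    kafka_topic_name=squid \
    kafka_number_partitions=2 \
    kafka_number_replicas=2 \
    storm_number_of_workers=2 \
    write_to_hdfs=true \
    write_to_elastic_search=true \
    zookeeper_quorum=zookeeper-host:2181 \
    elastic_master=http://elastic-host:9200 \
    metron_rest=http://metron-host:8082
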
Note: The cookiecutter-metron-sensor project is very young and a work in
progress; new features will be added continuously over time, with the aim of
making it even easier for a cyber security operator to master threat intelligence
data flows.
APACHE METRON COOKIECUTTER TEMPLATE
