
Logstash

Elastic Stack’s ETL engine


What is a log?
• A log is a record of incidents or observations.
• A log is typically made of two things:
• LOG = Timestamp + Data
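For example, a typical syslog entry pairs a timestamp with the observed data; the hostname, process, and message below are illustrative, not from any real system:

```
Nov 10 13:05:17 web-01 sshd[4722]: Accepted password for admin from 10.0.0.5 port 52416 ssh2
```

Here "Nov 10 13:05:17" is the timestamp and everything that follows is the data.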
Log analysis

Logs are typically used for:
• Troubleshooting
• Understanding system or application behavior
• Auditing
• Predictive analysis
Log analysis challenges
• No common or consistent format
• Logs are decentralized
• No consistent time format
• Log data is unstructured

Logstash
• Open source data collection engine
• Can dynamically unify data from disparate sources
• Normalize the data into destinations of your choice
• Cleanse and democratize data
• Horizontally scalable data
processing pipeline
• Mix, match, and orchestrate
different inputs, filters, and
outputs to play in pipeline
harmony
• Community-extensible and
developer-friendly plugin
ecosystem
• Over 200 plugins
available, plus the
flexibility of creating and
contributing your own

Power of Logstash

Features of Logstash

Pluggable data pipeline architecture
• Contains over 200 plugins developed by Elastic and the open source community

Extensibility
• Written in JRuby
• Supports pluggable pipeline architecture
• Easily build custom plugins to meet your needs

Centralized data processing


• Data from different sources can be easily pulled using various input plugins
• Data can be enriched or transformed
• Data can be sent to different/multiple destinations
Variety
• Apache, NGINX logs, system logs, and Windows event logs
• Collects metrics from a wide range of application platforms over TCP and UDP
• Logstash can transform HTTP requests into events
• Provides webhooks for applications like Meetup, GitHub, JIRA
• Consume data from relational/NoSQL databases and messaging queues including Kafka and RabbitMQ

Volume
• Data processing pipeline can be easily scaled horizontally
• Since Logstash 5, it supports persistent queues, thus providing the ability to reliably process huge volumes of incoming events/data.
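Within a single instance, horizontal scaling of the processing pipeline is tuned through settings in logstash.yml; the values below are an illustrative sketch, not recommendations for any particular workload:

```
# logstash.yml -- illustrative pipeline tuning values
pipeline.workers: 4        # worker threads running the filter + output stages in parallel
pipeline.batch.size: 250   # events each worker collects before executing filters/outputs
pipeline.batch.delay: 50   # ms to wait for a full batch before flushing a smaller one
```

Raising workers and batch size trades memory for throughput; the defaults (workers = number of CPU cores, batch size 125) are a reasonable starting point.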
Synergy
• Logstash has a strong synergy with Elasticsearch, Beats, and
Kibana, thus allowing you to build end-to-end log analysis
solutions with ease.
Logs and Metrics
• Handle all types of logging data
• Easily ingest a multitude of web logs like Apache, and application logs like log4j for Java
• Capture many other log formats like syslog, networking and firewall logs, and more
• Enjoy complementary secure log forwarding capabilities with Filebeat
• Collect metrics from Ganglia, collectd, NetFlow, JMX, and many other infrastructure and application platforms over TCP and UDP

Web
• Transform HTTP requests into events
• Consume from web service firehoses like Twitter for social sentiment analysis
• Webhook support for GitHub, HipChat, JIRA, and countless other applications
• Create events by polling HTTP endpoints on demand
• Universally capture health, performance, metrics, and other types of data from web application interfaces
• Perfect for scenarios where control of polling is preferred over receiving

Many other sources
• Relational database or NoSQL store with a JDBC interface
• Unify diverse data streams from messaging queues like Apache Kafka, RabbitMQ, and Amazon SQS
• Sensors and IoT
• Logstash is the common event collection backbone for ingestion of data shipped from mobile devices to intelligent homes, connected vehicles, healthcare sensors, etc.
Installation

Prerequisites
• Logstash requires Java 8

Check the Java version:

java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
Downloading and installing Logstash

• Download the latest version of Logstash from https://www.elastic.co/downloads/logstash-oss
• For details, follow the installation document.
Elastic components variations
• Post 6.3, Elastic components come in two variations (distributions):
• OSS distribution, which is 100% Apache 2.0
• Community License, which has basic x-pack features that are free to use and are bundled with Elastic components
• More details about this can be found
at https://www.elastic.co/products/x-pack/open. For this training,
we will be using the logstash-oss version, which is based on the 100%
Apache 2.0 license.
• The Community License Logstash download page is available
at https://www.elastic.co/downloads/logstash#ga-release.
• The Apache 2.0 Logstash download page is available
at https://www.elastic.co/downloads/logstash-oss#ga-release.
Version compatibility

• The Elastic developer community is quite vibrant, and newer releases with new features/fixes get released quite often. By the time you take this training, the latest Logstash version might have changed.
• Instructions in this training are based on Logstash (logstash-oss) version 7.x. You can click on the past releases link and download version 7.x if you want to follow this as is.
• Unlike Kibana, which requires major and minor version compatibility with Elasticsearch, Logstash versions starting from 6.7 are compatible with Elasticsearch 7.x. The compatibility matrix can be found at https://www.elastic.co/support/matrix#matrix_compatibility.
The Logstash
architecture

• Inputs, Filters, and Outputs


• Input and output are required, and filters are
optional.
• Inputs create events
• Filters modify the input events
• Outputs ship them to the destination.
• Inputs and outputs support codecs, which allow you to
encode or decode the data as and when it enters or exits
the pipeline, without having to use a separate filter.
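For instance, a codec can be attached directly to an input to decode data as it enters the pipeline; this sketch assumes newline-delimited JSON arriving on stdin:

```
input {
  stdin {
    codec => json_lines   # decode each incoming line as a JSON document
  }
}
output {
  stdout {
    codec => rubydebug    # pretty-print the decoded event on the way out
  }
}
```

Without the input codec, the same result would need a separate json filter in the filter section.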

Logstash architecture
• Logstash uses in-memory bounded queues between pipeline stages (Input to Filter and Filter to Output) by default to buffer events.
• If Logstash terminates unsafely, any events that are stored in memory will be lost.
• To prevent data loss, you can enable Logstash to persist in-flight events to the disk by making use of persistent queues.
Persistent queues
• Persistent queues can be enabled by setting
the queue.type: persisted property in
the logstash.yml file, which can be found under
the LOGSTASH_HOME/config folder.
• By default, the files are stored
in LOGSTASH_HOME/data/queue.
• You can override this by setting
the path.queue property in logstash.yml.
• By default, Logstash starts with a heap size of 1 GB. This can be overridden by setting the Xms and Xmx properties in the jvm.options file, which is found under the LOGSTASH_HOME/config folder.
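Putting the settings above together, a persistent-queue setup might look like the sketch below; the queue path and heap size are illustrative assumptions, not recommendations:

```
# LOGSTASH_HOME/config/logstash.yml
queue.type: persisted            # enable the persistent queue (default is in-memory)
path.queue: /var/lib/logstash/pq # optional override of LOGSTASH_HOME/data/queue

# LOGSTASH_HOME/config/jvm.options
-Xms2g   # initial heap size (default 1 GB)
-Xmx2g   # maximum heap size (usually kept equal to -Xms)
```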
Logstash pipeline

Pipeline
• The Logstash pipeline is stored in a configuration file that ends with a .conf extension.
• The file has three sections: input, filter, and output.
• Each of these sections contains one or more plugin configurations.

input {
}
filter {
}
output {
}
Exercise 1

Test Logstash:

cd /usr/share/logstash/bin/
logstash -e "input { stdin { } } output { stdout {} }"

Output:

[root@osboxes bin]# /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Thread.exclusive is deprecated, use Thread::Mutex
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
Logstash pipeline

Pipeline
• A plugin is configured by providing the name of the plugin and then its settings as key-value pairs. The value is assigned to a key using the => operator.
• The -f flag passes a configuration file to Logstash.
• The -e flag passes the configuration on the command line.

#simple.conf
#A simple logstash configuration

input {
  stdin { }
}

filter {
  mutate {
    uppercase => [ "message" ]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
Exercise 2

Let's try simple.conf. The output message will be converted to uppercase:

/usr/share/logstash/bin/logstash -f 2simple.conf
Logstash plugins
• Logstash has a rich collection of input, filter, codec, and output plugins.
• Plugins are available as self-contained packages called gems, and are hosted on RubyGems.org.
• As part of the Logstash distribution, many common plugins are available out of the box.
• To list the plugins that are part of the current installation:
• ./logstash-plugin list
Logstash plugins
• With the --verbose flag you can find out the version information of each plugin:
• ./logstash-plugin list --verbose
• With the --group flag, followed by either input, filter, output, or codec, you can list the installed input, filter, output, or codec plugins, respectively:
• ./logstash-plugin list --group filter
• To list all the plugins containing a name fragment:
• ./logstash-plugin list 'metrics'
Installing or updating plugins

Install a plugin using the install command:
./logstash-plugin install logstash-output-email

Get the latest version of a plugin using the update command:
./logstash-plugin update logstash-output-s3
Input plugins
• An input plugin is used to configure a set of events to be fed to Logstash.
• The plugin allows you to configure single or multiple input sources.
• Details of these plugins, and a list of other available plugins that are not part of the default distribution, can be found at https://www.elastic.co/guide/en/logstash/7.0/input-plugins.html.
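As a sketch, the commonly used file input can tail a log file; the path shown is an illustrative assumption:

```
input {
  file {
    path => "/var/log/messages"   # file(s) to tail; glob patterns are allowed
    start_position => "beginning" # read from the start on first run (default: "end")
  }
}
```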
List of input plugins available
• logstash-input-azure_event_hubs
• logstash-input-beats
• logstash-input-couchdb_changes
• logstash-input-elasticsearch
• logstash-input-exec
• logstash-input-file
• logstash-input-ganglia
• logstash-input-gelf
• logstash-input-generator
• logstash-input-graphite
• logstash-input-heartbeat
• logstash-input-http
• logstash-input-http_poller
• logstash-input-imap
• logstash-input-jdbc
• logstash-input-jms
• logstash-input-kafka
• logstash-input-pipe
• logstash-input-rabbitmq
• logstash-input-redis
• logstash-input-s3
• logstash-input-snmp
• logstash-input-snmptrap
• logstash-input-sqs
• logstash-input-stdin
• logstash-input-syslog
• logstash-input-tcp
• logstash-input-twitter
• logstash-input-udp
• logstash-input-unix
Output plugins
• Output plugins are used to send data to a destination.
• Output plugins allow you to configure single or multiple output sources.
• Details of these plugins, and a list of other available plugins that are not part of the default distribution, can be found at https://www.elastic.co/guide/en/logstash/7.0/output-plugins.html.
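A minimal sketch of the widely used elasticsearch output; the host and index name are assumptions for illustration:

```
output {
  elasticsearch {
    hosts => ["http://localhost:9200"] # Elasticsearch node(s) to send events to
    index => "logs-%{+YYYY.MM.dd}"     # daily index name derived from the event timestamp
  }
}
```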
List of output plugins available
• logstash-output-cloudwatch
• logstash-output-csv
• logstash-output-elastic_app_search
• logstash-output-elasticsearch
• logstash-output-email
• logstash-output-file
• logstash-output-graphite
• logstash-output-http
• logstash-output-lumberjack
• logstash-output-nagios
• logstash-output-null
• logstash-output-pipe
• logstash-output-rabbitmq
• logstash-output-redis
• logstash-output-s3
• logstash-output-sns
• logstash-output-sqs
• logstash-output-stdout
• logstash-output-tcp
• logstash-output-udp
• logstash-output-webhdfs
Filter plugins
• A filter plugin is used to perform transformations on the data.
• It allows you to combine one or more plugins; the order of the plugins defines the order in which the data is transformed.
• It acts as the intermediate section between input and output, and is an optional section in the Logstash configuration.
• Details of these plugins, and a list of other available plugins that are not part of the default distribution, can be found at https://www.elastic.co/guide/en/logstash/7.0/filter-plugins.html.
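To illustrate that ordering matters, the sketch below first parses a line with grok and then hands the extracted field to the date filter; the field names and the syslog-style timestamp layout are assumptions:

```
filter {
  grok {
    # extract the timestamp into its own field before the date filter runs
    match => { "message" => "%{SYSLOGTIMESTAMP:log_ts} %{GREEDYDATA:msg}" }
  }
  date {
    # parse log_ts (produced by grok above) into the event's @timestamp
    match => [ "log_ts", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}
```

Reversing the two filters would break the pipeline's intent: date would run before log_ts exists.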
List of filter plugins available
• logstash-filter-aggregate
• logstash-filter-anonymize
• logstash-filter-cidr
• logstash-filter-clone
• logstash-filter-csv
• logstash-filter-date
• logstash-filter-de_dot
• logstash-filter-dissect
• logstash-filter-dns
• logstash-filter-drop
• logstash-filter-elasticsearch
• logstash-filter-fingerprint
• logstash-filter-geoip
• logstash-filter-grok
• logstash-filter-http
• logstash-filter-jdbc_static
• logstash-filter-jdbc_streaming
• logstash-filter-json
• logstash-filter-kv
• logstash-filter-memcached
• logstash-filter-metrics
• logstash-filter-mutate
• logstash-filter-prune
• logstash-filter-ruby
• logstash-filter-sleep
• logstash-filter-split
• logstash-filter-syslog_pri
• logstash-filter-throttle
• logstash-filter-translate
• logstash-filter-truncate
• logstash-filter-urldecode
• logstash-filter-useragent
• logstash-filter-uuid
• logstash-filter-xml
Codec plugins
• Codec plugins are used to encode or decode incoming or outgoing events from Logstash.
• Codecs can be used in both inputs and outputs.
• Input codecs provide a convenient way to decode your data before it enters the pipeline.
• Output codecs provide a convenient way to encode your data before it leaves the output.
• Details of these plugins, and a list of other available plugins that are not part of the default distribution, can be found at https://www.elastic.co/guide/en/logstash/7.0/codec-plugins.html.
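For example, the multiline codec can merge continuation lines (such as indented stack-trace lines) into the preceding event; the file path and the whitespace pattern are assumptions about the log layout:

```
input {
  file {
    path => "/var/log/app.log"   # illustrative path
    codec => multiline {
      pattern => "^\s"           # lines starting with whitespace...
      what => "previous"         # ...are appended to the previous event
    }
  }
}
```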
List of codec plugins available
• logstash-codec-avro
• logstash-codec-cef
• logstash-codec-collectd
• logstash-codec-dots
• logstash-codec-edn
• logstash-codec-edn_lines
• logstash-codec-es_bulk
• logstash-codec-fluent
• logstash-codec-graphite
• logstash-codec-json
• logstash-codec-json_lines
• logstash-codec-line
• logstash-codec-msgpack
• logstash-codec-multiline
• logstash-codec-netflow
• logstash-codec-plain
• logstash-codec-rubydebug
Questions??
