Centralized Logging with rsyslog and Elasticsearch

TABLE OF CONTENTS

Data collection using rsyslog

Install/upgrade. rsyslog.conf

Plugins: main input modules and their configurations

Message modifiers: using mmnormalize to parse unstructured data in a scalable way

Output modules: writing data to Elasticsearch

Tuning queues, workers and batch sizes

Using rulesets to manage multiple data flows

RainerScript: variables, conditionals, loops and lookup tables

Pipeline patterns when sending data to Elasticsearch

Example configurations

 
Data collection using rsyslog

Install/upgrade. rsyslog.conf

○ rsyslog comes preinstalled on most Linux distributions as the default syslog daemon. However,
the version supplied is often years old, so it’s a good idea to upgrade
○ there are Ubuntu and RHEL/CentOS packages on the official website:
http://www.rsyslog.com/downloads/download-other/
○ for Amazon Linux, you need to change ​epel-$releasever to ​epel-6 and specify ​priority=1 in
http://rpms.adiscon.com/v8-stable/rsyslog.repo
○ if you’re running another distro, it’s likely that it contains a new version somewhere in its
repositories. For example, you can find a recent version for Debian Jessie here:
https://packages.debian.org/jessie-backports/rsyslog
○ however you end up configuring rsyslog, you'll need to check how your additions interact with the
existing local configuration. For example, if there's a ​stop or ​~ action somewhere, any messages
hitting that spot will be discarded, so an output action placed after such a statement will miss
some logs. To counter this, you have a few options:
■ Check your current configuration and make sure it works well with what you’re adding
■ Process your data (e.g. read from files, output to Elasticsearch) in a separate ruleset
(see the section on rulesets)
■ Rewrite the local configuration so it makes more sense (e.g. for many installations, just
reading from the local socket and writing everything to /var/log/messages is enough; see the sketch after this list)
■ Process your data with a different rsyslog process (different config, different PID)
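
As a reference for the rewrite option above, here's a minimal sketch of such a stripped-down local configuration (it uses only modules shown elsewhere in this guide):

# minimal local config: read kernel logs and the local syslog socket,
# write everything to /var/log/messages
module(load="imuxsock") # local syslog socket
module(load="imklog")   # kernel logs
action(type="omfile" file="/var/log/messages")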
○ many predefined rsyslog.conf files are written in the configuration format that is backwards
compatible with version 5 and earlier. This old configuration format fully supports the traditional
syslogd syntax (and evolved from it), making it easy to migrate from syslogd to rsyslog
○ more complex configurations are difficult to read or write using this configuration format, and
you’ll be missing important features. Since version 6 there’s another configuration format (see
http://www.rsyslog.com/doc/master/rainerscript/ for all the concepts) which is more powerful and
easier to read, although more verbose. In this section, we’ll only work with this “new” (since
2012) configuration format
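
To make the difference concrete, here's a hedged sketch of the same filter written in both formats (the first line is plain syslogd syntax; the block below it is the equivalent in the new format):

# legacy (syslogd-compatible) format: mail facility, severity info or higher
mail.info    /var/log/mail.log

# the same filter in the "new" RainerScript format
if $syslogfacility-text == "mail" and $syslogseverity <= 6 then {
    action(type="omfile" file="/var/log/mail.log")
}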
○ to start rsyslog, you’d do the classic ​service rsyslog start​, but if you need to debug the
configuration, you can start it in foreground:

 
■ rsyslogd -n will stay in the foreground, but won't react to CTRL+C (you need another
terminal to kill it)
■ rsyslogd -dn​ will start in debug mode and does react to CTRL+C
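
You can also validate a configuration without starting the daemon at all; rsyslogd's -N option runs a config check:

# check the configuration and exit; errors are printed and the exit code is non-zero
rsyslogd -N1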

Plugins: main input modules and their configurations

○ data flows in rsyslog from input modules, through message modifiers to the output modules

○ first, you need to declare the modules you want to use (though some of them are builtin, such
as ​omfile​ for writing to files)

module(load="imklog") # listens for kernel logs


module(load="imuxsock") # listens to the local syslog socket

○ for some inputs, you may need to specify additional parameters when you load the module. For
example, you can use ​imtcp to listen for TCP traffic, but if you want that traffic encrypted, you
need to specify it:

module(load="imtcp" StreamDriver.AuthMode="anon" StreamDriver.Mode="1")

 
○ then, some inputs need to be started with their own parameters. For example, the UDP/TCP
inputs need ports (as you can have multiple such ​input​ directives, to listen on multiple ports):

input(type="imudp" port="514")

○ besides the kernel, socket and network inputs, you’re likely to use the file input. Like with the
network ones, you need to load it first:

module(load="imfile")

○ then, for each file you want to tail, you’d need an ​input directive. Besides the file name, you’ll
need to specify a syslog tag, which can be later used for filtering:

input(type="imfile"
File="/opt/logs/apache.log"
Tag="apache:"
)

○ like Logstash and Filebeat, rsyslog will remember where it left off reading the file. It writes that
down in a ​state file​, which is stored in ​workDirectory ​(which defaults to “/”)

global(
workDirectory="/var/run/"
)

 
Message modifiers: using mmnormalize to parse unstructured data in a scalable way

○ message modifiers are about parsing or otherwise changing your data. rsyslog does have a
mmgrok​ module (more details on it below), which allows you to define grok patterns like in
Logstash/Ingest, but it also comes with a grammar-based parser (like Logstash’s Dissect),
called ​mmnormalize
○ based on ​liblognorm​, mmnormalize builds parse trees out of specialized parsers (which make
up your rules). This makes liblognorm much faster than grok, especially as you add more rules
(effectively O(1) instead of O(n) for grok)
○ the downside is that you’ll lose some of the flexibility offered by regular expressions. You can
still use regular expressions with liblognorm (you’d need to set ​allow_regex​ to ​on​ when loading
mmnormalize) but then you’d also lose a lot of the benefits that come with the parse tree
approach
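
If you do decide you need regular expressions, the load statement would look like the sketch below (based on the allow_regex option mentioned above); rules that stick to the specialized parsers stay fast, while regex-based ones give up the parse-tree speedup:

module(load="mmnormalize" allow_regex="on")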

○ to parse logs, you have to load mmnormalize first:

module(load="mmnormalize")

○ then you need to define an action that runs mmnormalize on each log. In that action, you point
to a configuration file which contains liblognorm rules:

 
action(type="mmnormalize"
rulebase="/opt/rsyslog/apache.rb"
)

○ The rule for parsing apache logs can look like this (note the similarity to grok in its structure;
in the rulebase file, the whole rule must sit on a single line):

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

○ if you need to write new rules, you’ll probably want to check the liblognorm reference:
https://github.com/rsyslog/liblognorm/blob/master/doc/configuration.rst
○ when trying out rules, you can use the ​lognormalizer​ tool that comes with rsyslog:

head -1 /opt/logs/apache.log | /usr/lib/lognorm/lognormalizer -r /opt/rsyslog/apache.rb -e json

○ You can also make liblognorm parse JSON logs. The following rule puts the parsed JSON under the
​data​ variable:

rule=:%data:json%

○ alternatively, you can also use Logstash’s grok patterns with rsyslog, via the ​mmgrok ​module.
At the moment, this module has to be compiled manually, and it’s in an early stage (consider
using liblognorm instead as it’s more stable). You’d download the rsyslog tarball, fetch the
dependencies for both rsyslog and mmgrok, then run ​make​:

 
apt-get install libgrok-dev grok glib2 libglib2.0-dev autoconf automake \
  libtool libtool-dev libjson-c-dev libestr-dev uuid-dev libgcrypt20-dev \
  liblogging-stdlog-dev libpcre3-dev libtokyocabinet-dev libevent1-dev
./configure --enable-mmgrok
make
make install

○ in rsyslog.conf, you’ll load ​mmgrok ​and run it to parse logs:

module(load="mmgrok")
action(type="mmgrok" # clone the patterns library from
https://github.com/logstash-plugins/logstash-patterns-core
patterndir="/var/lib/logstash-patterns-core/patterns/grok-patterns"
match="%{COMMONAPACHELOG}"
soure="msg"
target="!data"
)

○ note that currently the ​patternDir​ directive actually expects a file and not a directory
○ like ​mmnormalize​, mmgrok returns a JSON with the matched data, and you specify which
local variable to put it in. Here we called it ​data​, the same variable we configured mmnormalize
to use earlier when parsing JSON
○ you’ll need this variable (in these examples, ​data​) later on, when you build your documents for
sending to Elasticsearch

Output modules: writing data to Elasticsearch


○ Like Logstash and Filebeat, rsyslog has multiple output modules. Most frequently, you'll write
everything into /var/log/messages. The ​omfile ​module is built in, so we can go ahead and use it:

action(type="omfile" file="/var/log/messages")

 
○ for any output action, we need a template. A template selects which properties of the log
message get written, and in what form. For example, in the action above, we didn't specify a
template, so the default ​RSYSLOG_FileFormat​ template is used

○ the ​RSYSLOG_FileFormat ​template writes an ISO8601 date, the host name, the syslog tag
and the syslog message. It doesn't write the severity; if we wanted that, we'd need to
create our own template (​http://www.rsyslog.com/doc/v8-stable/configuration/templates.html​)
and specify the list of properties to include
(​http://www.rsyslog.com/doc/v8-stable/configuration/properties.html​). For example, to include
the severity, we could choose the numeric form (​syslogseverity​) or the textual form
(​syslogseverity-text​)
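
For example, here's a sketch of a custom file template that adds the textual severity to each line (the template name is our own choice):

template(name="file-with-severity" type="list") {
constant(value=" ")
property(name="timereported" dateFormat="rfc3339")
constant(value=" ")
property(name="hostname")
constant(value=" ")
property(name="syslogseverity-text") # e.g. "err" or "warning"
constant(value=" ")
property(name="syslogtag")
property(name="msg")
constant(value="\n")
}
action(type="omfile" file="/var/log/messages" template="file-with-severity")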
○ if we want to forward to Elasticsearch, we need to first load the Elasticsearch output module:

module(load="omelasticsearch")

○ Then, we'd want to define a template that specifies what the documents will look like (i.e. which
properties to include). Assuming we've ingested standard syslog, rsyslog will parse it by default,
and we can choose from properties such as the message, facility and severity to build a JSON
document out of them:

template(name="plain-syslog" type="list") {
constant(value="{")
constant(value="\"timestamp\":\"")
property(name="timereported" dateFormat="rfc3339")
constant(value="\",\"host\":\"")
property(name="hostname")
constant(value="\",\"severity\":\"")
property(name="syslogseverity-text")
constant(value="\",\"facility\":\"")
property(name="syslogfacility-text")
constant(value="\",\"tag\":\"")
property(name="syslogtag" format="json")
constant(value="\",\"message\":\"")
property(name="msg" format="json")
constant(value="\"}")
}

○ if you use mmnormalize to parse data, whatever is parsed already comes in JSON format, so
you only need to pick the variables you want from there. For example, if you parsed JSON
logs into the ​data​ variable (rule=:%​data​:json%), then your template can refer to that variable
alone and you'd get the original JSON back:

template(name="parsed-json" type="list") {
property(name="$!data")
}

○ if, on the other hand, you've used the apache logs rule shown earlier, you'll have multiple
custom properties like ​$!clientip​ and ​$!bytes​. You can still refer to them individually, but you
can also refer to the whole JSON containing them (mmnormalize returns a JSON) via the ​$!all-json
property:

template(name="all-json" type="list"){
property(name="$!all-json")
}

○ once you have your template(s) defined, you can move on to the ​omelasticsearch​ action.
There, you'll use one of the defined templates:

action(type="omelasticsearch"
template="all-json" # template for parsed apache logs
searchIndex="apache"
searchType="logs"
server="localhost"
serverport="9200"
bulkmode="on" # use the bulk API
action.resumeretrycount="-1" # retry indefinitely if Elasticsearch is unreachable
)

○ note how the index name is static by default (which works well for size-based indices), but you
can make it dynamic by setting ​dynSearchIndex​ to ​on​. This lets you use a template as the
index name, so you can get date-based indices. The same applies to type names and IDs. So first,
you'd define a template with the time-based index pattern, like the one below which produces
logstash-YYYY.MM.dd​:

template(name="logstash-index" type="list") {
constant(value="logstash-")
property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
constant(value=".")
property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
constant(value=".")
property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
}

○ and then you'd point ​omelasticsearch​ to use this template as the index name:

action(type="omelasticsearch"
template="all-json" # template for parsed apache logs
dynSearchIndex="on"
searchIndex="logstash-index" # the index-name template defined above
searchType="logs"
server="localhost"
serverport="9200"
bulkmode="on" # use the bulk API
action.resumeretrycount="-1" # retry indefinitely if Elasticsearch is unreachable
)

 
Tuning queues, workers and batch sizes

○ by default, all messages from inputs are stored in the main message queue. The main
queue, also by default, can store up to 10K messages in memory. A single worker thread on that
queue evaluates all the conditionals, such as pushing parsed logs with the JSON template
and plain syslog with the other template:

if $parsesuccess == "OK" then {
action(type="omelasticsearch"
template="all-json"
...
)
} else {
action(type="omelasticsearch"
template="plain-syslog"
...
)
}

 
○ that one thread also performs the actions (parsing and sending to Elasticsearch in this case),
and it does so in batches (like Logstash does)
○ you can change the size of the main message queue, the number of worker threads and the
batch size from the ​main_queue​ configuration object:

main_queue(
queue.workerThreads="4"
queue.dequeueBatchSize="1000"
queue.size="100000"
)

○ because the main queue worker threads also do the actions, the ​dequeueBatchSize​ value will
also be the maximum number of messages sent to Elasticsearch in a single bulk
○ besides changing the ​size​ of the queue, you can also change the ​type​. By default it’s in
memory (you can switch from the default ​FixedArray​ implementation to ​LinkedList​ if you prefer
flexible memory usage over performance) but it can be on disk (slower) or a combination of the
two
○ the last option is called ​disk-assisted​, and it writes to disk only when the queue runs out of
memory. You'll have to choose limits for both memory and disk, as well as the location for storing
the queue files:

 
The upper limit of the memory part is the ​highWatermark​, while the upper limit of the disk part is
maxDiskSpace​. Once the queue drains back down to ​lowWatermark​, rsyslog stops spilling to disk and
uses memory again. It's all configured in the ​queue​ element:

main_queue(
queue.workerThreads="4"
queue.dequeueBatchSize="1000"
queue.highWatermark="500000" # max no. of events to hold in memory
queue.lowWatermark="200000" # use the memory queue again when it drains back to this level
queue.spoolDirectory="/var/run/rsyslog/queues" # where to write on disk
queue.fileName="stats_ruleset"
queue.maxDiskSpace="5g" # it will stop at this much disk space
queue.size="5000000" # or this many messages
queue.saveOnShutdown="on" # save memory queue contents to disk when rsyslog is exiting
)

○ each action can have its own queue. This is useful if you have multiple outputs (e.g. file and
Elasticsearch):
■ with only a main message queue, if Elasticsearch is unavailable and rsyslog keeps
retrying, the main queue will accumulate messages and rsyslog won’t dequeue them to
the file output, either
■ if each of the outputs has its own queue, you’ll be able to continue writing to files while
omelasticsearch’s action queue keeps filling up
■ when an action queue fills up, the backpressure will eventually propagate to the main queue
○ the side benefit of such a setup is that you can use a different number of threads for parsing the
data and for each type of action
○ for example, the file output should have one thread, while the Elasticsearch output typically
needs more

 
○ And here is the relevant part of rsyslog.conf:

main_queue(
queue.workerThreads="3"
queue.dequeueBatchSize="1000"
queue.size="500000"
)

action(type="mmnormalize" rulebase="/path/to/rulebase" )

action(type="omelasticsearch"
template="all-json"
searchIndex="logstash-write"
server="localhost"
serverPort="9200"
bulkMode="on"
action.resumeRetryCount="-1"

 
©​ ​Sematext Group, Inc​. ​Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries​.

16
queue.type="LinkedList" # dynamic allocation of memory
queue.size="1000000" # max. number of events
queue.dequeuebatchsize="5000" # max. bulk size
queue.workerthreads="4" # threads to push to Elasticsearch
)
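
For completeness, the file output in such a setup could get its own small, single-threaded action queue; a sketch (the file name and queue size are assumptions):

action(type="omfile" file="/var/log/messages"
queue.type="FixedArray" # small in-memory action queue
queue.size="10000"
queue.workerthreads="1" # one thread is typically enough for file writes
)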

○ if the logs you write to a file come from different sources than those written to
Elasticsearch, you'll be better off using rulesets than action queues

Using rulesets to manage multiple data flows

○ everything we've mentioned so far (inputs, main and action queues, worker threads and batch
sizes) is part of the default ruleset. But you can have more than one such ruleset

○ a ruleset is an independent flow of data. Inputs, such as an imfile input, can be bound to a
custom ruleset:

 
input(type="imfile"
File="/opt/logs/apache.log"
Tag="apache:"
​ruleset="apache"
)

○ in this case, messages processed from this file are enqueued in the ruleset’s queue instead of
the main queue
○ the logic is the same as we explained earlier, but the custom ruleset will also have its own
queue, worker threads and actions; those actions, again, can have their own queues and workers:

ruleset(name="apache"
queue.type="FixedArray"
queue.size="1000000"
queue.dequeuebatchsize="5000"
queue.workerthreads="4"){

action(type="mmnormalize"
rulebase="/opt/rsyslog/apache.rb"
)

action(type="omelasticsearch"
template="all-json"
searchIndex="logstash-write"
server="localhost"
serverPort="9200"
bulkMode="on"
action.resumeRetryCount="-1"
)
}

 
○ using multiple rulesets is a more efficient (and easier to follow) alternative to using conditionals
when parsing different kinds of data. For example, local syslog may follow a completely different
flow than application logs tailed from files
○ multiple rulesets also handle output failures better:
■ in the example above, if Elasticsearch is unavailable, the action queue will start filling up
(if it exists), then the ruleset queue
■ when the ruleset queue is full, the file input will stop reading files and will resume only
when omelasticsearch resumes consuming messages
■ this has no influence on the local logs: they continue to get written to /var/log/messages
(or forwarded, depending on the action for the local syslog ruleset)
■ as a result, a failure in Elasticsearch won’t affect the whole flow anymore, just that
ruleset
○ you can also ​call​ one ruleset from another. For example, maybe you want to output local syslog
errors to Elasticsearch (through the same flow Apache logs go), after writing them to a file:

if $syslogseverity-text == "error" then {
action(type="omfile" file="/var/log/errors");
call apache;
}

○ in this case, local syslog processing can be affected by Elasticsearch being unavailable,
because now we cross the ruleset boundary

 
RainerScript: variables, conditionals, loops and lookup tables

○ we've shown conditionals before; they typically work on variables returned by the syslog parser.
For example, you can filter messages based on severity:

if $syslogseverity < 4 then { # more severe than WARN
action(type="omfile" file="/var/log/oopsies")
}

○ there are other categories of variables, though. Variables returned by JSON parsers
(notably, mmnormalize) are referred to as ​$!variable-name​. This allows you to rename
variables, for example from ​timestamp ​to ​@timestamp​:

action( type="mmjsonparse" ruleBase="/etc/myrulz.rulebase" )


if ​$!timestamp​ != "" then {
set $!@timestamp = $!timestamp;
unset $!timestamp;
}

○ there are also local variables, referred to as ​$.variable-name​. Here, we extract a hostname
from a fully qualified domain name:

set $.hostname = field($!fqdn, ".", 1);
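
Local variables can be used like any other variable, for example in a template; a small sketch (the template and file names are assumptions):

template(name="short-host" type="list"){
property(name="$.hostname") # the local variable set above
constant(value="\n")
}
action(type="omfile" file="/tmp/short-hostnames" template="short-host")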

○ you can also use ​foreach​ loops to iterate over arrays and objects. For example, given an array
of host names, you can write each one as a separate line to a file:

template(name="host" type="list"){
property(name="$.host")
constant(value="\n")
}
foreach ($.host in $!hosts) do {
action(type="omfile" file="/tmp/hosts" template="host")
}

○ there's more to RainerScript; you can find the up-to-date documentation at
http://www.rsyslog.com/doc/master/rainerscript/index.html​. One last feature worth mentioning is
lookup tables​. These are useful if you want to map parts of your message (e.g. an IP
field) to a custom tag (e.g. a department name). You'd record this IP-to-department mapping
in a JSON file, like:

{ "version" : 1,
"nomatch" : "unknown",
"type" : "string",
"table" : [
{"index" : "1.2.3.4", "value" : "accounting" },
{"index" : "1.2.3.5", "value" : "accounting" },
{"index" : "5.6.7.8", "value" : "IT" }
]
}

○ rsyslog can load this file and use the information, for example to tag each message with its
department (and, as sketched after the example, route each department's data to its own index):

# assuming mmnormalize parsed the apache logs and stored the IP in $!client-ip
# assuming the "all-json" template prints $!all-json
lookup_table(name="ip_to_dept" file="/opt/departments.json")
# the new variable, $!department, will also show up in $!all-json
set $!department = lookup("ip_to_dept", $!client-ip);
action(type="omfile" file="/var/log/department" template="all-json")

Pipeline patterns when sending data to Elasticsearch


○ when centralizing data to Elasticsearch, you can take advantage of the fact that rsyslog is light
and run it on all servers that produce logs

○ you can also consolidate messages on one or more dedicated central rsyslog boxes. This gives
you a central queue and, in most cases, a single place to troubleshoot any queue and output
issues. Also, bulks sent to Elasticsearch will be bigger, which might help reduce the load on it

 
○ to get to a centralized setup, on the logging servers you’d use the ​omfwd ​module to forward via
TCP, for example:

action(type="omfwd"
target="server1"
port="514"
protocol="tcp"
)

○ you can also use other protocols, like plain UDP, or TCP with TLS encryption


○ there's also RELP, which provides application-level acknowledgements for at-least-once
delivery (optionally with TLS as well), though to use it you'll need to load ​omrelp​, as sketched below
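
A sketch of both ends of a RELP connection (the port number is an assumption; any free port works):

# sender: forward with application-level acknowledgements
module(load="omrelp")
action(type="omrelp" target="server1" port="2514")

# receiver: listen for incoming RELP connections
module(load="imrelp")
input(type="imrelp" port="2514")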
○ Then on the server side, you'd listen to TCP traffic via the ​imtcp ​module (alternatively, you can
try imptcp, which is faster and multithreaded, but doesn't support TLS for now):

module(load="imtcp")
​input(type="imtcp" port="514")

○ sometimes you need to couple rsyslog with some other tool, like Logstash. One way to do it is
directly, by sending JSON over TCP/UDP or some other common protocol

 
○ to send JSON over TCP from rsyslog, you’d use a template that sends JSON:

action(type="omfwd"
target="logstash01"
port="10514"
protocol="tcp"
template="all-json"
)

○ on the Logstash side, you’ll have to use the JSON codec along with the TCP input:

input {
  tcp {
    port => 10514
    codec => "json"
  }
}

○ another option is to use Kafka (or Redis) as a central buffer, which adds one more moving
piece, but may expose needed functionality (like replaying messages from Kafka):

 
○ on the rsyslog side, you’ll need to use omkafka to output messages to Kafka. Preferably, the
template used would be JSON:

module(load="omkafka")
action(type="omkafka"
broker=["localhost:9092"]
topic="logstash"
template="all-json"
action.resumeRetryCount="-1"
)

○ on the Logstash side, you’d connect to Kafka like you do when Filebeat or Logstash pushes
messages. If you send JSON from rsyslog, remember to add the ​json​ codec:

input {
kafka {
bootstrap_servers => "localhost:9092"
topics => ["logstash"]
codec => "json"
}
}

 
Example configurations
Sending local syslog to Elasticsearch
This config should do the job (though the main queue is not tuned for performance):

module(load="imuxsock")
module(load="imklog")
module(load="omelasticsearch")

template(name="plain-syslog"
type="list") {
constant(value="{")
constant(value="\"timestamp\":\"")
property(name="timereported" dateFormat="rfc3339")
constant(value="\",\"host\":\"")
property(name="hostname")
constant(value="\",\"severity\":\"")
property(name="syslogseverity-text")
constant(value="\",\"facility\":\"")
property(name="syslogfacility-text")
constant(value="\",\"tag\":\"")
property(name="syslogtag" format="json")
constant(value="\",\"message\":\"")
property(name="msg" format="json")
constant(value="\"}")
}
action(type="omelasticsearch"
template="plain-syslog" # use the template defined earlier
searchIndex="syslog"
searchType="syslog"
server="localhost"
serverport="9200"
bulkmode="on" # use the bulk API
action.resumeretrycount="-1" # retry indefinitely if
Logsene/Elasticsearch is unreachable
)

 
Tailing files with rsyslog and sending them to Elasticsearch
You can add this to the previous conf:

module(load="imfile")
input(type="imfile"
File="/opt/logs/apache.log"
Tag="apache:"
)

Using rulesets to separate local and remote logs


We'll need to define the ruleset with its own queue and omelasticsearch action (make sure the
template, ​plain-syslog​ here, is defined earlier in rsyslog.conf):

ruleset(name="apache"

queue.type="FixedArray"
queue.size="10000"
queue.dequeuebatchsize="1000"
queue.workerthreads="4"
){

action(type="omelasticsearch"
template="plain-syslog"
searchIndex="apache"
searchType="apache"
server="localhost"
serverport="9200"
bulkmode="on"
action.resumeretrycount="-1"
)
}

 
And we’ll bind the imfile input to the specified ruleset:

input(type="imfile"
File="/opt/logs/apache.log"
Tag="apache:"
ruleset="apache"
)

Parsing Apache logs with mmnormalize and sending them to Elasticsearch


You’ll need a rulebase file like:

# cat /opt/apache.rulebase
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

Then, in rsyslog.conf, you’ll load mmnormalize:

module(load="mmnormalize")

Define a template with the ​all-json​ variable that will contain the parsed result:

template(name="all-json" type="list"){
property(name="$!all-json")
}

Before the ​omelasticsearch ​action, you’ll do the parsing action:

 
action(type="mmnormalize"
rulebase="/opt/apache.rulebase"
)

And finally, in the omelasticsearch action, use the template with ​all-json​:

action(type="omelasticsearch"
template="all-json"
searchIndex="apache"
...
)

 
About Sematext

Sematext runs Sematext Cloud, an infrastructure and application performance monitoring and log management solution
that gives your business full-stack visibility by exposing logs, metrics and traces through a single Cloud or On Premise
offering. Sematext also provides Consulting, Training, and Production Support for Elasticsearch, the ELK/Elastic Stack,
and Apache Solr.

We are known for our Logging Consulting and other related services. If you need help with rsyslog integration, or are
looking to replace Splunk with Elasticsearch, Logstash, and Kibana (ELK / Elastic Stack) or an alternative logging stack,
contact us.

Our Products
Sematext Cloud​ • ​SPM ​ • ​Logsene​ • ​Docker Agent​ • ​Kubernetes Agent

Our Services
Consulting​ • ​Training​ • ​Support

 