
Splunk Fundamentals

Machine Data
Not structured
Makes up 90% of data collected by organisations
What is Splunk
Splunk Components
Index Data
Collects data from any source
Label the data with a source type
Timestamps identified and extracted
Added to an index so they can be searched
Search & Investigate
Search via SPL (Search Processing Language) queries
Add Knowledge Objects
Give data classifications
Normalise it
Enrich the data
Monitor and Alert
Monitor and alert, respond with actions
Dashboards can be used to visualize
Processing components
Indexer
Processes incoming data and stores it in indexes
Stores data in directories organised by time frame (buckets)
Search Head
Allows users to write SPL searches, distributes searches to the various indexers, then merges
the returned data
Provides tools such as dashboards and reports
Search requests are processed by the Indexers…
Forwarder
Ingest data
Usually installed on the machine where the data originates.. – not possible in CGI..
Lightweight, doesn’t require a lot of resources.

Scaling
Single Instance contains all components on one node
Input, parsing, indexing and searching of data (4 main processes)
OK for PoC, personal use, learning, and small departments/organisations
Distributed System
You would distribute the input, parsing, indexing and searching across multiple
nodes, i.e. more than 1 Search Head to allow concurrency
More forwarders to ingest more data more quickly
All can be clustered to ensure all are available – no single points of failure

Installing Splunk Enterprise


I installed on my AWS Linux machine
Essentially was a Linux archive extract, then /opt/splunk/bin/splunk start
Apps and Roles
Preconfigured environments that sit on top of Splunk Enterprise
Think of them as workspaces
Roles – what a user can do
Admin – install apps, ingest data, create knowledge objects for all users
Power – create and share knowledge objects, run searches
User – use apps and knowledge objects shared with them
You can create and deploy your own Apps..
Apps that ship are “Home App” and “Search & Reporting”
You can launch and manage apps from the home app
Get Data In
Add Data –
Upload – upload csv type data, one off, good for testing
Source Types
Good for classifying data; if Splunk recognizes the data, it labels the source type automatically
Source types: we can add custom types; the source type decides how fields/delimiters are
determined for the data, i.e. csv…
Source types can be amended.. e.g. to define how files are split into events
App Context: source types can be made available system wide…
Hostname: can be set from the content of the file, or from the host it originates
from
Indexes
Should split data into multiple indexes – a bit like any database
You can control who has access to indexes (data) via role management
You can also set retention periods for each index; expired data ages out (essentially
like dropping partitions), which makes management faster and easier.

Monitor – monitor ports, locations etc

Files & Directories


You can continuously monitor
You can whitelist/blacklist files
Can dynamically pick source type

App Context can be selected


Click submit; Splunk starts indexing the data
HTTP Event Collector
TCP/UDP
Scripts

Forward – forwarders installed on remote machines; their data is forwarded to the indexers..


Setting up forwarders.. outside scope
Windows – would allow to monitor local and remote event logs etc..

Basic Searching
Search > conduct searches > enter query
Datasets > see what data sets are available
What to Search Panel > data summary > summary of data available
Contains host name
Search history menu
Search i.e. find failed authentication > search for the term failed
failed (last 30 days > make sure you set a time range)
Shows events
Can save results as knowledge objects
Patterns
Visualizations: transforming commands are used to create tables, from which
visualizations can be built
Stop a job..
We can share a job
Jobs remain active for 10 minutes after completion
A shared search job, remains active for 7 days… ( snapshotted )
Export results in JSON,RAW,CSV etc..
Modes :
Fast – high level, field discovery disabled
Verbose – discovers all fields
Smart – toggles behavior based on the search being run
Timeline, shows you events during time range..
Zoom IN > uses original job output , zooming OUT it will run a new job/report
EVENTS
Use returned events… to dig further
Timestamp > is retrieved as per your user account timezone
Fields can be added/removed from the search.. clicking on a field in the
event lets you edit the search criteria

SEARCH EVERYTHING
i.e. fail* > use wildcards

failed NOT password ( Booleans )


failed OR password
failed AND password
Order of evaluation
NOT
OR
AND
Escape character i.e. info="user \"chrisv4\" not in database"
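Putting the Booleans and the evaluation order together — a quick sketch with hypothetical search terms. Because OR is evaluated before the implicit AND between terms, these two searches are equivalent:

```spl
failed password OR passwd
failed (password OR passwd)
```

Both match events containing failed together with either password or passwd; explicit parentheses are still worth using to make the intent obvious.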
Using Fields
Fields > extracted fields from search > host, source, sourcetype are selected by default
Field names are case sensitive i.e. sourcetype= not SourceType=
Field values are NOT case sensitive
Wildcards can be used with search fields
In the fields sidebar: # = numeral field, a = string field, i.e.
a dest 4 // dest is a string field containing 4 values
You can add fields to the query, will also add transforming events, so creates
visualizatins.
You can filter fields, see statistics on fields.
sourcetype=linux_secure >> field names are case sensitive, so must be lowercase
!= for string values
>, <, >=, <= for numerical values
i.e. NOT host=mail* >> wildcards..
Lab 6 : number of events that didn’t end in HTTP 200 (i.e. success) = 1,301,
out of 19,235 in total
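A sketch of the comparison operators above; the field names are assumed from the course data sets and may differ in your environment:

```spl
sourcetype=linux_secure user!=root
sourcetype=access_combined status>=400
```

The first excludes events where user is root; the second keeps only events whose numeric status is 400 or higher.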

Best Practices
Limit the time frame, e.g. last 7 days
The more you tell the search the better
Inclusion is better than exclusion (NOT)
Always use sourcetype= as a first step
Using Time
Date & time ranges
Real-time searches, i.e. from 10 minutes ago until now, perform the search over a
rolling 10-minute window.
Advanced tab – i.e. -30m = last 30 minutes…
-30d ( d = days ) ( w = weeks ) ( mon = month ) ( y = year )
@ rounds (snaps) to a unit, i.e. -30m@h returns events from the start of the hour.
Can be used in searches i.e.
sourcetype=access_combined earliest=-2h latest=-1h or absolute values
sourcetype=access_combined earliest=01/08/2018:12:00:00
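Combining relative modifiers with @ snapping — a sketch using the same sourcetype as above:

```spl
sourcetype=access_combined earliest=-7d@d latest=@d
```

This returns events from midnight seven days ago up to midnight today; without @d the window would start exactly 7×24 hours before the search ran.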
Use Indexes
This will limit the search run time i.e. web data, security data.. only search
partitions/indexes
Can also limit access to data
To search a specific index : index=web OR index=security
index=ma* ( also can use wildcards )

SPL Fundamentals
Splunk Search language
sourcetype=acc* status=200 | stats list(product_name) as "Games Sold"
Search terms
sourcetype=acc* status=200
Commands — chart/visualization
stats
Functions — how to chart
list()
Arguments
(product_name)
Clauses
as
Can also pass result to another via Pipe |
index=web (sourcetype=acc* OR sourcetype=ven*)
| timechart span=1h sum(price) by sourcetype
Search Limitations
Search results are piped from one command into the next
They are passed in memory, left to right
Fields Command
index=web sourcetype=access_combined | fields status clientip // include status
and clientip
index=web sourcetype=access_combined | fields - status clientip // exclude status
and clientip
index=web sourcetype=access_combined | fields - _raw // exclude _raw (hidden fields start with _)
Field extraction really slows searches down!! Including only the fields you want improves
performance because inclusion happens before field extraction; exclusion only occurs after
the query has run, so it will not improve performance, but will change the visualization
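Since inclusion happens before field extraction, a sketch of placing fields early in the pipeline to speed up a later transforming command (hypothetical, but consistent with the examples above):

```spl
index=web sourcetype=access_combined
| fields status clientip
| stats count by status
```

Only status and clientip (plus internal fields) need to be extracted, so the stats command runs over a much smaller working set.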
Table Command
table returns data in tabulated format, so we can easily see what products were
purchased i.e.
index=web sourcetype=access* status=200 product_name=* | table
jsessionid, product_name, price
Rename Command
Rename fields in a table
index=web sourcetype=access* status=200 product_name=*
| table jsessionid, product_name, price
| rename jsessionid as "User Session" product_name as "Purchased Game" price as
"Purchase Price"
Be careful when changing names: commands later in the pipeline must refer to the new
name, not the original.
Make sure you quote the new name i.e. "User Session"
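Illustrating that caution — a sketch where a downstream sort must use the renamed, quoted field:

```spl
index=web sourcetype=access* status=200 product_name=*
| table jsessionid product_name price
| rename price as "Purchase Price"
| sort - "Purchase Price"
```

Sorting by price after the rename would match nothing, since the field no longer exists under that name.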

Dedup
We can use dedup to remove duplicate events
index=security sourcetype=history* address_description="San Francisco"
| dedup firstname lastname
| table username firstname lastname
Sort
We can sort desc/asc
sourcetype=vendor_sales
| table vendor product_name sale_price
| sort vendor product_name
OR
| sort - sale_price vendor ( sort by sale_price descending; + ascending, - descending )
Where you place the space after the - is key: "sort - sale_price vendor" sorts all listed
fields descending, whereas "sort -sale_price vendor" sorts only sale_price descending,
then vendor ascending.. so a space makes a big difference to the output !!
limit=20 > you can limit the output to the first 20 rows, like most SQL limit commands …
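A sketch combining the minus placement and limit, using the vendor_sales fields from above:

```spl
sourcetype=vendor_sales
| sort limit=20 -sale_price vendor
```

Here only sale_price is descending (vendor stays ascending), and limit=20 caps the output at 20 rows; writing "sort - sale_price vendor" instead would sort both fields descending.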

Transforming Commands
Order search results into a data table
Transform into visualizations
top by default returns 10 results !! top 10
top – top n values i.e.
index=sales sourcetype=vendor_sales | top Vendor
Can add limit=0 for all rows
Can add limit=20
Clauses can be used
index=sales sourcetype=vendor_sales
| top Vendor limit=5 showperc=False
countfield="Number Of Sales" useother=True
Top 3 products sold by each vendor – GROUP BY equivalent…
index=sales sourcetype=vendor_sales
| top product_name limit=3 by Vendor showperc=False
countfield="Number Of Sales" useother=True
RARE
Bottom 3 products sold by each vendor – GROUP BY equivalent…
index=sales sourcetype=vendor_sales
| rare product_name limit=3 by Vendor showperc=False
countfield="Number Of Sales" useother=True
STATS
To produce statistics i.e.
count
Total number of sales in the last week
index=sales sourcetype=vendor_sales
| stats count as "Total Sales By Vendor"
by product_name, categoryid, sale_price
Count how many events have a given field present vs total events i.e.
index=sales sourcetype=vendor_sales
| stats count(action) as "Action Events",
count as "Total Events"
dc ( distinct count )
Distinct counts
index=sales sourcetype=vendor_sales
| stats distinct_count(product_name) ( or dc )
as "Number of games by vendor" by sale_price
sum()
Sum function; be sure to include all aggregations within the same stats pipe:
index=sales sourcetype=vendor_sales
| stats sum(price) as "Gross Sales",
count as "Units Sold" by product_name
avg()
Average sales
index=sales sourcetype=vendor_sales
| stats avg(sale_price) as "Average Sale Price"
by product_name
list() – lists all values of a given field, i.e. all assets an employee has
index=ncgassets sourcetype=asset_list
| stats list(Asset) as "Company Assets" by Employee

values() – lists distinct values of a given field, i.e. list of sites users have
visited.
Lists all UNIQUE values for a field
index=network sourcetype=cisco_wsa_squid
| stats values(s_hostname) by cs_username
