Professional Documents
Culture Documents
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 1 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Document Usage Guidelines
• Should be used only for enrolled students
• Not meant to be a self-paced document, an instructor is required
• Please do not distribute
13 May 2016
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 2 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Course Prerequisites
• Required
– Using Splunk
– Creating Splunk Knowledge Objects
– Splunk Administration
• Strongly Recommended
– Searching and Reporting with Splunk
– Advanced Dashboards and Visualizations
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 3 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Course Goals
• Understand best practices for architecting and documenting Splunk
Enterprise distributed deployments
• Provide a framework and methodology for Splunk deployments
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 4 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Course Outline
1. Introduction
2. Initial Requirements Definition
3. Apps and Index Design
4. Infrastructure Planning
5. Data Collection
6. Forwarders and Deployment Management
7. Data Comprehension
8. Search Considerations
9. Integration
10. Operations and Management
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 5 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 1:
Introduction
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 6 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Introduce the Splunk deployment planning process and tools
• Review the structure of this course
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 7 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment Scaling
A deployment plan creates a solid foundation
• to scale Splunk deployments as they evolve
Expansion
• to implement large, enterprise deployments
Enterprise
Workgroup Deployment
Download
• Enterprise standard
• Large number of users
• Many different use cases
• More sites
• Many different users
• More geographies
• Specific use case • More data sources
• Initial User • Specific users • More data volume
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 8 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Deployment Methodology
• Provides a structured path to implement Splunk in an enterprise
• Identifies the people and resources required, along with responsibilities
• Provides a task checklist and ordering
• It consists of six phases: Infrastructure
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 10 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 1 Lab Exercise
Time: 10 minutes
• Task: Read the lab document for Module 1 (pages 1-4)
– DeployingSplunk64_Labs.pdf
– If you have difficulty seeing the graphics in the lab document, they also appear on
the following three pages
• Task: Download planning tools (optional)
– Data Source inventory (data_inventory.rtf)
– Index Planning and Sizing (data_sizing.xlsx)
– Splunk Icon Library (splunk_icons.pptx)
– Use Case checklist (use_case.pdf)
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 11 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Current IT Environment
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 12 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Current Logging Environment - Corporate
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 13 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Current Logging Environment - CoLo
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 14 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 2:
Initial Requirements
Definition
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 15 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Identify the information that is needed for deployment decisions
• Provide lists and resources to aid in collecting requirements
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 16 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Key Planning Information
• Identifying requirements is a first step in the deployment
– More information will be uncovered throughout the phases of deployment
• Gather raw material for the deployment plan at the beginning, if possible
– Overall goals for the deployment
– Key users, including their goals and use cases
– Current environment
ê physical environment
ê monitoring tools in use
– Data sources
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 17 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Overall Goals
• Who is the sponsor?
• In addition to the sponsor, who else must be satisfied?
• What are the goals for the deployment?
• How will success be measured?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 18 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Use Cases
• What will the users do with Splunk?
– What tasks will they perform?
– What reports will they generate?
• Authentication
– What system is in current use?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 20 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Current Logging Environment
• Are any logs or other data sources collected or centralized today?
– Logged to SAN / NAS devices
– Centralized via syslog, syslog-ng, rsyslog, Kiwi, Snare, etc.
– Parsed and stored in a SQL database
• Is there an expectation that Splunk will integrate with or replace existing tools?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 21 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
General Requirements
• Security
– What security policies may affect the collection, retention, and reporting of data?
– What approvals will be needed?
• Regulatory
– Are there laws, regulations, or policies that affect the data collection, reporting,
etc.?
• High Availability or Disaster Recovery
– Is data replication required?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 22 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Sources
• Data Source Inventory
– What is the superset of data sources needed by all users?
– How much data is generated per day?
• Data Policy
– How long should each data source be retained?
– Who can see particular data elements?
– What data needs protection against tampering?
– What proof of integrity needs to be provided?
– Will Splunk be the primary repository for data?
• Handout
– data_inventory.rtf
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 23 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Example Data Source Inventory
Data Source: Firewall logs
Location and Type: UDP:514
file input (name and path), API,
network port, device, etc.
Ownership and Access: Network team owns; Security can also access
Data Volume
Average per day: 500 MB
Peak: 6000 EPS at ~100 bytes/event
Retention: 6 months
Format or sourcetype: syslog?
Collection method: How will this data get to the indexers? Is there
a collection point at each location?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 24 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 2 Lab Exercise
Time: 15 minutes
• Task: Complete the abbreviated data source inventory
– You may not know enough to supply answers for all data sources
– Part of the inventory has been completed for you
ê Do you agree with the completed portion?
ê Do you have questions for the sponsor?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 25 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 3:
Apps and Index Design
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 26 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Understand index implementation
• Design indexes Infrastructure
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 27 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Indexing Review
• Optimized for quickly indexing and persisting unstructured data
• Minimal schema for persisted data
– An event consists of raw text, timestamp, source, source type and host
– This is called "rawdata" and is stored in a compressed file
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 28 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Indexing Review (cont.)
• Additional processing on raw events is deferred until search time
• This serves 4 important goals
- indexing speed is increased minimal processing is
performed during indexing
- bringing new data into the no schema planning is needed
system requires less effort
- the original data is persisted
for easy inspection
- the system is resilient to data parsing problems do not
change require reloading or re-indexing
of the data
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 29 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Disk Storage Details – Files
• As Splunk indexes your data, it creates two main types of files:
– rawdata contains the original data in a compressed form
ê for "syslog-like" data, this may be 10% to 15% of the raw pre-indexed data
ê size is dependent on how well data compresses
– Index files (.tsidx) which point to the unique terms
ê these files range in size from approximately 10% to 110% of the rawdata size
ê size is strongly affected by the number of unique terms in the data
ê adding indexed field extractions will increase the size of the .tsidx files
• Buckets contain both rawdata and index files
– Exception: replicated buckets may have only rawdata
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 30 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Types of Index Files
Inverted index (*.tsidx)
Metadata (*.data)
– Information about sources, sourcetypes and hosts of the events
contained in each bucket
Bloom filters (bloomfilter)
– Efficient data structure that authoritatively rules out buckets
ê Provides 100% certainty that a search term is not present in a bucket
– Consulted by every search
– Very fast (1-2 IO)
https://en.wikipedia.org/wiki/Bloom_filter
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 3131 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Inverted Index
• Index data structure that maps from content
(such as words or numbers) to its locations red sweater
en.wikipedia.org/wiki/Inverted_index
blue
• An inverted index allows fast full text searches pants
red blue sweater
• Splunk .tsidx uses an inverted index sweater
structure to map keywords to their locations in
the rawdata red pants
Source: Wikipedia
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 32 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Index Files
IDX 3
IDX 2
IDX 1
Source / Sourcetype / Host Metadata
1 source : : /my/log 100 et lt it
2 source: : /blah 150 et lt it
Home Path
hot_v1_100
TSIDX
*.data
*.tsidx apple beer coke LEXICON
rawdata cream ice java …
Rawdata
Cold Path
db_lt_et_80 "apple pie and ice cream
are delicious"
"an apple a day keeps
Thawed Path doctor away"
db_lt_et_70
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 33 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Managing disk usage with tsidx reduction
• Full index files are normally kept with the rawdata until the bucket is frozen
• To save disk space, you can enable tsidx reduction in indexes.conf
– Based on an age setting
ê timePeriodInSecBeforeTsidxReduction = 7776000 (90 days)
– Replaces some index files with mini-index files
– Removes merged_lexicon.lex
– Overall, the bucket size is typically reduced by 1/3 to 2/3
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 35 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Testing Indexing Compression
• Confirm estimates with actual data
– Create a baseline with real or simulated data
– Find compression rates
– Leverage _internal index metrics to determine actual input volumes
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 36 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Sizing Indexes Example
Assuming
• No replication
• Raw data compression is 15% of daily volume
• Index file compression is 35% of daily volume
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 37 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Apps
• Which Splunk apps are appropriate?
– Apps are often chosen based on
ê devices or technologies in the production environment
ê use cases
ê inputs
• Apps often have particular infrastructure
requirements
• Most apps are free http://www.splunkbase.com
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 39 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Key Criteria for Index Design
• When to partition data into different indexes
– Retention
ê How long should the data be kept? Online? Offline?
ê All retention settings apply on a per-index basis
ê All data sources within an index should have the same retention
– Access
ê Who can search against the data?
ê All data sources within an index will have the same visibility
– access to indexes is controlled by role
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
40 Listen
Listentotoyour
yourdata.
data. 40 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Additional Design Criteria
• Search Performance
– Less common to differentiate here, but may be important
– Mixed data = mixed lexicon, larger number of keywords, therefore larger index
– Things searched together can be indexed together
– Example:
ê Both a high-volume / high-noise data source and a low-volume data source feed into
the same index
ê If you search mostly for events from the low-volume data source, the search speed
will be slower than necessary
ê To improve performance, create a separate index for the high-volume data source
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 41 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 3 Lab Exercise
Time: 15 minutes
• You may want to use the spreadsheet data_sizing.xlsx
• Task: Estimate the disk space required
• Task: Index Design
– Identify the indexes
– Estimate the Splunk license required for indexing
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 42 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 4:
Infrastructure Planning
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 43 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Define environment requirements
• Understand sizing factors for servers Infrastructure
clustering Opera'on
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 44 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Foundation for Splunk Deployment
• A low latency network
Standard
Distributed
Internet
Search Connection
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 45 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Basic Sizing Considerations
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 4747 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk App for Enterprise Security (ES)
• Infrastructure Impacts
– A dedicated search head or search head cluster
– One indexer per 100GB data indexed per day
ê Assumes 15 correlation searches running
ê Additional data storage for accelerated data models, estimated as
data indexed per day * 3.4
http://docs.splunk.com/Documentation/ES/latest/Install/Overview
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 48 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk App for VMware
• Infrastructure Impacts
– This app adds approximately 300 MB of data per ESXi host per day to Splunk
– Requires data collection nodes running as VMs in the VMware environment
ê the data collection node is distributed as an Open Virtual Appliance (OVA) file
ê each data collection node can collect data from 40 or fewer ESXi hosts, with a ratio
of 25 to 30 virtual machines per ESXi host
http://docs.splunk.com/Documentation/VMW/latest/Installation/Aboutthismanual
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 49 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk App for IT Service Intelligence (ITSI)
• Infrastructure Impacts
– A dedicated search head or search head cluster
– Additional infrastructure may be needed, depending on the number of Key
Performance Indicators (KPIs) that are tracked
ê The documentation has recommendations
http://docs.splunk.com/Documentation/ITSI/latest/Configure/Performancerecommendations
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 50 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk User Behavior Analytics (UBA)
• Hardware requirements
– 50GB disk space for Splunk UBA installation
– 500GB additional disk space for metadata
– 16 CPU cores
– 64 GB RAM
http://docs.splunk.com/Documentation/UBA/latest/Install/Requirements
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 51 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Components
What are the requirements for each type of Splunk instance?
Indexer Search Head Deployment License Heavy Cluster Search Head Deployer
Splunk Enterprise (Search peer) Server Master Forwarder Master Cluster
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 52 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Component Types – Indexer
• The Splunk component that indexes data, transforming raw data into
events and placing the results into an index
• It also searches the indexed data in response to search requests
Index
Search Head
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 53 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Indexer – Reference Server
Indexer
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 54 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Horizontal Scaling
• Adding indexers scales capacity Search
– Allows a greater daily indexing volume Head
– Speeds searches
• When using multiple indexers
– Use Splunk’s built-in forwarder load
balancing Indexers
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 55 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Increasing Indexer Utilization
• Normally, the indexer uses approximately 4-6 cores for indexing
– Possibly underutilizing the indexer hardware
splunkd
splunkd
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 56 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Disk Storage Details – About IOPS
• Input/Output Operations Per Second (IOPS) measures disk throughput
– Hard drives read and write at different speeds
ê average IOPS is the blend between read and write speed
– To get the most IOPS, choose drives with
ê high rotational speeds
ê low average latency and seek times
• Splunkbase has an app to analyze disk performance using Bonnie++
https://splunkbase.splunk.com/app/3002/
• Links to additional information on determining IOPS
www.symantec.com/connect/articles/getting-hang-iops-v13
www.cmdln.org/2010/04/22/analyzing-io-performance-in-linux
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 57 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Disk Storage Details
• Splunk indexing is read / write intensive
• Recommended RAID setup is RAID 10 (1 + 0)
– Provides fast reads and writes with the greatest redundancy
– Do NOT use RAID 5 since the duplicate writes make it too slow for high
performance indexing
Source: Wikipedia
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 58 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Disk Storage Details – SAN and NAS
• Storage Area Networks (SAN) and Network-
attached Storage (NAS)
– Not a good choice for hot and warm buckets (db)
defaultdb
– Suitable for cold buckets (colddb)
– searches over long time periods that utilize colddb
will be slower
• High performance SAN can be used in
environments with high speed networking and db
(hot and warm buckets)
colddb
(cold buckets)
low latency
• IOPS and reliability are the definitive factors
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 59 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Storage Requirements
• The disk storage needed on an indexer is primarily determined by the size
of all the indexes, plus base storage for the OS and configuration files
• Additional disk storage is needed for
– Indexer Clustering
ê data replication
ê discussed later in this module
– Summarization and Acceleration
ê data models
ê report acceleration
ê summary indexing
ê discussed in the Search Considerations module
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 60 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Virtualizing Splunk
• You can virtualize any Splunk instance if you meet the minimum resource
requirements
• Tips for virtualization
– Use locally attached volumes; direct access to a dedicated volume is best
– Separate search head VMs from indexer VMs, as they burst CPU together
– Ensure that sufficient resources are reserved – do not overcommit CPU or
memory for Splunk VMs
– Expect virtualization to reduce performance by 10% or more
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 61 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Component Types – Search Head
• The search head handles search
management functions
– Directs search requests to a set of search Search Head
Indexers
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 62 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Search Head – Reference Server
• 1 per 10 - 12 concurrent jobs / queries Search Head
• Types of searches
– Reporting, dashboard, ad-hoc
• Jobs
– Summarization, Alerting, Reporting
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 66 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Additional Components - Sizing
CPU Memory Disk Network Factors
License Master low low low 1 Gb number of slaves
Deployment med* med* low 1 Gb number of clients and
Server number of apps + config files
Best Practice
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 68 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Authentication / Authorization
• LDAP (Lightweight Directory Access Protocol) / AD (Active Directory) for
authentication and group management
• SSO (Single sign-on) allows a user to log in once and gain access to all
systems without being prompted to log in again
– Protocols: Kerberos, Security Assertion Markup Language (SAML)
– Requires a proxy with trusted connection
– Splunk Professional Services help may be required to install
Manage Indexes
Manage Users
Share Searches
Save Searches
ERP App Access
PCI Data Access
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 70 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Security, Privacy, Integrity Measures
• HTTPS transport is available end-to-end
– Create your own certificates or acquire from a valid 3rd party
– Distributed search (enabled by default between search head and indexer)
– Forwarder to indexer over TCP
– Web browser access to Splunk Web
docs.splunk.com/Documentation/Splunk/latest/Security/
AboutsecuringyourSplunkconfigurationwithSSL
• Indexer Acknowledgement
• Index Splunk’s configurations and logs to track changes
• Event auditing
docs.splunk.com/Documentation/Splunk/latest/Security/AuditSplunkactivity
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 71 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Security, Privacy, Integrity Measures (cont.)
• Disable Splunk Web when it is not required
– For example, indexers in a distributed deployment do not require Splunk Web
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 72 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Clustering
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 73 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Clustering Overview
Splunk offers two different, independent clustering capabilities
• Indexer clustering
– replicates data (buckets) based on configured policies
– provides increased data integrity and availability
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
74 Listen
Listentotoyour
yourdata.
data. 74 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
About Indexer Clusters
• Indexer clusters – A group of Splunk Enterprise nodes that, working in
concert, provide a redundant indexing and searching capability
– docs.splunk.com/Documentation/Splunk/latest/Indexer/Aboutclusters
• Multisite indexer cluster – An indexer cluster that spans multiple sites, such
as data centers
– Each site has its own set of peer nodes and search heads
– Each site also obeys site-specific replication and search factor rules
– docs.splunk.com/Documentation/Splunk/latest/Indexer/Multisiteclusters
– The cluster administrator defines the "sites"
ê can represent a physical location (city, data center, rack) or something else
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 75 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Single-site Indexer Cluster Overview
• Cluster Master Index Replication cluster
– Controls and manages index replication Search Head Cluster Master Distributed
– There can only be one master search
• Peer node
– Indexes data from inputs/forwarders Data replication
– Replicates data to other peer nodes as Peer Nodes
instructed by the master
• Search head
– Works the same as any Splunk search head
– Required component of indexer cluster
• Forwarders
Forwarders with autoLB & useACK enabled
– Send data to peer nodes
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 76 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Multisite Indexer Cluster
• Allows for an extra layer of data partitioning
– Indexers are grouped in "sites"
(defined by the Splunk Admin)
• Multisite clusters offer two key benefits:
Replicate
– Disaster recovery HQ (site1) NY (site2)
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 78 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Replication and Search Factors
• Peer nodes copy buckets to each other (replication)
– The copied buckets may be complete buckets or contain only rawdata
– The replication factors apply to the entire cluster
– Replication must be enabled for each index
ê the default is not to replicate indexes
• Replication factor (RF)
– Specifies how many total copies of rawdata the cluster should maintain
– This sets the total failure tolerance level
• Search factor (SF)
– Specifies how many copies will be searchable
ê searchable buckets have both rawdata and index files
– Determines how quickly you can recover the search capability
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 79 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Indexer Clustering Requirements
• A cluster master
– Plus a "stand-by" cluster master, powered off
• Indexers
– Single site mode
ê at least "replication factor" peer nodes in single-site mode
– that is, for a replication factor (RF) of 3, three peer nodes are required
ê best practice: at least RF + 1 nodes as a minimum
– Multisite mode
ê at least 2 peer nodes per site in multi-site mode
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 80 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Sizing Guidelines
• Peer Nodes
– Peer nodes should meet the minimum reference server requirements
– Replication and search factors affect the peer nodes
ê CPU and network load will be affected by larger factors
ê Disk sizing is directly affected by replication and search factors
• Cluster Master
– No published minimums
– Must have sufficient CPU and network resources to service requests
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 81 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Storage Requirements for Clustering
• With index clustering, you must consider the replication factor (RF) and
search factor (SF) to arrive at total storage requirements
– Total rawdata disk usage = rawdata total * RF
– Total index disk usage = index data total * SF
• NOTE: This does not include disk space needed to rebuild search factor if
required
– docs.splunk.com/Documentation/Splunk/latest/Indexer/
Systemrequirements#Storage_considerations
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 82 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Estimating Disk Usage
• For this example, assume
– rawdata on disk = ~ 15% of daily index data
– index files on disk = ~35% of daily index data
• Note that the per day storage must still be multiplied by the # days retention!
Daily Index data = 100GB RF=3 & SF=2 RF=3 & SF=2 RF=3 & SF=3
on 3 peer nodes on 6 peer nodes on 6 peer nodes
rawdata (15 GB daily) 15 * 3 = 45 GB 15 * 3 = 45 GB 15 * 3 = 45 GB
index files (35 GB daily) 35 * 2 = 70 GB 35 * 2 = 70 GB 35 * 3 = 105 GB
Total size across cluster 115 GB 115 GB 150 GB
Per Peer storage per day 115 / 3 = 38.3 GB 115 / 6 = 19 GB 150 / 6 = 25 GB
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 83 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Peer Failure Impact on Disk Sizing
• When a peer is lost, the 4 peers, replication factor = 3, search factor = 2 Data replication
P Primary (Origin) B Searchable backup R Rawdata-only
cluster master will initiate
additional replication Peer A Peer B Peer C Peer D
docs.splunk.com/Documentation/Splunk/latest/DistSearch/AboutSHC
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 85 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Search Scaling: Do you need a cluster?
• To mitigate performance issues as the search load increases
– Distribute search load
ê optimize scheduled searches to run on non-overlapping time slots
ê use job servers to isolate scheduled searches, real-time searches, and ad-hoc
searches
– Limit resource usage
ê limit the time range of end-user searches
ê configure user roles to limit the number of concurrent real-time searches
– Increase the number of peer nodes (indexers)
– Add more search heads
– Use a search head cluster
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 86 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Search Head Clustering
• Benefits
– Simple horizontal scaling Search Head External Load Balancer
Cluster
– Always-on search services
– Reliable alerts
– Consistent knowledge objects across cluster
– Eliminates the need for Job Server
Deployer
– No more NFS!
– Configuration sharing and job artifacts use
local storage via automatic replication
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 87 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Search Head Clustering Requirements
• Requirements
– At least 3 search heads
– A deployer
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 88 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Notes for Search Head Clustering
• Summary indexes must be forwarded to the indexer tier
– This is a general best practice, but required for search head clustering
– This may increase the disk space required on the indexers
• In search head cluster, add more search heads to horizontally scale the
capacity
– This assumes that the indexer layer has sufficient search capacity
– If capacity is insufficient, this will create a bottleneck at the indexers/peer nodes
– Remember that the rule of thumb for scaling is "add another indexer"
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 89 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Final Notes on Clustering
• Planning is essential
– What are the requirements for high availability and disaster recovery?
– How quickly must recovery occur?
– How many servers can be lost before service degrades or data is lost?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 90 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 4 Lab Exercise
Time: 10-15 minutes
• Task: Size the infrastructure
– How many indexers, search heads and other components?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 91 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 5:
Data Collection
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 92 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Review logging basics
• Understand the difference between Infrastructure
Opera'on
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 93 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
What Splunk Can Index
Splunk is the platform for machine data
Customer-Facing
Data Outside the
Data Center
Manufacturing
Click-stream data CDRs & IPDRs
Shopping cart data Power consumption
Online transaction data RFID data
Logfiles Configs Messages Alerts Metrics Scripts Changes Tickets GPS data
Virtualization
Linux / Unix And Cloud Applications Databases Networking
Windows
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 94 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Logging Components
Messages
Firewalls
Splunk Forwarder
(Relay) Events Indexer(s)
Switches
Linux Machine
Network
Without Forwarder
Inputs
https
(Originators)
Events
Network
Devices
Windows Machine Linux Machine
With Forwarder
With Forwarder
(Agent) (Agent)
Production
Mainframe Application
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 95 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data collection: agentless vs. agent
Agentless Agent
• Can use a variety of transports • Software program (service or
– TCP, UDP, SNMP, HTTP daemon) that collects information
• Splunk agentless options and pushes it over the network to a
central location
– HTTP data collection
– Splunk App for Stream
• Splunk agents
– Direct monitoring of TCP/UPD – Universal Forwarder
ports – Heavy Forwarder
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 96 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Collection Overview
• File based (monitor) inputs
– Using a shared file system as a log collector
These alternatives
– Using Splunk forwarder on the log collector are discussed in
• Network inputs detail on the
– Directly connected to single indexer following slides
– Collected on syslog-ng / rsyslog server with Splunk forwarder
• Windows events and perfmon (WMI)
– Local data collection with forwarder
– Remote data collection
• Special data collection options
– Splunk App for Stream
– HTTP Event Collector
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 97 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Alternatives for File Inputs – Log Collector
• Share log directory on Production Servers
using a Network File Server of your choice Single
Indexer
(agentless)
• Have Splunk indexer monitor these as local Local Directory
Monitoring Over
Production Servers
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 98 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Alternatives for File Inputs – Mixed
Production Servers
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 99 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Alternatives for File Inputs – Forwarding
• Universal Forwarders installed on Indexers
production servers
• Advantages
– Load balancing and buffering
– Supports scripted inputs
– More scalable
– Better performance Automatic load balancing
configured on all forwarders
– More secure
TCP / UDP
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 101 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Alternatives for Network Inputs – syslog
• Input to syslog, then Splunk monitors syslog output file
– Provides a buffering mechanism
• Can take advantage of syslog-ng / rsyslog filtering features
• A forwarder is installed on the syslog-ng / rsyslog server
– Provides the advantage of load balancing
• Offers greater resiliency and increases scalability
Forwarder installed on
Network Devices syslog-ng / rsyslog Indexers
Best Practice
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 102 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Alternatives for Windows Inputs – WMI Collector
• A Splunk forwarder running in a Windows forwarder performs
remote Event Log / WMI monitoring
• Will not scale horizontally to many production servers
• Can only monitor limited Windows inputs
Windows Heavy
Production Windows Servers Forwarder
Indexers
WMI
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 103 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Alternatives for Windows Inputs – Forwarder
• Each Windows production server has a Universal Forwarder installed
– Collecting performance monitoring locally and forwarding to indexer
– Scalable
– More secure and does not require domain privileges
– Can collect any input, not just Windows remote inputs
Best Practice
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 104 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Forwarding from Windows Systems
• Universal forwarder
(local agent)
Universal Remote Collector
– Offers many types of data sources Source
Forwarder (WMI)
– Minimizes Windows Event Logs Yes Yes
network overhead (must know name of
event log)
– Reduces operational risk
Performance Yes Yes
• Remote polling Registry Yes No
(agentless)
– Use native WMI API to collect
event logs and performance data
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 105 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Windows Forwarding – Method Comparison
Local Forwarder Remote Collector (WMI)
Performance • Requires less CPU • Requires more CPU
• Performs basic data • More network intensive
compression / encryption • Not recommended for highly audited hosts
• More memory-intensive (such as domain controllers)
Deployment • Easier if you have control of • Good for limited set of data
base OS build or many data • May be only option without build control or
sources admin rights on target host
• Can use deployment server
Possible • UF is limited to 256 kbps of • Starts to poll less frequently over time if
concerns network bandwidth unable to contact a host for a period of time
(to prevent potential DOS attack)
• Not advised for machines that are
frequently offline
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 106 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
HTTP Event Collector
• Component of Splunk Enterprise since 6.3
– Can be configured for high-volume inputs with multiple collection points
(indexers or heavy forwarders)
http://dev.splunk.com/view/event-collector/SP-CAAAE73
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 107 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk App for Stream
• Captures real-time streaming wire
data from anywhere in your
datacenter
• Filter and aggregate wire data
easily and automatically into
Splunk
• https://apps.splunk.com/app/1809/
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 108 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Comparison of Remote Collection Options
Best Universal • Reliable delivery, encryption, compression
Forwarders • Distribution and failover over multiple Splunk indexers
(UF) • Can execute scripts and collect program output remotely
• Defaults to 256 kbps of network bandwidth (configurable)
HTTP Event • Secure and scalable
Collector • Requires developers to code the HTTP post that will be indexed
Good Heavy • Same as UF except parses data before forwarding & no bandwidth limits
Forwarders • More load on source system
Not Optimal NFS / SMB • Not as scalable
• Possible latency with many files
syslog • Source info not as fine-grained
• Use rsyslog / syslog-ng instead for capture buffering, hostname extraction
╸ Makes TZ processing in Splunk better – no need to transform host
• Use a forwarder to read syslog files and distribute over indexers
Other [Including remote WMI]
Agentless • Not as scalable
• May be less secure
Not Recommended FTP, Batch • Not real-time
scripts • Can be hard to maintain and manage
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 109 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Managing Initial Data Collection
• Production servers often have large volumes of
historical data /var/log 1
Clean directory
• Log rotation
– Keep the latest two logs in text format before you compress, otherwise you may
be missing delayed events
• Use blacklist to eliminate compressed files from monitor input
– This prevents duplication of events
Best Practice
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 112 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 5 Lab Exercise
Time: 30 minutes
• Create a topology diagram for the environment
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 113 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 6:
Forwarders and Deployment
Management
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 114 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Review forwarder types
• Understand how to manage forwarder Infrastructure
installation in an enterprise
environment
Integra'on Collec'on
• Understand configuration of all Splunk
components using Opera'on
– Deployment Server
– Cluster Master
– Deployer Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 115 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Types of Forwarders
• Universal forwarder
– A streamlined binary package that contains only the components
needed to forward data
– Use the universal forwarder whenever possible
ê Smaller, more efficient
ê Network bandwidth defaults to 256 kbps (configurable)
• Heavy Forwarder
– Uses the Splunk Enterprise binary, with all capabilities
– Parses data before forwarding, so use a heavy forwarder when
ê routing events
ê filtering more than 80% of incoming events
ê anonymizing or masking data before forwarding to indexer
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 116 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Forwarder – Scripted Installation
• Use an automated method to install Splunk forwarders
– Scripted installation
– Another server management / configuration management tool
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 117 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk Deployment Tools
Search heads
Search-time
configurations
Deployer Indexers
Indexing &
App parsing
configurations
Cluster Master
Input-time
configurations
Deployment
Server
Forwarders
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 118 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment Management
Type of Instance Manage Configurations with
Forwarder Deployment Server or other
Stand-alone Search Head configuration management tool
Stand-alone Indexer
Search Head Cluster Member Deployer only
Peer Node in Indexer Cluster Cluster Master only
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 119 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment Server
• A centralized configuration manager that Deployment
Server
delivers updated content to deployment clients
Deployment
– Units of content are known as deployment apps Apps
$SPLUNK_HOME/etc/system/local
on clients
ê system-level configurations on clients cannot be
over-ridden with Deployment Server Forwarders
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 120 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment Server – Rules of Thumb
• One reference server safely handles approximately 2000+ polls/minute
• Be sensitive to the phoneHomeIntervalInSecs attribute in
deploymentclient.conf
– Default client poll is 60 seconds
– Prioritize servers that need more frequent updates Deployment
Server
– In general, adjust client polling interval to scale
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 121 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Config Management – Deployment Server
• When using Forwarder Management or the Deployment Server (DS)
– Build an install script/package for clients with only the files needed to contact the
DS (basic installation + deploymentclient.conf)
– Clients will get the rest of the configuration information from the DS
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 122 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deploy Apps to Search Head Cluster
• Use the Deployer to distribute apps to SHC members
– DO NOT stage any out-of-the-box apps in deployer (search, launcher, etc.)
– All members of the cluster will receive the same apps
• You cannot use the Deployment Server to distribute apps to SHC members
• Deployer supports both push and pull mechanisms
– Can push apps to search head cluster members
– Can be polled by new or restarted search head cluster members for updates
• You cannot use the Deployment Server to distribute apps to the peer nodes
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 124 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment Apps – Best Practices
• Design your deployment app
– An app is a set of deployment content (a configuration bundle)
– An app is deployed as a unit and should be small
– Take advantage of Splunk’s configuration layering
– Use a naming convention for the apps
– Create classes of apps, for example
ê input apps
ê index apps
ê web control apps
• Carefully design apps regardless of your configuration Best Practice
management tool (DS, cluster master, Puppet, etc.)
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 125 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 6 Lab Exercise
Time: 15 minutes
• Task: Analyze an app
– Remember that some Splunkbase apps may need to be repackaged into
deployment apps for your environment
• Task: Design a test environment
– How many servers are needed for testing?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 126 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 7:
Data Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 127 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Identify the 6 things that you must get
right at index time Infrastructure
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 128 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Laying the Foundation During Index Time
Six things you must get correct at index time
Input Phase Parsing Phase => Index Time
Date /
timestamp no Yes
extraction
Event
no Yes
boundary
host
sourcetype Yes, optimally set
Yes, can override input setting
source during input
index
This metadata cannot be changed after the events are written to the index
docs.splunk.com/Documentation/Splunk/latest/Admin/Configurationparametersandthedatapipeline
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 129 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Comprehension Tasks
• Oversight of knowledge object creation and usage across teams,
departments, and deployments
• Normalization of event data
• Creation of data models
• Management of shared knowledge objects
• Development of naming conventions
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 130 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Enrichment – Splunk Knowledge
Data interpretation Fields and field • The first order of Splunk Enterprise knowledge
extractions • Automatic field extraction brings meaning to
your raw data
• Fields that are extracted manually expand and
improve upon this layer of meaning
Data classification Event types and • Event types group sets of events that have
transactions common characteristics
• Transactions are collections of conceptually-
related events
Data enrichment Lookups and • Enrich your data with external resources
workflow actions • Add fields to data from external data sources
• Workflow actions enable interactions with other
applications or web resources
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 131 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Enrichment – Splunk Knowledge (cont.)
Data Tags and aliases • Normalize your data
normalization • Group a set of related field values together
• Apply aliases to fields with the same meaning,
but different names
• Give extracted fields tags that reflect different
aspects of their identity
Data models Data models • Representations of one or more datasets
• Drive Pivot tool
• Designed by knowledge managers who fully
understand the format and semantics of their
indexed data
• Can be used with other knowledge objects
including lookups, transactions, search-time field
extractions, and calculated fields
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 132 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Lookups
• Four built-in types
– File-based uses a csv file, stored in the lookups directory
ê preferred for small tables that change infrequently
– KV Store requires collections.conf that defines fields
ê uses mongodb to store the data
ê can be used to store state information and individual entries can be updated
ê preferred for large tables or data that needs to be regularly updated
– External uses a python script or a executable in the bin directory
– Geospatial uses a kmz saved in the lookups directory to support the choropleth
visualization
• Splunk DBConnect
– Can lookup data in a relational database
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 133 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Enrichment – Splunk DB Connect
MS SQL Other
Oracle DB Server Databases
DeployDBX/AboutSplunkDBConnect
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 134 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Normalizing Data
• Events from different products and vendors have unique formats and fields
– Even when the events are semantically equivalent
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 135 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Normalization – Example
This example illustrates normalized field names across two separate data
sources
Data Source 1
Data Source 2
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 136 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Common Information Model
• The Splunk Common Information Model (CIM)
– Defines relationships in the underlying data using tags and fields
– Provides a standard method of parsing, categorizing, and normalizing event data
– Leaves the raw machine data intact
– Defines the least common denominator for a domain of interest
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 137 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
CIM Version For Apps
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 138 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Models
A data model provides a hierarchical mapping of semantic knowledge about
one or more datasets
Pivot Data
Index sourcetype
Editor Model host
source
customer customer fields
order order Index tags
shipment shipment eventtypes
transaction transaction etc.
End user etc. etc.
Index
docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatamodels
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 139 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Model Design
• Good data model design requires
– An understanding of the data indexed in Splunk
ê indexes, fields, sourcetypes, etc.
ê understanding of the data content
– Appropriate fields and knowledge objects to enrich the data
– An understanding of the user needs
ê what are the questions that the user wants to answer?
ê what are the parameters of the model – from the user or business point of view?
• Simply exposing the existing sourcetypes and fields to the user is generally
not sufficient
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 140 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 7 Lab Exercise
Time: 15 minutes
• Task: Identify sourcetypes
– Which inputs have pre-trained sourcetypes?
– Which inputs are defined by apps?
• Challenge
– Examine fields in the CIM for network traffic
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 141 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 8:
Search Considerations
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 142 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Understand MapReduce
• Discuss search performance Infrastructure
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 143 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Components – MapReduce
• A framework for processing parallelizable
1
problems across huge datasets using a
large number of computers (nodes) in a 2
cluster
en.wikipedia.org/wiki/MapReduce 3
…
n
Source: Wikipedia
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 144 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Performance Considerations for MapReduce
• Which searches will MapReduce well? Which do not?
– This is determined by how much of the work can be done on the indexers vs.
how much manipulation of the search results must be done by the search head
Event searches and Reporting searches
non-distributable commands
Search Head Search Head
• Indexers retrieve events in • Indexers can perform
parallel event retrieval and
• Efficient for event searches additional steps in
• For other searches parallel
- Additional steps, if • Intermediate results
Intermediate
Events Events needed, must be done Results are returned
on the search head • The search head
- CPU & memory combines and
bottlenecks can occur presents the results
on the search head
Indexer Indexer Indexer Indexer
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 145 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Questions for Search Types and Volume
What types of searches • Free-text search
do you expect? • Search by fields
• Complex correlations and transactional searches
What types of alerts do • Alerts and notifications of specific events
you expect? • Alerts and notifications of thresholds reached or changed
• Alerts and notifications of complex thresholds
Reporting • How frequently will the reports run?
• How current must the report data be? Near real time, to an hour, to
the previous day?
• Will historical or retroactive reports be required, or only current
reports?
• How will the reports be delivered to users? Via email, web interface,
or other method?
• Are there users who do not have access to the raw data who must
see the reports?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 146 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Raw Event Searches
• These search queries contain only the search command
– The results are usually a list of raw events
– Usually categorized as super-sparse or rare type searches
– Correlating events
– Investigating security issues
Indexer Indexer
– Analyzing failures
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 147 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Report Generating Searches
• These search queries consist of initial search followed by some statistical
calculation
Search Head
– The result is usually a table, graph or a single value
– Often categorized as dense or sparse type searches
• Examples include:
– Compiling a daily count of error events Indexer Indexer
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 148 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Search Performance – Types of Searches
Type of search Description Indexer throughput Impact
A large percentage of the data Up to 50K matching
Dense CPU bound
matches the search events per second
A small percentage of data Up to 5K matching
Sparse CPU bound
matches the search events per second
A "needle in a haystack" search
Indexer must check all buckets to
Up to 2 seconds per Primarily I/O
Super-sparse find results
bucket bound
Time consuming for large numbers
of buckets
Similar to super-sparse searches,
10 – 15 buckets per Primarily I/O
Rare but bloom filters eliminate buckets
second bound
that don't include search results
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 149 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Improving Search Performance
• Best practice
– Make sure disk I/O is as good as you can get
– Increase CPU hardware only if needed
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 150 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Improving Performance with limits.conf
• If you have unused CPU/memory resources,
you can set multiple search pipelines
• Indexed real-time significantly limits.conf on indexer
improves performance if there [search]
are many real-time searches batch_search_max_pipeline = 2
Time
Search Slots = 4
S1 ✔ ✔ Run
S2 ✔ X Deferred
S3 ✔
S4 ✔
S5 xxxx ✔ ✔ ✔ ✔
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 153 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
How a scheduled job gets skipped
Window 1 Window 2
S1 ✔ ✔
S2 ✔ ✔
S3 ✔ ✔ ✔ Run
S4 ✔ ✔ x Deferred
S5 xxxx ✔ ✔ ✔ ✔ xxxxxX ✔ X Skipped
• Every scheduled job has a window – a time period where it should run
• A job is deferred if it can’t start when scheduled
– A deferred job is retried (repeatedly for the duration of its window)
– Example: Job S5 is deferred 4 times, but then runs during Window 1
https://wiki.splunk.com/Community:TroubleshootingSearchQuotas
Priority = Next_runtime
+ Avg_runtime x priority_runtime_factor
- skipped_count x period x priority_skipped_factor
+ window_adjustment
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 155 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Summarization and Time Value of Data
• The value of data can be broken into two curves
– Value of and individual data item
– Aggregate data value
• In most cases, the value of an individual data item will decline over time and
the aggregate data value will increase in value
Data Value
Age of Data
Time
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 156 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Acceleration
Report • Accelerates individual reports
Acceleration • Uses automatically-created summaries to speed completion times for qualified reports
• Easier to create than summary indexes and backfills automatically
• Data is stored on the indexers, alongside the related buckets
• Depending on the defined time span, periodically ages out data
• Always ages data when the corresponding bucket ages out
• Cannot create a data cube and report on smaller subsets
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 157 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
How Acceleration Works
• Report acceleration and summary indexing speed up individual searches,
on a report-by-report basis
– This is accomplished by building collections of pre-computed search result
aggregates
• Data model acceleration speeds up reporting for the specific set of attributes
(fields) that you define in a data model
– Creates summaries for the specific set of fields, accelerating the dataset
represented by that collection of fields rather than a particular search
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 158 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Model Acceleration
• Used to speed up retrieval of the events that underlie a data model
• Speeds up reporting for the entire set of attributes (fields)
• Updated every five minutes
• Affects only event object hierarchies
– Hierarchies based on search or transaction objects cannot be accelerated
• Most efficient if the root event objects include the index(es) in their
initial constraint search
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 159 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Data Model Acceleration
• The High Performance Analytics Store
– Consists of time-series index files, with the .tsidx file extension
– Exists on the indexer tier, parallel to the buckets that contain the events
referenced in the .tsidx files
– Spans a "summary range," which is the time range selected for acceleration
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 160 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Disk Usage: Data Model Acceleration
• Estimated storage for data model acceleration per year =
Data volume per day * 3.4
– Example: if indexing 250 Gb / day, the expected storage needed for acceleration:
250 * 3.4 = 850 GB across all indexers
• To see the size on disk of a particular data model acceleration,
go to the Data Models page and click Acceleration
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 161 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Report Acceleration
• Report Acceleration
– Report results are "pre-computed" at regular intervals and stored on the
indexer tier
– All reports that use a similar set of data and computations automatically use
the same report acceleration summaries if they can
• Report Acceleration Summaries
– Span a time range selected for acceleration
– Update every 10 minutes by default
– Are automatically rebuilt if the data in the underlying bucket changes
docs.splunk.com/Documentation/Splunk/latest/Knowledge/
Manageacceleratedsearchsummaries
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 162 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Report Acceleration
• Requirements for acceleration
– The report was not created via Pivot
– The underlying search qualifies for acceleration
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 164 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Summary indexing overview
• Efficiently report on large volumes of data
• Spread the cost of a computationally expensive report over time
– Run reports over long time ranges for large datasets more efficiently
ê Example: Show the number of page views for each of your web sites over the past
quarter, broken out by site
• Retain summarized data after original data sources have been frozen
– Example: If you can only maintain original firewall logs for 30 days, but want
reports over a year
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 165 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
How summary indexing works
• Data is collected and summarized
– A populating search is scheduled to run periodically
– The populating search stores its results in a summary index
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 166 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Summary indexing flow
purchasedProducts search Populating Search
sourcetype=access_combined | sistats count by product_name
Run every hour, time range 1 hour
Populating
search results
are formatted
and stored as
index statistics
Summary
index
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 167 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Summary Indexing
• Requires manual management and knowledgeable monitoring
– Manual rebuild of summary statistics needed
ê If populating search fails to run on schedule
ê If data arrives late
ê If knowledge objects change
• Summary index configuration
– Must be defined in indexes.conf, just like a regular index
– Retention and size are controlled in indexes.conf
– Do not consume license (summary indexing is free)
– Summary indexes are usually small, but allocate space on the indexer tier
ê monitor the space used like any other index
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 168 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Forwarding Summary Indexes
• Summary indexes are created by default on the search head
• If you use summary indexes, forward your summary indexes to the
indexing layer
– This allows all members to access them
ê otherwise, they're only available on the search head that generates them
– This is required for search head clustering
Best Practice
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 169 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 8 Lab Exercise
Time: 10 minutes
• Review the list of searches. Could any searches benefit from acceleration?
• What additional disk space considerations have you uncovered?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 170 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 9:
Integration
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 171 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Describe integration methods
• Identify common integration points Infrastructure
Integra'on Collec'on
Opera'on
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 172 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Types of Integration
• Accessing other data or applications from within Splunk
• Accessing Splunk from another application
• Re-forwarding data to other applications after it is indexed in Splunk
• Integration with Hadoop – Hunk
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 173 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Search Extensibility Review
• Custom search commands
– Write a new search commands in Python 2.7 Extend & Integrate Splunk Build Splunk Apps
– http://dev.splunk.com/view/python-sdk/SP- SDKs
Web
Data Models Framework
CAAAEU2 Java
JavaScript
Python
Search
Simple XML
Extensibility
Ruby
• Scripted lookups
JavaScript
C# Modular
PHP Inputs
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 174 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Integration Using Alerts
• Custom alert actions
– allow developers to extend Splunk to build, package
and publish new alert actions for Splunk
– Some modular alerts have been published on
Splunkbase (for example, Twilio SMS Alerting)
– http://docs.splunk.com/Documentation/Splunk/latest/
Alert/CreateCustomAlerts
– docs.splunk.com/Documentation/Splunk/latest/
AdvancedDev/ModAlertsIntro
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 175 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Splunk REST API
• REST stands for
REpresentational State Transfer
Extend & Integrate Splunk Build Splunk Apps
Search
Simple XML Framework
HTTP requests
Python JavaScript
Extensibility
Ruby
C# Modular
PHP Inputs
• With over 200 endpoints, the REST API lets REST API
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 176 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Software Development Kits (SDKs)
• SDKs provide language-specific libraries for
accessing the Splunk REST API
Extend & Integrate Splunk Build Splunk Apps
– Includes documentation, code samples,
resources, and tools SDKs
Java
Data Models
Web
Framework
JavaScript Simple XML
ê Java
ê PHP
ê Ruby
ê C#
http://dev.splunk.com/sdkss
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 177 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Sending Data to Other Systems
Search Head
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 179 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 10:
Operations and Management
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 180 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module Objectives
• Identify ongoing tasks in a Splunk deployment
• Define monitoring tools Infrastructure
Opera'on
Querying Comprehension
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 181 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Operations and Management
• Operations
– Monitoring
– Backups and archiving
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 182 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Distributed Management Console (DMC)
• Splunk environment monitor and
diagnostic tool
– Only admins can access
– Investigate server performance
and resource usage
– Uses the _introspection index
populated by
introspection_generator_addon
– Provides configurable alerts
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 183 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Monitoring Splunk: S.o.S app
• The Splunk on Splunk (S.o.S) app
is another tool for monitoring or
troubleshooting
– View, search and compare configs
– Detect errors and inspect crash logs
– Measure indexing performance & event
processing
– View details of scheduler and user activity
– Gradually being replaced by DMC
http://docs.splunk.com/Documentation/Hunk/latest/Hunk/Setanarchivescript
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 186 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Onboarding New Data Inputs
• Use the Deployment Methodology as a guide
• What process does a user follow to index new data into Splunk?
• Once data is indexed, what search time changes need to be made?
Are there new knowledge objects that should be created?
• Who determines the ownership of data?
• How are reports created to support the new data?
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 187 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Onboarding New Users
• How is a user or user group given access to Splunk?
• What are the default apps?
• How is access to data controlled? What data do new users see?
• What training do new users need or receive?
• Are there limits on making objects (especially scheduling searches)?
Any quotas?
• Progressive privileges
– As users are trained or gain advanced skills, give them additional privileges
ê real time search, create dashboards, etc.
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 188 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Operations Review
• Periodically review scheduled searches
– Monitor jobs & review search logs to identify the expensive searches
– Check for inefficiencies and redundancies
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 189 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Operations – Security
• Check Splunk Security Portal frequently
– www.splunk.com/page/securityportal
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 190 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Module 10 Lab Exercise
Time: 5 minutes
• Email your lab document(s) to your instructor
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 191 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Resources and Additional Study
• Splunk documentation for architects
– Installation Manual (for hardware and capacity planning information)
– Distributed Deployment Manual
– Capacity Planning Manual
– Other manuals for specific topics
• Splunk Answers answers.splunk.com
• Splunk's public wiki (watch dates for outdated topics)
www.splunk.com/wiki/Deploy:Deployment_topics
– Deployment scenarios and other useful topics
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 192 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Resources and Additional Study
• Splunk documentation for architects
– Installation Manual (for hardware and capacity planning information)
– Distributed Deployment Manual
– Capacity Planning Manual
– Other manuals for specific topics
• Splunk Answers answers.splunk.com
• Splunk's public wiki (watch dates for outdated topics)
www.splunk.com/wiki/Deploy:Deployment_topics
– Deployment scenarios, performance info and other useful topics
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 193 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Support Options
• Community Support - Documentation, the Community Wiki,
Splunk Answers, Splunk Blogs, and IRC are available to anyone for
answers to basic technical questions. These resources are always available
to provide you with an immediate answer
• Enterprise Support - Direct access to our Customer Support team by
phone and the ability to manage your cases online
• Global Support - 24x7x365 support for critical issues, a dedicated
resource to manage your account and quarterly review of your
deployments
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 194 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
.conf2016: The 7th Annual
Splunk Worldwide Users’ Conference
Listen
Visit
Listentotoyour
yourdata.
conf.splunk.com
Generated
data. 195
for
for Igor Abreu (iabreu@realprotect.net) (C) more information
Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Thank You
Complete the Class Evaluation to be in this month's drawing for a $100 Splunk
Store voucher
1. Look for the invitation email, What did you think of your Splunk Education class, in your inbox
2. Click the link or go to the specified URL in the email
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 196 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Appendix:
Sample Topologies
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 197 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment – Centralized Topology
Indexers
Auto-load
Balancing
(in outputs.conf on forwarders)
Intermediate
Forwarders Forwarder
Syslog Devices
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 198 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment – Decentralized Topology
Indexers Indexers Search Head
Forwarders
Forwarders
Indexers Intermediate
Forwarder
Search Head Indexers
Forwarder
Network Devices
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 199 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Deployment – Hybrid Topology
Indexers
Search Head Indexers
Forwarders
Forwarders
Indexers Intermediate
Forwarder
Search Head Indexers Search Head Cluster
Forwarder
Indexers Network Devices
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Architecting and Deploying Splunk 6.4
Listen
Listentotoyour
yourdata.
data. 200 Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016