Architecting and Deploying Splunk 6.4

Architecting and Deploying Splunk
Generated for Igor Abreu (iabreu@realprotect.net) (C) Splunk Inc, not for distribution
Listen to your data.
Copyright © 2015 Splunk, Inc. All rights reserved | 13 May 2016
Document Usage Guidelines
• Should be used only for enrolled students
• Not meant to be a self-paced document, an instructor is required
• Please do not distribute
13 May 2016
Course Prerequisites
• Required
–  Using Splunk
–  Creating Splunk Knowledge Objects
–  Splunk Administration
• Strongly Recommended
–  Searching and Reporting with Splunk
–  Advanced Dashboards and Visualizations
Course Goals
• Understand best practices for architecting and documenting Splunk
Enterprise distributed deployments
• Provide a framework and methodology for Splunk deployments
Course Outline
1.  Introduction
2.  Initial Requirements Definition
3.  Apps and Index Design
4.  Infrastructure Planning
5.  Data Collection
6.  Forwarders and Deployment Management
7.  Data Comprehension
8.  Search Considerations
9.  Integration
10.  Operations and Management
Module 1:
Introduction
Module Objectives
• Introduce the Splunk deployment planning process and tools
• Review the structure of this course
Deployment Scaling
A deployment plan creates a solid foundation
• to scale Splunk deployments as they evolve
• to implement large, enterprise deployments
[Diagram: stages of deployment growth]
•  Download: initial user
•  Workgroup Deployment: specific use case, specific users
•  Expansion: more sites, more geographies, more data sources, more data volume
•  Enterprise: enterprise standard, large number of users, many different use cases, many different users
Splunk Deployment Methodology
• Provides a structured path to implement Splunk in an enterprise
• Identifies the people and resources required, along with responsibilities
• Provides a task checklist and ordering
•  It consists of six phases:
1.  Infrastructure - ensure Splunk has the proper resources (servers, sizing, topology, disk, security, retention)
2.  Collection - ingest the correct data into Splunk
3.  Comprehension - tell Splunk about the data
4.  Querying - get information out of Splunk
5.  Integration - provide search results to a wider audience
6.  Operation - keep it operating, grow it, evolve it

What is in a deployment plan?
• A deployment plan should include
–  Deployment Goals
–  User Roles / Staffing List
–  Existing Topology Diagrams
ê  physical environment
ê  logging
–  Splunk Deployment Topology
–  Data Source Inventory
–  Data Policy Definition
–  Suggested Splunk Apps
–  Report Definitions
–  Education / Training Plan
–  Deployment Schedule
Module 1 Lab Exercise
Time: 10 minutes
• Task: Read the lab document for Module 1 (pages 1-4)
–  DeployingSplunk64_Labs.pdf
–  If you have difficulty seeing the graphics in the lab document, they also appear on
the following three pages
• Task: Download planning tools (optional)
–  Data Source inventory (data_inventory.rtf)
–  Index Planning and Sizing (data_sizing.xlsx)
–  Splunk Icon Library (splunk_icons.pptx)
–  Use Case checklist (use_case.pdf)
Current IT Environment
Current Logging Environment - Corporate
Current Logging Environment - CoLo
Module 2:
Initial Requirements
Definition
Module Objectives
• Identify the information that is needed for deployment decisions
• Provide lists and resources to aid in collecting requirements
Key Planning Information
• Identifying requirements is a first step in the deployment
–  More information will be uncovered throughout the phases of deployment
• Gather raw material for the deployment plan at the beginning, if possible
–  Overall goals for the deployment
–  Key users, including their goals and use cases
–  Current environment
ê  physical environment
ê  monitoring tools in use
–  Data sources
Overall Goals
• Who is the sponsor?
•  In addition to the sponsor, who else must be satisfied?
• What are the goals for the deployment?
• How will success be measured?
Use Cases
• What will the users do with Splunk?
–  What tasks will they perform?
–  What reports will they generate?
• What other use cases may apply?

• For ideas and inspiration:
–  Why Splunk? Customer success stories, whitepapers, etc.
http://www.splunk.com/view/SP-CAAAG2Q
–  Use Case Checklist handout
use_case.pdf
–  Splunk apps
http://splunkbase.splunk.com
Current IT Environment
• What is the current overall IT topology?
–  Data centers
–  Network zones
–  Number and type of servers
–  Location of users
• Obtain a network diagram

–  Are there security restrictions among data center and network zones?
–  What is the bandwidth among data center and network zones?
• Authentication
–  What system is in current use?
Current Logging Environment
• Are any logs or other data sources collected or centralized today?
–  Logged to SAN / NAS devices
–  Centralized via syslog, syslog-ng, rsyslog, Kiwi, Snare, etc.
–  Parsed and stored in a SQL database
• What tools are in use?

–  Are there log parsing / scraping tools or scripts in place?
–  Are there any query tools in place? Who uses each tool?
–  Are logs a source for a monitoring system?
–  What ticketing system is in place?
• Is there an expectation that Splunk will integrate with or replace existing tools?
General Requirements
• Security
–  What security policies may affect the collection, retention, and reporting of data?
–  What approvals will be needed?
• Regulatory
–  Are there laws, regulations, or policies that affect the data collection, reporting,
etc.?
• High Availability or Disaster Recovery
–  Is data replication required?
Data Sources
• Data Source Inventory
–  What is the superset of data sources needed by all users?
–  How much data is generated per day?
• Data Policy
–  How long should each data source be retained?
–  Who can see particular data elements?
–  What data needs protection against tampering?
–  What proof of integrity needs to be provided?
–  Will Splunk be the primary repository for data?
• Handout
–  data_inventory.rtf
Example Data Source Inventory
Data Source: Firewall logs
Location and Type (file input (name and path), API, network port, device, etc.): UDP:514
Ownership and Access: Network team owns; Security can also access
Data Volume: Average per day: 500 MB; Peak: 6000 EPS at ~100 bytes/event
Retention: 6 months
Format or sourcetype: syslog?
Collection method: How will this data get to the indexers? Is there a collection point at each location?
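As a rough cross-check of the volume figures in this example, the peak event rate can be converted to a byte rate and compared with the daily average. This is a minimal sketch: the function names are illustrative, and 1 MB is assumed to be 10^6 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def peak_bytes_per_second(events_per_second, bytes_per_event):
    """Byte rate needed to keep up with the peak event rate."""
    return events_per_second * bytes_per_event

def average_bytes_per_second(megabytes_per_day):
    """Average byte rate implied by a daily volume (assumes 1 MB = 10**6 bytes)."""
    return megabytes_per_day * 10**6 / SECONDS_PER_DAY

peak = peak_bytes_per_second(6000, 100)  # firewall example: 6000 EPS at ~100 bytes
avg = average_bytes_per_second(500)      # firewall example: 500 MB per day
print(f"peak: {peak / 1000:.0f} KB/s, average: {avg / 1000:.1f} KB/s")
```

With these figures the peak is roughly 100 times the average rate, which is why the inventory records both values: the collection path has to absorb bursts, not just the daily mean.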
Module 2 Lab Exercise
Time: 15 minutes
• Task: Complete the abbreviated data source inventory
–  You may not know enough to supply answers for all data sources
–  Part of the inventory has been completed for you
ê  Do you agree with the completed portion?
ê  Do you have questions for the sponsor?
Module 3:
Apps and Index Design
Module Objectives
• Understand index implementation
• Design indexes
• Understand how to estimate storage requirements for indexes
• Identify relevant apps
–  Document impact on inputs and indexes
Splunk Indexing Review
• Optimized for quickly indexing and persisting unstructured data
• Minimal schema for persisted data
–  An event consists of raw text, timestamp, source, source type and host
–  This is called "rawdata" and is stored in a compressed file
• Once data enters Splunk, it is

–  Parsed for line-breaking, timestamps and other metadata fields
–  Persisted in its raw form
–  Indexed by the metadata fields, along with all the keywords in the raw event text
ê  Splunk uses an "inverted index" structure for the .tsidx files
Splunk Indexing Review (cont.)
• Additional processing on raw events is deferred until search time
• This serves 4 important goals
-  indexing speed is increased: minimal processing is performed during indexing
-  bringing new data into the system requires less effort: no schema planning is needed
-  the original data is persisted for easy inspection
-  the system is resilient to change: data parsing problems do not require reloading or re-indexing of the data
Disk Storage Details – Files
• As Splunk indexes your data, it creates two main types of files:
–  rawdata contains the original data in a compressed form
ê  for "syslog-like" data, this may be 10% to 15% of the raw pre-indexed data
ê  size is dependent on how well data compresses
–  Index files (.tsidx) which point to the unique terms
ê  these files range in size from approximately 10% to 110% of the rawdata size
ê  size is strongly affected by the number of unique terms in the data
ê  adding indexed field extractions will increase the size of the .tsidx files
• Buckets contain both rawdata and index files
–  Exception: replicated buckets may have only rawdata
Types of Index Files
• Inverted index (*.tsidx)
• Metadata (*.data)
–  Information about sources, sourcetypes and hosts of the events contained in each bucket
• Bloom filters (bloomfilter)
–  Efficient data structure that authoritatively rules out buckets
ê  Provides 100% certainty that a search term is not present in a bucket
–  Consulted by every search
–  Very fast (1-2 IO)
https://en.wikipedia.org/wiki/Bloom_filter
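The "rules out buckets" behavior can be illustrated with a toy Bloom filter. This is a minimal sketch of the general technique, not Splunk's implementation; the class name, bit-array size, and hashing scheme are assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, term):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term):
        # False means the term is definitely absent, so the bucket can be skipped.
        return all(self.bits[pos] for pos in self._positions(term))

bucket_filter = BloomFilter()
for term in ["apple", "beer", "coke"]:
    bucket_filter.add(term)

print(bucket_filter.might_contain("apple"))
```

A `False` answer is authoritative, which is what lets a search skip a bucket without opening its index files. A `True` answer only means the bucket has to be checked.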
Inverted Index
• Index data structure that maps from content (such as words or numbers) to its locations
en.wikipedia.org/wiki/Inverted_index
• An inverted index allows fast full text searches
• Splunk .tsidx uses an inverted index structure to map keywords to their locations in the rawdata
[Diagram: inverted index example mapping the terms red, blue, sweater, and pants to the items that contain them]
Source: Wikipedia
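The idea can be sketched in a few lines. The sample items below are assumed from the terms in the slide's diagram, and the function names are illustrative. This shows the general data structure only, not the .tsidx file format.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """Return the ids of documents that contain every given term."""
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["red sweater", "blue pants", "red blue sweater", "red pants"]
index = build_inverted_index(docs)
print(index["red"])
print(search(index, "red", "sweater"))
```

A keyword search becomes a lookup plus a set intersection instead of a scan of every document.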
Splunk Index Files
[Diagram: index directories (IDX 1, IDX 2, IDX 3) and the buckets inside one of them]
• Home Path: hot_v1_100, hot_v1_101, db_lt_et_101
• Cold Path: db_lt_et_80
• Thawed Path: db_lt_et_70
• Each bucket contains *.data, *.tsidx and rawdata
–  *.data: source / sourcetype / host metadata (for example, source::/my/log 100 et lt it; source::/blah 150 et lt it)
–  *.tsidx: lexicon (apple, beer, coke, cream, ice, java, …) and posting lists
–  rawdata: the original events (for example, "apple pie and ice cream are delicious" and "an apple a day keeps doctor away")
Managing disk usage with tsidx reduction
• Full index files are normally kept with the rawdata until the bucket is frozen
• To save disk space, you can enable tsidx reduction in indexes.conf
–  Based on an age setting
ê  timePeriodInSecBeforeTsidxReduction = 7776000 (90 days)
–  Replaces some index files with mini-index files
–  Removes merged_lexicon.lex
–  Overall, the bucket size is typically reduced by 1/3 to 2/3
• Searches will take much longer over reduced buckets

ê  Searches for rare terms are particularly affected
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Reducetsidxdiskusage
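The age setting is expressed in seconds, so it is easy to get wrong by hand. The sketch below renders an indexes.conf stanza from a number of days. The index name `web` and the helper function are illustrative, and it assumes reduction is switched on with the `enableTsidxReduction` setting; check the linked documentation for your version.

```python
def tsidx_reduction_stanza(index_name, days):
    """Render an indexes.conf stanza that enables tsidx reduction after `days`."""
    seconds = days * 24 * 60 * 60
    return "\n".join([
        f"[{index_name}]",
        "enableTsidxReduction = true",
        f"timePeriodInSecBeforeTsidxReduction = {seconds}",
    ])

print(tsidx_reduction_stanza("web", 90))
```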
Estimating Indexing Volume
• Estimate Input Volume (Data Capacity)
- Verify raw log sizes
- Daily, peak, retained, future volume
- Total number of data sources
- Total number of hosts
- Add volume estimates to data source
inventory / spreadsheet
• A rule of thumb for syslog-type data is that once it has been compressed
and indexed in Splunk, it occupies approximately 50% of its original size:
–  15% for the rawdata file
–  35% for associated index files
Testing Indexing Compression
• Confirm estimates with actual data
–  Create a baseline with real or simulated data
–  Find compression rates
–  Leverage _internal index metrics to determine actual input volumes
• To test specific types of data

–  Index a sample of data, then check the sizes of the resulting directories in defaultdb
–  See documentation for step-by-step space estimation method:
http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Estimateyourstoragerequirements
Sizing Indexes Example
Assuming
•  No replication
•  Raw data compression is 15% of daily volume
•  Index file compression is 35% of daily volume
Input | Daily Volume | Retention | Visibility | Estimated Size on Disk | Index
access.log | 350 GB | 60 days | Web, Security | 10.5 TB | web
secure.log | 175 GB | 90 days | Security, Ops | 7.9 TB | ops
ecommerce.log | 125 GB | 6 months (180 days) | Security, Customer Service | 11.3 TB | ecommerce
system.log | 250 GB | 90 days | Security, Ops | 11.3 TB | ops
Total | 900 GB | | | 41.0 TB |
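The figures in this example follow from daily volume x retention days x (15% + 35%). A minimal sketch that reproduces them, assuming 1 TB = 1000 GB; the function name is illustrative.

```python
def index_size_tb(daily_gb, retention_days, raw_pct=15, index_pct=35):
    """Estimated on-disk size in TB (1 TB = 1000 GB), with no replication."""
    return daily_gb * retention_days * (raw_pct + index_pct) / 100 / 1000

inputs = [
    ("access.log", 350, 60),
    ("secure.log", 175, 90),
    ("ecommerce.log", 125, 180),
    ("system.log", 250, 90),
]

total = 0.0
for name, daily_gb, days in inputs:
    size = index_size_tb(daily_gb, days)
    total += size
    print(f"{name}: {size:.3f} TB")
print(f"total: {total:.3f} TB")
```

The example's sizes are these values rounded to one decimal place, and its 41.0 TB total is the sum of the rounded rows.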
Splunk Apps
• Which Splunk apps are appropriate?
–  Apps are often chosen based on
ê  devices or technologies in the production environment
ê  use cases
ê  inputs
• Apps often have particular infrastructure requirements
• Most apps are free
http://www.splunkbase.com
• While apps can always be added later, plan up front if possible
Apps and Indexes
• Many apps define inputs and indexes
• Check apps against your initial index design and data source inventory
–  You may need to acquire new data sources to support the app
ê  add them to the inventory
–  Identify the indexes created by the app
–  Identify which inputs will map to which indexes
ê  there should be no overlap or conflicts
ê  each data source goes to a single index
ê  include any sourcetype information that is defined in the app
Apps must be integrated with the overall index and inputs
Key Criteria for Index Design
• When to partition data into different indexes
–  Retention
ê  How long should the data be kept? Online? Offline?
ê  All retention settings apply on a per-index basis
ê  All data sources within an index should have the same retention
–  Access
ê  Who can search against the data?
ê  All data sources within an index will have the same visibility
–  access to indexes is controlled by role
Additional Design Criteria
• Search Performance
–  Less common to differentiate here, but may be important
–  Mixed data = mixed lexicon, larger number of keywords, therefore larger index
–  Things searched together can be indexed together
–  Example:
ê  Both a high-volume / high-noise data source and a low-volume data source feed into
the same index
ê  If you search mostly for events from the low-volume data source, the search speed
will be slower than necessary
ê  To improve performance, create a separate index for the high-volume data source
Module 3 Lab Exercise
Time: 15 minutes
• You may want to use the spreadsheet data_sizing.xlsx
• Task: Estimate the disk space required
• Task: Index Design
–  Identify the indexes
–  Estimate the Splunk license required for indexing
• Challenge: Identify potential apps

–  Identify apps for the environment
–  Add any index and input information from the apps to your solution
Module 4:
Infrastructure Planning
Module Objectives
• Define environment requirements
• Understand sizing factors for servers
• Understand reference servers
• Understand the planning impact of clustering
Foundation for Splunk Deployment
• A low latency network
–  1 Gb is the minimum bandwidth
• A solid enterprise wide time infrastructure
–  NTP for time synchronization
• A solid Domain Name Service (DNS)
–  Splunk can significantly increase load on DNS
• Turn off Transparent Huge Pages (THP)
–  THP is a feature on some Linux distros
[Diagram: distributed search and indexers on a high bandwidth, low latency network; forwarders and users on standard Internet connections]
Basic Sizing Considerations
• Amount of incoming data
• Amount of indexed (stored) data
• Number of concurrent users
• Number of scheduled searches
• Types of searches
• Specific Splunk apps
–  Can affect incoming and indexed data
–  Can affect number of concurrent searches
How Apps Affect Infrastructure Sizing
•  Review the documentation for each app
–  Identify any impact on the Splunk deployment
•  For example
–  Splunk App for Enterprise Security places a heavy load on
search heads
–  Splunk App for VMware and Checkpoint LEA app both require a
heavy forwarder for data collection
–  More details about the Splunk App for Enterprise Security and
the Splunk App for VMware are on the following pages
Splunk App for Enterprise Security (ES)
• Infrastructure Impacts
–  A dedicated search head or search head cluster
–  One indexer per 100GB data indexed per day
ê  Assumes 15 correlation searches running
ê  Additional data storage for accelerated data models, estimated as
data indexed per day * 3.4
http://docs.splunk.com/Documentation/ES/latest/Install/Overview
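The two rules of thumb above translate directly into arithmetic. A minimal sketch: the function name is illustrative, and the 900 GB/day input is an example value only.

```python
import math

def es_sizing(daily_gb, gb_per_indexer=100, datamodel_factor=3.4):
    """Return (indexer count, extra GB for accelerated data models)."""
    indexers = math.ceil(daily_gb / gb_per_indexer)
    datamodel_storage_gb = daily_gb * datamodel_factor
    return indexers, datamodel_storage_gb

indexers, extra_gb = es_sizing(900)
print(f"{indexers} indexers, about {extra_gb:.0f} GB for accelerated data models")
```

The indexer count only holds under the stated assumption of 15 correlation searches. Heavier search loads need more capacity.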
Splunk App for VMware
–  This app adds approximately 300 MB of data per ESXi host per day to Splunk
–  Requires data collection nodes running as VMs in the VMware environment
ê  the data collection node is distributed as an Open Virtual Appliance (OVA) file
ê  each data collection node can collect data from 40 or fewer ESXi hosts, with a ratio
of 25 to 30 virtual machines per ESXi host
http://docs.splunk.com/Documentation/VMW/latest/Installation/Aboutthismanual
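These figures can be turned into a quick planning estimate. A minimal sketch: the function name and the 100-host example are illustrative, and 1 GB is assumed to be 1000 MB.

```python
import math

def vmware_app_sizing(esxi_hosts, mb_per_host_per_day=300, hosts_per_dcn=40):
    """Return (added daily volume in GB, data collection nodes needed)."""
    daily_gb = esxi_hosts * mb_per_host_per_day / 1000
    collection_nodes = math.ceil(esxi_hosts / hosts_per_dcn)
    return daily_gb, collection_nodes

daily_gb, nodes = vmware_app_sizing(100)
print(f"{daily_gb:.0f} GB/day added, {nodes} data collection nodes")
```

The added daily volume should also be entered in the data source inventory and counted against the license estimate.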
Splunk App for IT Service Intelligence (ITSI)
–  A dedicated search head or search head cluster
–  Additional infrastructure may be needed, depending on the number of Key
Performance Indicators (KPIs) that are tracked
ê  The documentation has recommendations
http://docs.splunk.com/Documentation/ITSI/latest/Configure/Performancerecommendations
Splunk User Behavior Analytics (UBA)
• Hardware requirements
–  50GB disk space for Splunk UBA installation
–  500GB additional disk space for metadata
–  16 CPU cores
–  64 GB RAM
http://docs.splunk.com/Documentation/UBA/latest/Install/Requirements
Splunk Components
What are the requirements for each type of Splunk instance?
• Splunk Enterprise: Indexer (Search peer), Search Head, Deployment Server, License Master, Heavy Forwarder, Cluster Master, Search Head Cluster, Deployer
• Universal Forwarder (Deployment Client)
The heavy forwarder and universal forwarder are discussed in a later module.
Sizing and other considerations for all other components are discussed here.
Component Types – Indexer
• The Splunk component that indexes data, transforming raw data into
events and placing the results into an index
• It also searches the indexed data in response to search requests
[Diagram: at index time the indexer writes events to the index; at search time it answers requests from the search head]
Indexer – Reference Server
•  One reference indexer per 250 GB / day ingestion rate in a distributed deployment
•  Assumes:
–  Peak load is about 3x daily average
–  Light searching for 2 or 3 concurrent users
•  Need additional servers for:
–  Increased reporting, searching, users
•  Indexing alone can use 4 full cores at full load
•  Each concurrent search needs a full core
–  More servers reduce search duration and increase search throughput
Indexer reference hardware
Hardware: Intel 64-bit Xeon or equivalent chip architecture
CPU: 2 x 6-core (12 cores total) at 2+ GHz per core
Memory: 12 GB RAM
Disk: Disk subsystem capable of 800 IOPS (e.g. 8x15K RPM SAS drives in RAID 1+0 configuration)
Network: Standard 1 Gb Ethernet NIC; optional 2nd NIC for management network
OS: Linux or Windows - 64-bit version
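The guideline of one reference indexer per 250 GB / day gives a starting indexer count for a known daily volume. A minimal sketch, with an illustrative function name and example volume; it covers ingestion only and ignores heavier search loads, apps, and clustering.

```python
import math

def reference_indexers(daily_gb, gb_per_indexer_per_day=250):
    """Minimum number of reference indexers for a daily ingestion volume."""
    return max(1, math.ceil(daily_gb / gb_per_indexer_per_day))

print(reference_indexers(900))
```

Treat the result as a floor: more users, more reporting, or premium apps all push the count up.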
Horizontal Scaling
• Adding indexers scales capacity
–  Allows a greater daily indexing volume
–  Speeds searches
• When using multiple indexers
–  Use Splunk’s built-in forwarder load balancing
–  Use distributed search
• When in doubt, the first rule of scaling is to add another commodity indexer
[Diagram: auto load balanced forwarders send data to the indexers, which are searched by the search head]
Increasing Indexer Utilization
• Normally, the indexer uses approximately 4-6 cores for indexing
–  Possibly underutilizing the indexer hardware
• You can configure multiple pipeline sets to increase hardware utilization
–  Available since 6.3
[Diagram: splunkd with a single pipeline set compared with splunkd with multiple pipeline sets]
Disk Storage Details – About IOPS
• Input/Output Operations Per Second (IOPS) measures disk throughput
–  Hard drives read and write at different speeds
ê  average IOPS is the blend between read and write speed
–  To get the most IOPS, choose drives with
ê  high rotational speeds
ê  low average latency and seek times
• Splunkbase has an app to analyze disk performance using Bonnie++
https://splunkbase.splunk.com/app/3002/
• Links to additional information on determining IOPS
www.symantec.com/connect/articles/getting-hang-iops-v13
www.cmdln.org/2010/04/22/analyzing-io-performance-in-linux
Disk Storage Details
• Splunk indexing is read / write intensive
• Recommended RAID setup is RAID 10 (1 + 0)
–  Provides fast reads and writes with the greatest redundancy
–  Do NOT use RAID 5 since the extra parity writes make it too slow for high performance indexing
Device | Type | IOPS | Interface
7,200 RPM SATA drives | HDD | ~75 - 100 | SATA 3 Gb/s
10,000 RPM SATA drives | HDD | ~125 - 150 | SATA 3 Gb/s
10,000 RPM SAS drives | HDD | ~140 | SAS
15,000 RPM SAS drives | HDD | ~175 - 210 | SAS
Source: Wikipedia
Disk Storage Details – SAN and NAS
• Storage Area Networks (SAN) and Network-attached Storage (NAS)
–  Not a good choice for hot and warm buckets (db)
–  Suitable for cold buckets (colddb)
–  Searches over long time periods that utilize colddb will be slower
• High performance SAN can be used in environments with high speed networking and low latency
• IOPS and reliability are the definitive factors
[Diagram: defaultdb contains db (hot and warm buckets) and colddb (cold buckets)]
Storage Requirements
• The disk storage needed on an indexer is primarily determined by the size
of all the indexes, plus base storage for the OS and configuration files
• Additional disk storage is needed for
–  Indexer Clustering
ê  data replication
ê  discussed later in this module
–  Summarization and Acceleration
ê  data models
ê  report acceleration
ê  summary indexing
ê  discussed in the Search Considerations module
Virtualizing Splunk
• You can virtualize any Splunk instance if you meet the minimum resource
requirements
• Tips for virtualization
–  Use locally attached volumes; direct access to a dedicated volume is best
–  Separate search head VMs from indexer VMs, as they burst CPU together
–  Ensure that sufficient resources are reserved – do not overcommit CPU or
memory for Splunk VMs
–  Expect virtualization to reduce performance by 10% or more
• Virtualization tech brief contains details

www.splunk.com/content/dam/splunk2/pdfs/technical-briefs/splunk-deploying-vmware-tech-brief.pdf
Component Types – Search Head
• The search head handles search management functions
–  Directs search requests to a set of search peers and then…
–  Federates or merges the results and presents them to the user
[Diagram: the search head sends requests to the search peers (indexers) and receives their responses]
Search Head – Reference Server
• 1 per 10 - 12 concurrent jobs / queries
–  1 active job per core
• Search heads mostly aggregate results
–  Some types of searches may create bottlenecks
• Scale by adding indexers
• Indexers do the majority of the work
–  More indexers => jobs run faster => fewer concurrent jobs => better search throughput
Search head reference hardware
Hardware: Intel 64-bit Xeon or equivalent chip architecture
CPU: 4 x 4-core (16 cores total) at 2+ GHz per core
Memory: 16 GB RAM
Disk: 4 x 300 GB, 10K RPM SAS hard disks, configured in RAID 1+0
Network: Standard 1 Gb Ethernet NIC; optional 2nd NIC for Management Network
OS: Linux or Windows - 64-bit version

Sizing Factors for Searching
• Use cases determine search needs
–  Number of searches (concurrent & total)
–  Number of users (concurrent & total)
• Types of searches
–  Reporting, dashboard, ad-hoc
• Jobs
–  Summarization, Alerting, Reporting
• Apps may increase the search load

–  Real time and/or scheduled searches may be features of apps
• Plan for expansion as adoption grows

Component Types – DMC
• Distributed Management Console (DMC)
–  Splunk Enterprise's monitoring tool
–  Provides detailed topology and performance information
• A specialized search head
–  For sizing: follow the reference server guidelines for search heads
–  Should not be a production search head
–  Can run on a shared instance with caveats
–  Only admins should access
[Diagram: an admin uses the DMC to monitor the search head cluster, cluster master, deployment server, forwarders, and indexers]
http://docs.splunk.com/Documentation/Splunk/latest/DMC/DMCoverview
Additional Component Types
License Master: Allocates licensing capacity and manages licensing usage of all instances
•  All indexers, search heads, cluster masters, deployers and deployment servers should be slaves to a license master
•  Contacted over network frequently by license slaves
Deployment Server: Manages Splunk configuration files for deployment clients
•  Operates on a "pull" model – clients "phone home" at intervals over network
•  Can handle 2000 phone-homes/minute at default settings
Job Server: A search head that only runs scheduled searches
•  Search head component sizing applies
Cluster Master: Regulates the functioning of an indexer cluster
Deployer: Distributes configurations to search head cluster members
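The deployment server's phone-home capacity can be related to the number of clients and how often each one checks in. A minimal sketch: the function name and the interval values are illustrative. The 2000 phone-homes/minute figure is quoted for default settings, so treating it as constant at other intervals is an assumption.

```python
def max_clients(phone_home_interval_secs, phone_homes_per_minute=2000):
    """Largest client count whose combined phone-home rate stays within capacity."""
    return phone_homes_per_minute * phone_home_interval_secs // 60

print(max_clients(60))   # each client phones home once per minute
print(max_clients(600))  # each client phones home every 10 minutes
```

Lengthening the interval raises the number of clients one deployment server can handle, at the cost of slower configuration rollout.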
Additional Components - Sizing
Component | CPU | Memory | Disk | Network | Factors
License Master | low | low | low | 1 Gb | number of slaves
Deployment Server | med* | med* | low | 1 Gb | number of clients and number of apps + config files
Cluster Master | med | med | low | 1 Gb | required for indexer cluster
Deployer | low | low | low | 1 Gb | number of SHC members
*CPU & memory can spike during downloads
•  Splunk recommends that you dedicate a host for each role
–  You can enable multiple Splunk server roles on a server with caveats
•  Cluster Master and Deployer are discussed in more detail in the Clustering section
Staging & Testing Environments
• Part of your Splunk deployment should include a separate "sandbox" or
testing / staging environment
–  It should have the same version of Splunk as production
• Sizing the testing environment depends on the nature of your tests

Test Inputs: A standalone indexer with minimal performance and capacity
Test Configurations: A minimum set of components (one Search Head, one Indexer, one Deployment Server, …)
Test Performance: An accurate duplication of the production environment
Best Practice
Authentication / Authorization
• LDAP (Lightweight Directory Access Protocol) / AD (Active Directory) for
authentication and group management
• SSO (Single sign-on) allows a user to log in once and gain access to all
systems without being prompted to log in again
–  Protocols: Kerberos, Security Assertion Markup Language (SAML)
–  Requires a proxy with trusted connection
–  Splunk Professional Services help may be required to install
• In scripted authentication, a user-generated Python script serves as the middleman between the Splunk server and an external authentication system such as PAM or RADIUS
Access Control
• Integrate authentication / access control with LDAP and Active Directory
• Map LDAP and AD groups to flexible Splunk capability or data-access roles
–  Define any search as a filter
[Diagram: LDAP / AD groups map to Splunk roles, which map to Splunk capabilities and filters such as Manage Indexes, Manage Users, Share Searches, Save Searches, ERP App Access, and PCI Data Access]
Security, Privacy, Integrity Measures
• HTTPS transport is available end-to-end
–  Create your own certificates or acquire from a valid 3rd party
–  Distributed search (enabled by default between search head and indexer)
–  Forwarder to indexer over TCP
–  Web browser access to Splunk Web
docs.splunk.com/Documentation/Splunk/latest/Security/AboutsecuringyourSplunkconfigurationwithSSL
• Indexer Acknowledgement
• Index Splunk’s configurations and logs to track changes
• Event auditing
docs.splunk.com/Documentation/Splunk/latest/Security/AuditSplunkactivity
Security, Privacy, Integrity Measures (cont.)
• Disable Splunk Web when it is not required
–  For example, indexers in a distributed deployment do not require Splunk Web
• Always change the default password

–  This is often overlooked on forwarders
• Hardening Splunk host

–  Follow best practices for hardening your operating system
–  docs.splunk.com/Documentation/Splunk/latest/Security/WhatyoucansecurewithSplunk
–  benchmarks.cisecurity.org/downloads/
Clustering
Clustering Overview
Splunk offers two different, independent clustering capabilities
• Indexer clustering
–  replicates data (buckets) based on configured policies
–  provides increased data integrity and availability
• Search head clustering

–  replicates knowledge objects and search artifacts across a set of search heads
–  provides increased search accessibility and improved resource scheduling
• Discussed in detail in Splunk Cluster Administration class
About Indexer Clusters
• Indexer clusters – A group of Splunk Enterprise nodes that, working in
concert, provide a redundant indexing and searching capability
–  docs.splunk.com/Documentation/Splunk/latest/Indexer/Aboutclusters
• Multisite indexer cluster – An indexer cluster that spans multiple sites, such
as data centers
–  Each site has its own set of peer nodes and search heads
–  Each site also obeys site-specific replication and search factor rules
–  docs.splunk.com/Documentation/Splunk/latest/Indexer/Multisiteclusters
–  The cluster administrator defines the "sites"
ê  can represent a physical location (city, data center, rack) or something else
Single-site Indexer Cluster Overview
•  Cluster Master
–  Controls and manages index replication
–  There can only be one master
•  Peer node
–  Indexes data from inputs/forwarders
–  Replicates data to other peer nodes as instructed by the master
•  Search head
–  Works the same as any Splunk search head
–  Required component of indexer cluster
•  Forwarders
–  Send data to peer nodes
[Diagram: index replication cluster in which the search head runs distributed search across the peer nodes, the cluster master coordinates data replication among the peer nodes, and forwarders with autoLB & useACK enabled send data to the peer nodes]
Multisite Indexer Cluster
•  Allows for an extra layer of data partitioning
–  Indexers are grouped in "sites" (defined by the Splunk Admin)
•  Multisite clusters offer two key benefits:
–  Disaster recovery
ê  stores index copies at multiple sites
ê  provides automatic site-failover capability
ê  In a disaster, indexing and searching continue on the surviving sites
–  Search affinity
ê  limits searches to assigned site as much as possible
ê  greatly reduces WAN network traffic
[Diagram: multi-site cluster in which a cluster master coordinates replication among HQ (site1), NY (site2), UK (site3), and JP (site4)]
Listen
Listentotoyour
yourdata.
Cluster Master
• Cluster Master manages an indexer cluster
–  Coordinates the replicating activities of the peer nodes
–  Tells search heads where to find data
–  Manages the configuration of peer nodes
–  Orchestrates remedial activities if a peer becomes unavailable
• There can be only one cluster master, even in a multisite cluster

–  A stand-by cluster master should exist offline for manual cluster master
failover
–  Note that the cluster will continue to operate while the cluster master is offline
Replication and Search Factors
•  Peer nodes copy buckets to each other (replication)
–  The copied buckets may be complete buckets or contain only rawdata
–  The replication factors apply to the entire cluster
–  Replication must be enabled for each index
▪  the default is not to replicate indexes
•  Replication factor (RF)
–  Specifies how many total copies of rawdata the cluster should maintain
–  This sets the total failure tolerance level: the cluster can lose RF - 1 peer nodes without losing data
•  Search factor (SF)
–  Specifies how many copies will be searchable
▪  searchable buckets have both rawdata and index files
–  Determines how quickly you can recover the search capability
Indexer Clustering Requirements
• A cluster master
–  Plus a "stand-by" cluster master, powered off
• Indexers
–  Single-site mode
▪  at least as many peer nodes as the replication factor (RF); for example, RF = 3 requires three peer nodes
▪  best practice: at least RF + 1 peer nodes
–  Multisite mode
▪  at least 2 peer nodes per site
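The single-site rules above can be sketched as a small planning check. This is only an illustration: the function name and the warning texts are assumptions made for this example, not part of any Splunk tool. It also assumes the search factor never exceeds the replication factor, since searchable copies are a subset of all copies.

```python
def check_single_site_cluster(peer_nodes, replication_factor, search_factor):
    """Return planning warnings for a single-site indexer cluster (illustrative only)."""
    warnings = []
    # The cluster cannot keep RF copies on fewer than RF peers.
    if peer_nodes < replication_factor:
        warnings.append("too few peers: need at least RF peer nodes")
    # Best practice from the slide: RF + 1 peers as a minimum.
    elif peer_nodes < replication_factor + 1:
        warnings.append("meets minimum, but best practice is RF + 1 peer nodes")
    # Searchable copies are a subset of all copies, so SF cannot exceed RF.
    if search_factor > replication_factor:
        warnings.append("search factor cannot exceed replication factor")
    return warnings


print(check_single_site_cluster(peer_nodes=3, replication_factor=3, search_factor=2))
```

A plan with four peers, RF = 3 and SF = 2 returns no warnings.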
Sizing Guidelines
• Peer Nodes
–  Peer nodes should meet the minimum reference server requirements
–  Replication and search factors affect the peer nodes
▪  larger factors increase CPU and network load
▪  disk sizing is directly affected by replication and search factors
• Cluster Master
–  No published minimums
–  Must have sufficient CPU and network resources to service requests
Storage Requirements for Clustering
• With index clustering, you must consider the replication factor (RF) and
search factor (SF) to arrive at total storage requirements
–  Total rawdata disk usage = rawdata total * RF
–  Total index disk usage = index data total * SF
• NOTE: This does not include the disk space needed to rebuild the search factor if required
–  docs.splunk.com/Documentation/Splunk/latest/Indexer/Systemrequirements#Storage_considerations
Estimating Disk Usage
• For this example, assume
–  rawdata on disk = ~15% of daily index data
–  index files on disk = ~35% of daily index data
• Note that the per-day storage must still be multiplied by the number of days of retention!

Daily index data = 100 GB    RF=3 & SF=2           RF=3 & SF=2           RF=3 & SF=3
                             on 3 peer nodes       on 6 peer nodes       on 6 peer nodes
rawdata (15 GB daily)        15 * 3 = 45 GB        15 * 3 = 45 GB        15 * 3 = 45 GB
index files (35 GB daily)    35 * 2 = 70 GB        35 * 2 = 70 GB        35 * 3 = 105 GB
Total size across cluster    115 GB                115 GB                150 GB
Per-peer storage per day     115 / 3 = 38.3 GB     115 / 6 = 19.2 GB     150 / 6 = 25 GB
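The arithmetic in this example can be reproduced with a short calculation. This is a sketch under the slide's assumptions: the 15% and 35% ratios are example values, not constants, and the function and parameter names are made up for illustration. It also applies the retention multiplier that the note calls for.

```python
def estimate_cluster_storage(daily_gb, rf, sf, peers, retention_days=1,
                             rawdata_ratio=0.15, index_ratio=0.35):
    """Estimate disk usage (GB) for an indexer cluster.

    rawdata_ratio and index_ratio are the example ratios from the slide;
    measure them on real data before sizing a deployment.
    """
    rawdata_gb = daily_gb * rawdata_ratio * rf   # every copy keeps rawdata
    index_gb = daily_gb * index_ratio * sf       # only searchable copies keep index files
    total_gb = (rawdata_gb + index_gb) * retention_days
    return {"total_gb": total_gb, "per_peer_gb": total_gb / peers}


for rf, sf, peers in [(3, 2, 3), (3, 2, 6), (3, 3, 6)]:
    result = estimate_cluster_storage(100, rf, sf, peers)
    print(f"RF={rf} SF={sf} peers={peers}: "
          f"total {result['total_gb']:.1f} GB, per peer {result['per_peer_gb']:.1f} GB")
```

Passing `retention_days=90` to the same call gives the storage needed for a 90-day retention period.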
Peer Failure Impact on Disk Sizing
• When a peer is lost, the cluster master will initiate additional replication across the surviving peers to return to the specified RF and SF
• This will consume additional disk space
–  Plan for additional disk space for a sustained outage
[Diagram: data replication across 4 peers (Peer A, Peer B, Peer C, Peer D) with replication factor = 3 and search factor = 2. Copies of buckets 1 to 5 are spread across the peers. Legend: P = primary (origin), B = searchable backup, R = rawdata-only]
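A rough way to see the effect on disk sizing is to compare the average per-peer storage before and after a failure. The sketch below assumes that the cluster re-replicates evenly across the surviving peers, which real clusters only approximate; the function name is hypothetical.

```python
def per_peer_storage_after_failure(total_gb, peers, failed_peers=1):
    """Average per-peer storage (GB) before and after losing peers.

    Assumes the cluster master restores RF and SF by spreading the
    copies evenly across the survivors.
    """
    surviving = peers - failed_peers
    if surviving <= 0:
        raise ValueError("no surviving peers")
    return total_gb / peers, total_gb / surviving


before, after = per_peer_storage_after_failure(total_gb=115, peers=4)
print(f"before: {before:.1f} GB per peer, after: {after:.1f} GB per peer")
```

The difference between the two values is the headroom each peer needs for every day of retained data during a sustained outage.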
Search Head Clusters
• A search head cluster is a group of Splunk Enterprise search heads that
serves as a resource for searching
• You can run the same searches, view the same dashboards, and access
the same search results from any search head in the cluster
• To achieve this, the search heads in the cluster share
–  Configurations and apps
–  Search artifacts
–  Job scheduling
docs.splunk.com/Documentation/Splunk/latest/DistSearch/AboutSHC
Search Scaling: Do you need a cluster?
• To mitigate performance issues as the search load increases
–  Distribute search load
▪  optimize scheduled searches to run in non-overlapping time slots
▪  use job servers to isolate scheduled searches, real-time searches, and ad-hoc searches
–  Limit resource usage
▪  limit the time range of end-user searches
▪  configure user roles to limit the number of concurrent real-time searches
–  Increase the number of peer nodes (indexers)
–  Add more search heads
–  Use a search head cluster
Search Head Clustering
• Benefits
–  Simple horizontal scaling
–  Always-on search services
–  Reliable alerts
–  Consistent knowledge objects across cluster
–  Eliminates the need for Job Server
–  No more NFS!
–  Configuration sharing and job artifacts use local storage via automatic replication
[Diagram: a search head cluster behind an external load balancer, with a deployer]
Search Head Clustering Requirements
• Requirements
–  At least 3 search heads
–  A deployer
• Sizing guidelines – Search Heads
–  Search heads should meet the minimum reference server requirements
• Sizing guidelines – Deployer
–  No published minimums
–  Must have sufficient CPU and network resources to service requests and to push configurations
Notes for Search Head Clustering
• Summary indexes must be forwarded to the indexer tier
–  This is a general best practice, but required for search head clustering
–  This may increase the disk space required on the indexers
• In a search head cluster, add more search heads to scale capacity horizontally
–  This assumes that the indexer layer has sufficient search capacity
–  If capacity is insufficient, this will create a bottleneck at the indexers/peer nodes
–  Remember that the rule of thumb for scaling is "add another indexer"
Final Notes on Clustering
• Planning is essential
–  What are the requirements for high availability and disaster recovery?
–  How quickly must recovery occur?
–  How many servers can be lost before service degrades or data is lost?
• You can mix clustered and non-clustered servers
–  You can have a mix of clustered and standalone indexers
–  You can have a mix of clustered and standalone search heads
▪  this may be the best solution for some special-purpose or single-app search heads
Time: 10-15 minutes
• Task: Size the infrastructure
–  How many indexers, search heads and other components?
• Challenge: Indexer Clustering

–  What if data replication for ecommerce data is desired?
Module 5:
Data Collection
Module Objectives
• Review logging basics
• Understand the difference between agent and agentless data collection
• Compare data collection methods
[Diagram labels: Infrastructure, Integration, Collection, Operation]
What Splunk Can Index
Splunk is the platform for machine data
• Data types: logfiles, configs, messages, alerts, metrics, scripts, changes, tickets
• Customer-facing data: click-stream data, shopping cart data, online transaction data
• Data outside the data center: manufacturing, CDRs & IPDRs, power consumption, RFID data, GPS data
• Windows: registry, event logs, file system, sysinternals
• Linux / Unix: configurations, syslog, file system, ps, iostat, top
• Virtualization and cloud: hypervisor, guest OS, apps, cloud
• Applications: web logs, Log4J, JMS, JMX, .NET events, code and scripts
• Databases: configurations, audit / query logs, tables, schemas
• Networking: syslog, SNMP, Netflow, IDS
Logging Components
[Diagram: originators send messages and events to the indexer(s). Firewalls, switches, and other network devices, plus a Linux machine without a forwarder, send through network inputs. A Windows machine and a Linux machine with a forwarder (agent) send events, optionally through a Splunk forwarder acting as a relay. A production application sends events over https. A mainframe application sends events via SNMP]
Data collection: agentless vs. agent
Agentless
• Can use a variety of transports
–  TCP, UDP, SNMP, HTTP
• Splunk agentless options
–  HTTP data collection
–  Splunk App for Stream
–  Direct monitoring of TCP/UDP ports
Agent
• Software program (service or daemon) that collects information and pushes it over the network to a central location
• Splunk agents
–  Universal Forwarder
–  Heavy Forwarder
Data Collection Overview
•  File based (monitor) inputs
–  Using a shared file system as a log collector
–  Using Splunk forwarder on the log collector
•  Network inputs
–  Directly connected to single indexer
–  Collected on syslog-ng / rsyslog server with Splunk forwarder
•  Windows events and perfmon (WMI)
–  Local data collection with forwarder
–  Remote data collection
•  Special data collection options
–  Splunk App for Stream
–  HTTP Event Collector
These alternatives are discussed in detail on the following slides
Alternatives for File Inputs – Log Collector
• Share log directory on Production Servers using a Network File Server of your choice (agentless)
• Have Splunk indexer monitor these as local directories over the network
• Does not scale beyond a single indexer
[Diagram: production servers share logs through a network file server; a single indexer performs local directory monitoring over the network]
Alternatives for File Inputs – Mixed
• Mixed or transitional alternative
• Forwarder sends data from the "log repository" to Splunk indexer(s)
[Diagram: production servers write to a network file server with a forwarder installed, which forwards to the indexers]
Alternatives for File Inputs – Forwarding
• Universal Forwarders installed on production servers
• Advantages
–  Load balancing and buffering
–  Supports scripted inputs
–  More scalable
–  Better performance
–  More secure
• Best Practice
[Diagram: production servers with forwarders installed send data to the indexers; automatic load balancing is configured on all forwarders]
Alternatives for Network Inputs – Direct to Indexer
• Use network input to send directly to Splunk indexer
–  What happens when Splunk needs to be stopped for maintenance?
–  What happens if you must clean or rebuild an index?
–  What happens if the indexer becomes very busy?
–  What if Splunk is not running as root?
• Does not scale beyond a single indexer
• Not recommended
[Diagram: network devices send TCP / UDP traffic directly to a single indexer]
Alternatives for Network Inputs – syslog
• Input to syslog, then Splunk monitors syslog output file
–  Provides a buffering mechanism
• Can take advantage of syslog-ng / rsyslog filtering features
• A forwarder is installed on the syslog-ng / rsyslog server
–  Provides the advantage of load balancing
• Offers greater resiliency and increases scalability
• Best Practice
[Diagram: network devices send to a syslog-ng / rsyslog server with a forwarder installed, which forwards to the indexers]
Alternatives for Windows Inputs – WMI Collector
• A Splunk forwarder running on a Windows host performs remote Event Log / WMI monitoring
• Will not scale horizontally to many production servers
• Can only monitor limited Windows inputs
[Diagram: a Windows heavy forwarder collects from production Windows servers over WMI and forwards to the indexers]
Alternatives for Windows Inputs – Forwarder
• Each Windows production server has a Universal Forwarder installed
–  Collecting performance monitoring locally and forwarding to indexer
–  Scalable
–  More secure and does not require domain privileges
–  Can collect any input, not just Windows remote inputs
• Best Practice
[Diagram: forwarders on production Windows servers send data to the indexers]
Forwarding from Windows Systems
• Universal forwarder (local agent)
–  Offers many types of data sources
–  Minimizes network overhead
–  Reduces operational risk
• Remote polling (agentless)
–  Use native WMI API to collect event logs and performance data

Source                 Universal Forwarder    Remote Collector (WMI)
Windows Event Logs     Yes                    Yes (must know name of event log)
Performance            Yes                    Yes
Registry               Yes                    No
Windows Forwarding – Method Comparison
Performance
• Local Forwarder
–  Requires less CPU
–  Performs basic data compression / encryption
–  More memory-intensive
• Remote Collector (WMI)
–  Requires more CPU
–  More network intensive
–  Not recommended for highly audited hosts (such as domain controllers)
Deployment
• Local Forwarder
–  Easier if you have control of base OS build or many data sources
–  Can use deployment server
• Remote Collector (WMI)
–  Good for limited set of data
–  May be only option without build control or admin rights on target host
Possible concerns
• Local Forwarder
–  UF throughput is limited to 256 KBps by default (maxKBps in limits.conf, configurable)
• Remote Collector (WMI)
–  Starts to poll less frequently over time if unable to contact a host for a period of time (to prevent potential DOS attack)
–  Not advised for machines that are frequently offline
HTTP Event Collector
• Component of Splunk Enterprise since 6.3
–  Can be configured for high-volume inputs with multiple collection points (indexers or heavy forwarders)
–  http://dev.splunk.com/view/event-collector/SP-CAAAE73
• A token-based HTTP input that is secure and scalable
• Sends events to Splunk without the use of forwarders
–  Can facilitate logging from distributed, multi-modal, and/or legacy environments
–  Log data from a web browser, automation scripts, or mobile apps
• Discussed in Splunk Administration
[Diagram: an application on a production server sends events over https to an indexer or heavy forwarder]
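As a minimal sketch of what "coding the HTTP post" can look like, the example below builds a token-authenticated request without sending it. The host name, token, and event fields are placeholders invented for this example. The port (8088), the `/services/collector` path, and the `Authorization: Splunk <token>` header follow the usual HTTP Event Collector conventions, but verify them against your own configuration.

```python
import json
import urllib.request


def build_hec_request(host, token, event, sourcetype=None, port=8088):
    """Build (but do not send) an HTTP Event Collector request."""
    payload = {"event": event}
    if sourcetype:
        payload["sourcetype"] = sourcetype
    return urllib.request.Request(
        url=f"https://{host}:{port}/services/collector",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",  # token-based authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host and token, for illustration only.
request = build_hec_request(
    "splunk.example.com",
    "00000000-0000-0000-0000-000000000000",
    {"action": "checkout", "status": "ok"},
    sourcetype="_json",
)
print(request.full_url)
# urllib.request.urlopen(request) would send the event; it is omitted here.
```

Keeping request construction separate from sending makes the payload easy to inspect and test before pointing it at a real collector.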
Splunk App for Stream
• Captures real-time streaming wire data from anywhere in your datacenter
• Filter and aggregate wire data easily and automatically into Splunk
• https://apps.splunk.com/app/1809/
Comparison of Remote Collection Options
Best
• Universal Forwarders (UF)
–  Reliable delivery, encryption, compression
–  Distribution and failover over multiple Splunk indexers
–  Can execute scripts and collect program output remotely
–  Throughput defaults to 256 KBps (maxKBps in limits.conf, configurable)
• HTTP Event Collector
–  Secure and scalable
–  Requires developers to code the HTTP post that will be indexed
Good
• Heavy Forwarders
–  Same as UF except parses data before forwarding & no bandwidth limits
–  More load on source system
Not Optimal
• NFS / SMB
–  Not as scalable
–  Possible latency with many files
• syslog
–  Source info not as fine-grained
–  Use rsyslog / syslog-ng instead for capture buffering, hostname extraction
▪  makes TZ processing in Splunk better – no need to transform host
–  Use a forwarder to read syslog files and distribute over indexers
• Other agentless (including remote WMI)
–  Not as scalable
–  May be less secure
Not Recommended
• FTP, batch scripts
–  Not real-time
–  Can be hard to maintain and manage
Managing Initial Data Collection
•  Production servers often have large volumes of historical data
–  Can cause license violations when initially indexing
–  May not be worth collecting
•  Before starting to index
–  Archive, then delete old data that should not be indexed (step 1)
–  Place historical data in a separate location (or "blacklist") (step 2)
•  Index recent and current data (step 3)
•  Batch upload historical data as needed (step 4)
•  Set up a regular schedule for rotating, archiving or removing older log files after they are indexed
•  Best Practice
[Diagram: (1) clean directory /var/log; (2) move historical data to /old_log; (3) start monitoring the original directory; (4) gradually batch or upload historical files to index]
Logging Best Practices
• Human-readable
–  Key-value pairs
–  Text format
• Traceable
–  Timestamps for every event
–  Use unique identifiers
–  Identify the source
• Collect events at multiple levels
–  Infrastructure status
–  Application status
–  Semantic events
▪  for example: web clicks, financial trades, cell phone connections and dropped calls, audit trails
• For more information
–  Splunk Logging Best Practices: http://dev.splunk.com/view/logging-best-practices/SP-CAAADP6
–  NIST – Guide to Computer Security Log Management: csrc.nist.gov/publications/nistpubs/800-92/SP800-92.pdf
• Best Practice
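The practices above can be illustrated with a small event formatter. This is only a sketch: the field names (`level`, `msg`, `transaction_id`, `source`) and the service name are assumptions chosen for the example, not a required schema.

```python
import uuid
from datetime import datetime, timezone


def format_event(level, message, **fields):
    """Render one event as a timestamped line of key="value" pairs."""
    # A timestamp on every event, in an unambiguous format with time zone.
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f%z")
    pairs = {"level": level, "msg": message, **fields}
    body = " ".join(f'{key}="{value}"' for key, value in pairs.items())
    return f"{timestamp} {body}"


line = format_event(
    "INFO",
    "order placed",
    transaction_id=uuid.uuid4().hex,  # unique identifier for tracing
    source="checkout-service",        # hypothetical application name
    amount=42.5,
)
print(line)
```

Each line is plain text, carries its own timestamp, and exposes its fields as key-value pairs.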
Data Collection Best Practices
• Splunk works better with smaller log files
–  Use the size of the log instead of time to roll logs more frequently
• Log rotation
–  Keep the latest two logs in text format before you compress, otherwise you may
be missing delayed events
• Use blacklist to eliminate compressed files from monitor input
–  This prevents duplication of events
• Log to a local file system
Best Practice
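To show how a blacklist keeps compressed rotations out of a monitor input, the sketch below applies a regular expression to file paths in plain Python. The pattern and the file names are hypothetical. In Splunk the equivalent regular expression would go in the monitor input's blacklist setting.

```python
import re

# Hypothetical pattern: skip files with common compressed extensions.
COMPRESSED = re.compile(r"\.(gz|bz2|zip|tgz)$")


def files_to_monitor(paths):
    """Return the paths that the monitor input should still read."""
    return [path for path in paths if not COMPRESSED.search(path)]


paths = ["/var/log/app.log", "/var/log/app.log.1", "/var/log/app.log.2.gz"]
print(files_to_monitor(paths))
```

The two most recent text logs remain monitored, while the compressed copy, whose events were already indexed, is skipped.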
Time: 30 minutes
• Create a topology diagram for the environment
Module 6:
Forwarders and Deployment Management
Module Objectives
• Review forwarder types
• Understand how to manage forwarder installation in an enterprise environment
• Understand configuration of all Splunk components using
–  Deployment Server
–  Cluster Master
–  Deployer
[Diagram labels: Infrastructure, Operation, Querying, Comprehension]
Types of Forwarders
• Universal forwarder
–  A streamlined binary package that contains only the components needed to forward data
–  Use the universal forwarder whenever possible
▪  smaller, more efficient
▪  throughput defaults to 256 KBps (maxKBps in limits.conf, configurable)
• Heavy Forwarder
–  Uses the Splunk Enterprise binary, with all capabilities
–  Parses data before forwarding, so use a heavy forwarder when
▪  routing events
▪  filtering more than 80% of incoming events
▪  anonymizing or masking data before forwarding to indexer
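When sizing forwarders it helps to know how much data the default throughput cap allows per day. The calculation below is a sketch that assumes the cap is expressed in kilobytes per second, as the `maxKBps` setting name suggests; the function name is made up for this example.

```python
def daily_volume_gb(max_kbps=256):
    """Upper bound on the data one forwarder can send per day (GB),
    given a throughput cap in kilobytes per second."""
    seconds_per_day = 24 * 60 * 60
    return max_kbps * seconds_per_day / (1024 * 1024)


print(f"{daily_volume_gb():.1f} GB/day at the default cap")
print(f"{daily_volume_gb(1024):.1f} GB/day at 1024 KBps")
```

If a source produces more than this per day, the cap needs to be raised or the forwarder will fall behind.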
Forwarder – Scripted Installation
• Use an automated method to install Splunk forwarders
–  Scripted installation
–  Another server management / configuration management tool
• Set the forwarder as a deployment client as part of the installation

• Sample scripts
–  answers.splunk.com/answers/34896/simple-installation-script-for-universal-forwarder
–  answers.splunk.com/answers/100989/forwarder-installation-script
–  docs.splunk.com/Documentation/Splunk/latest/Forwarding/Remotelydeployanixdfwithastaticconfiguration
Splunk Deployment Tools
[Diagram: the deployer distributes apps with search-time configurations to search heads; the cluster master distributes apps with indexing & parsing configurations to indexers; the deployment server distributes apps with input-time configurations to forwarders]
Deployment Management
Type of Instance                 Manage Configurations with
Forwarder                        Deployment Server or other configuration management tool
Stand-alone Search Head          Deployment Server or other configuration management tool
Stand-alone Indexer              Deployment Server or other configuration management tool
Search Head Cluster Member       Deployer only
Peer Node in Indexer Cluster     Cluster Master only
Deployment Server
• A centralized configuration manager that delivers updated content to deployment clients
–  Units of content are known as deployment apps
–  Do NOT store configuration in $SPLUNK_HOME/etc/system/local on clients
▪  system-level configurations on clients cannot be overridden with Deployment Server
[Diagram: the deployment server delivers deployment apps to forwarders and to indexers (not in a cluster)]
Deployment Server – Rules of Thumb
• One reference server safely handles approximately 2000+ polls/minute
• Be sensitive to the phoneHomeIntervalInSecs attribute in deploymentclient.conf
–  Default client poll is 60 seconds
–  Prioritize servers that need more frequent updates
–  In general, adjust client polling interval to scale
• For more than 50 deployment clients
–  The deployment server should be on its own server
▪  otherwise, it can share a server with another component
docs.splunk.com/Documentation/Splunk/latest/Updating/Calculatedeploymentserverperformance
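The polling rule of thumb can be turned into a quick calculation. This is a sketch: the 2000 polls/minute figure is the approximate value from the slide, and the function names are hypothetical.

```python
import math


def polls_per_minute(clients, phone_home_interval_secs=60):
    """Polls per minute that a deployment server receives from its clients."""
    return clients * 60 / phone_home_interval_secs


def min_phone_home_interval(clients, max_polls_per_minute=2000):
    """Smallest polling interval (seconds) that keeps the load at or below the cap."""
    return math.ceil(clients * 60 / max_polls_per_minute)


print(polls_per_minute(10_000))          # load at the default 60-second interval
print(min_phone_home_interval(10_000))   # interval needed to stay within the rule of thumb
```

For 10,000 clients the default interval produces five times the suggested load, so the interval would need to grow to about 300 seconds.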
Config Management – Deployment Server
• When using Forwarder Management or the Deployment Server (DS)
–  Build an install script/package for clients with only the files needed to contact the DS (basic installation + deploymentclient.conf)
–  Clients will get the rest of the configuration information from the DS
• Use in combination with another configuration management (CM) tool for
–  Authentication certificates, passwords, log-local.cfg, deploymentclient.conf, upgrades of forwarder software, remote boot-start of production system
–  Use other CM tools for rare tasks, deployment server for routine tasks
Deploy Apps to Search Head Cluster
• Use the Deployer to distribute apps to SHC members
–  DO NOT stage any out-of-the-box apps in deployer (search, launcher, etc.)
–  All members of the cluster will receive the same apps
• You cannot use the Deployment Server to distribute apps to SHC members
• Deployer supports both push and pull mechanisms
–  Can push apps to search head cluster members
–  Can be polled by new or restarted search head cluster members for updates
• Knowledge Objects that are created by users in Splunk Web

–  are automatically replicated
–  are not sent to the Deployer
Deploy Apps to Indexer Cluster
• Use the Cluster Master to distribute apps to the peer nodes
–  DO NOT stage any out-of-the-box apps in Cluster Master (search, launcher, etc.)
–  All members of the cluster will receive the same apps
–  Cluster Master pushes apps and rolling-restarts peer nodes if needed
–  Cluster master repository: $SPLUNK_HOME/etc/master-apps
–  Destination directory on indexers: $SPLUNK_HOME/etc/slave-apps
• You cannot use the Deployment Server to distribute apps to the peer nodes
Deployment Apps – Best Practices
• Design your deployment app
–  An app is a set of deployment content (a configuration bundle)
–  An app is deployed as a unit and should be small
–  Take advantage of Splunk's configuration layering
–  Use a naming convention for the apps
–  Create classes of apps, for example
▪  input apps
▪  index apps
▪  web control apps
• Carefully design apps regardless of your configuration management tool (DS, cluster master, Puppet, etc.)
• Best Practice
Time: 15 minutes
• Task: Analyze an app
–  Remember that some Splunkbase apps may need to be repackaged into deployment apps for your environment
• Task: Design a test environment
–  How many servers are needed for testing?
–  How will you test the deployment, including the apps?
• Challenge
–  Update the topology to reflect Phase 2 of the deployment
Module 7:
Data Comprehension
Module Objectives
• Identify the 6 things that you must get right at index time
• Review knowledge objects
• Review the Common Information Model (CIM) and data normalization
• Discuss data model design
[Diagram labels: Infrastructure, Integration, Collection, Operation]
Laying the Foundation During Index Time
Six things you must get correct at index time

                               Input Phase                        Parsing Phase => Index Time
Date / timestamp extraction    no                                 Yes
Event boundary                 no                                 Yes
host                           Yes, optimally set during input    Yes, can override input setting
sourcetype                     Yes, optimally set during input    Yes, can override input setting
source                         Yes, optimally set during input    Yes, can override input setting
index                          Yes, optimally set during input    Yes, can override input setting

This metadata cannot be changed after the events are written to the index
docs.splunk.com/Documentation/Splunk/latest/Admin/Configurationparametersandthedatapipeline
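To make the first two items concrete, the sketch below shows event boundary detection and timestamp extraction on a small multi-line sample. It is a plain Python illustration of the concepts, not Splunk's parsing pipeline; the sample data and the `_time` / `_raw` keys are chosen only to echo Splunk's field names.

```python
import re
from datetime import datetime

RAW = (
    "2016-05-13 10:15:02 ERROR payment failed\n"
    "  at checkout.pay(line 42)\n"
    "2016-05-13 10:15:07 INFO retry succeeded\n"
)

# Event boundary: a new event starts at a line that begins with a timestamp.
BOUNDARY = re.compile(r"^(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", re.MULTILINE)


def parse_events(raw):
    """Split raw text into events and extract each event's timestamp."""
    events = []
    for chunk in BOUNDARY.split(raw):
        if not chunk.strip():
            continue
        # Timestamp extraction: the first 19 characters hold the event time.
        timestamp = datetime.strptime(chunk[:19], "%Y-%m-%d %H:%M:%S")
        events.append({"_time": timestamp, "_raw": chunk.rstrip("\n")})
    return events


for event in parse_events(RAW):
    print(event["_time"], "|", event["_raw"].splitlines()[0])
```

If the boundary rule were wrong, the stack trace line would become its own event with no timestamp, which is why these settings must be right before data is written to the index.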
Data Comprehension Tasks
• Oversight of knowledge object creation and usage across teams,
departments, and deployments
• Normalization of event data
• Creation of data models
• Management of shared knowledge objects
• Development of naming conventions
Data Enrichment – Splunk Knowledge
Data interpretation: fields and field extractions
•  The first order of Splunk Enterprise knowledge
•  Automatic field extraction brings meaning to your raw data
•  Fields that are extracted manually expand and improve upon this layer of meaning
Data classification: event types and transactions
•  Event types group sets of events that have common characteristics
•  Transactions are collections of conceptually-related events
Data enrichment: lookups and workflow actions
•  Enrich your data with external resources
•  Add fields to data from external data sources
•  Workflow actions enable interactions with other applications or web resources
Data Enrichment – Splunk Knowledge (cont.)
Data normalization: tags and aliases
•  Normalize your data
•  Group a set of related field values together
•  Apply aliases to fields with the same meaning, but different names
•  Give extracted fields tags that reflect different aspects of their identity
Data models
•  Representations of one or more datasets
•  Drive Pivot tool
•  Designed by knowledge managers who fully understand the format and semantics of their indexed data
•  Can be used with other knowledge objects including lookups, transactions, search-time field extractions, and calculated fields
Lookups
• Four built-in types
–  File-based uses a CSV file, stored in the lookups directory
▪  preferred for small tables that change infrequently
–  KV Store requires a collections.conf that defines the fields
▪  uses MongoDB to store the data
▪  can be used to store state information, and individual entries can be updated
▪  preferred for large tables or data that needs to be regularly updated
–  External uses a Python script or an executable in the bin directory
–  Geospatial uses a KMZ file saved in the lookups directory to support the choropleth visualization
• Splunk DB Connect
–  Can look up data in a relational database
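Conceptually, a file-based lookup matches a field in each event against a key column in a CSV table and adds the remaining columns as new fields. The sketch below imitates that behavior in plain Python. The table contents and field names are hypothetical, and the CSV text is inlined in place of a file in the lookups directory.

```python
import csv
import io

# Hypothetical lookup table, standing in for a CSV file in the lookups directory.
LOOKUP_CSV = "status,status_description\n200,OK\n404,Not Found\n500,Server Error\n"


def load_lookup(text, key_field):
    """Index the CSV rows by the value of the key column."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(event, lookup, key_field):
    """Return a copy of the event with the lookup's output fields added."""
    match = lookup.get(str(event.get(key_field)))
    if not match:
        return dict(event)
    extra = {name: value for name, value in match.items() if name != key_field}
    return {**event, **extra}


lookup = load_lookup(LOOKUP_CSV, "status")
print(enrich({"status": 404, "uri": "/cart"}, lookup, "status"))
```

Events without a matching key are returned unchanged, which mirrors how a lookup leaves unmatched events without the output fields.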
Data Enrichment – Splunk DB Connect
• Provides a way to enrich and combine event data with database data
–  Easily configure database queries using the Splunk user interface
–  Create lookups based on database query results
–  Import and index data already stored in your database
docs.splunk.com/Documentation/DBX/latest/DeployDBX/AboutSplunkDBConnect
[Diagram: a Java Bridge Server provides database lookup, connection pooling, and database query over JDBC to Oracle DB, MS SQL Server, and other databases]
Normalizing Data
• Events from different products and vendors have unique formats and fields
–  Even when the events are semantically equivalent
• Most traditional approaches to providing a unified view are based on normalizing the data into a common schema at input time
• Splunk maps data to a common set of field names and tags at search time
–  These can be defined at any time after the data has been captured, indexed, and available for ad hoc search
–  This improves flexibility by normalizing with late-binding searches instead of index-time parsing
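The idea of search-time normalization can be sketched as a mapping that is applied when events are read, leaving the stored data untouched. The vendor names and field names below are invented for illustration; in Splunk this role is played by knowledge objects such as field aliases and tags.

```python
# Hypothetical field names used by two vendors for equivalent firewall events.
ALIASES = {
    "vendor_a": {"srcip": "src", "dstip": "dest", "act": "action"},
    "vendor_b": {"source_address": "src", "destination_address": "dest",
                 "disposition": "action"},
}


def normalize(event, sourcetype):
    """Return a copy of the event with common field names added.

    The original fields are kept, mirroring how the raw data stays intact.
    """
    normalized = dict(event)
    for original, common in ALIASES.get(sourcetype, {}).items():
        if original in event:
            normalized[common] = event[original]
    return normalized


a = normalize({"srcip": "10.0.0.1", "dstip": "10.0.0.9", "act": "blocked"}, "vendor_a")
b = normalize({"source_address": "10.0.0.2", "disposition": "allowed"}, "vendor_b")
print(a["src"], a["action"], "|", b["src"], b["action"])
```

Because the mapping lives outside the stored events, it can be corrected or extended later without re-indexing anything.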
Data Normalization – Example
This example illustrates normalized field names across two separate data sources
[Figure: events from Data Source 1 and Data Source 2 shown with the same normalized field names]
Common Information Model
• The Splunk Common Information Model (CIM)
–  Defines relationships in the underlying data using tags and fields
–  Provides a standard method of parsing, categorizing, and normalizing event data
–  Leaves the raw machine data intact
–  Defines the least common denominator for a domain of interest
• Applications that use the CIM will integrate more easily

• Splunk CIM is implemented as a free add-on
–  Includes data models
–  Has domain coverage for security, inventory, performance, networking and more
–  docs.splunk.com/Documentation/CIM/latest/User/Overview
CIM Version For Apps
• Apps that use the CIM will indicate it on the sidebar
–  Check the version to ensure that apps will integrate easily
Data Models
A data model provides a hierarchical mapping of semantic knowledge about one or more datasets
[Diagram: the end user works in the Pivot Editor with data model objects (customer, order, shipment, transaction, etc.); the data model maps these objects onto indexed data using index, sourcetype, host, source, fields, tags, eventtypes, etc.]
docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatamodels
Data Model Design
• Good data model design requires
–  An understanding of the data indexed in Splunk
▪  indexes, fields, sourcetypes, etc.
▪  understanding of the data content
–  Appropriate fields and knowledge objects to enrich the data
–  An understanding of the user needs
▪  what are the questions that the user wants to answer?
▪  what are the parameters of the model, from the user or business point of view?
• Simply exposing the existing sourcetypes and fields to the user is generally not sufficient
Time: 15 minutes
• Task: Identify sourcetypes
–  Which inputs have pre-trained sourcetypes?
–  Which inputs are defined by apps?
• Challenge
–  Examine fields in the CIM for network traffic
Module 8:
Search Considerations
Module Objectives
• Understand MapReduce
• Discuss search performance
• Understand how scheduled reports are dispatched
• Discuss differences between data summary methods
[Diagram labels: Infrastructure, Integration, Collection, Operation]
yourdata.
Components – MapReduce
• A framework for processing parallelizable
1
problems across huge datasets using a
large number of computers (nodes) in a 2
cluster
en.wikipedia.org/wiki/MapReduce 3
• Splunk search uses this technology

component
…
n
Source: Wikipedia
Listen
Listentotoyour
yourdata.
Performance Considerations for MapReduce
• Which searches will MapReduce well? Which do not?
–  This is determined by how much of the work can be done on the indexers vs. how much manipulation of the search results must be done by the search head
Event searches and non-distributable commands
•  Indexers retrieve events in parallel
•  Efficient for event searches
•  For other searches
–  Additional steps, if needed, must be done on the search head
–  CPU & memory bottlenecks can occur on the search head
Reporting searches
•  Indexers can perform event retrieval and additional steps in parallel
•  Intermediate results are returned
•  The search head combines and presents the results
[Diagram: for event searches, the indexers return events to the search head; for reporting searches, the indexers return intermediate results to the search head]
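The division of labor for a reporting search can be sketched as a count-by-field computed in two phases. This is a conceptual illustration in plain Python with made-up events, not Splunk's implementation: each "indexer" produces a partial count (map) and the "search head" merges the partial results (reduce).

```python
from collections import Counter

# Hypothetical events held by three indexers.
INDEXERS = [
    [{"status": "500"}, {"status": "200"}, {"status": "500"}],
    [{"status": "200"}, {"status": "404"}],
    [{"status": "500"}],
]


def map_phase(events):
    """Runs on each indexer: partial count by status."""
    return Counter(event["status"] for event in events)


def reduce_phase(partials):
    """Runs on the search head: merge the partial counts."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


result = reduce_phase(map_phase(events) for events in INDEXERS)
print(dict(result))
```

Only small intermediate results cross the network, which is why reporting searches of this kind scale with the number of indexers. A command that needs every raw event in one place cannot be split this way, so its work falls on the search head.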
Questions for Search Types and Volume
What types of searches do you expect?
•  Free-text search
•  Search by fields
•  Complex correlations and transactional searches
What types of alerts do you expect?
•  Alerts and notifications of specific events
•  Alerts and notifications of thresholds reached or changed
•  Alerts and notifications of complex thresholds
Reporting
•  How frequently will the reports run?
•  How current must the report data be? Near real time, to an hour, to the previous day?
•  Will historical or retroactive reports be required, or only current reports?
•  How will the reports be delivered to users? Via email, web interface, or other method?
•  Are there users who do not have access to the raw data who must see the reports?
Raw Event Searches
• These search queries contain only the search command
–  The results are usually a list of raw events
–  Usually categorized as super-sparse or rare type searches
• A typical use case would be the deep analysis of a specific problem or incident
• Examples include:
–  Checking error codes
–  Correlating events
–  Investigating security issues
–  Analyzing failures
[Diagram: the indexers return events to the search head]
Report Generating Searches
• These search queries consist of an initial search followed by some statistical calculation
–  The result is usually a table, graph or a single value
–  Often categorized as dense or sparse type searches
• They always require fields and at least one statistical command
• Examples include:
–  Compiling a daily count of error events
–  Counting the number of times a specific user has logged in
–  Calculating the 95th percentile of field values
[Diagram: indexers return intermediate results to the search head]
Search Performance – Types of Searches
Type of search | Description | Indexer throughput | Impact
Dense | A large percentage of the data matches the search | Up to 50K matching events per second | CPU bound
Sparse | A small percentage of data matches the search | Up to 5K matching events per second | CPU bound
Super-sparse | A "needle in a haystack" search; the indexer must check all buckets to find results, which is time consuming for large numbers of buckets | Up to 2 seconds per bucket | Primarily I/O bound
Rare | Similar to super-sparse searches, but bloom filters eliminate buckets that don't include search results | 10 – 15 buckets per second | Primarily I/O bound
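The throughput figures in the table can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, the constants are the per-indexer figures from the table (using the lower end of the 10 – 15 range for rare searches), and it assumes the work spreads evenly across indexers.

```python
# Per-indexer figures taken from the table above.
DENSE_EVENTS_PER_SEC = 50_000
SPARSE_EVENTS_PER_SEC = 5_000
SUPER_SPARSE_SEC_PER_BUCKET = 2.0
RARE_BUCKETS_PER_SEC = 10  # lower end of the 10 - 15 range


def estimate_seconds(search_type, matching_events=0, buckets=0, indexers=1):
    """Rough search duration, assuming work is spread evenly over indexers."""
    if search_type == "dense":
        return matching_events / (DENSE_EVENTS_PER_SEC * indexers)
    if search_type == "sparse":
        return matching_events / (SPARSE_EVENTS_PER_SEC * indexers)
    if search_type == "super_sparse":
        return buckets * SUPER_SPARSE_SEC_PER_BUCKET / indexers
    if search_type == "rare":
        return buckets / (RARE_BUCKETS_PER_SEC * indexers)
    raise ValueError("unknown search type: " + search_type)
```

For example, `estimate_seconds("super_sparse", buckets=300, indexers=3)` gives a feel for why super-sparse searches over many buckets are slow, and why adding indexers helps.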
Improving Search Performance
• Best practice
–  Make sure disk I/O is as good as you can get
–  Increase CPU hardware only if needed
• Most search performance issues can be addressed by adding search peers (indexers)
• Look at resource consumption on both the indexer tier and search head tier
to diagnose slow searches
Improving Performance with limits.conf
• If you have unused CPU/memory resources, you can set multiple search pipelines
• Indexed real-time significantly improves performance if there are many real-time searches
–  Normally, real-time searches read the indexing queue
–  Indexed real-time causes the indexer to read the indexes on disk and collect events as they arrive in the hot buckets
limits.conf on indexer
[search]
batch_search_max_pipeline = 2
[realtime]
indexed_realtime_use_by_default = true
indexed_realtime_disk_sync_delay = 60
indexed_realtime_default_span = 1
indexed_realtime_maximum_span = 0
Knowledge Bundles
• When initiating a distributed search, the search head replicates its knowledge objects (KOs) in the form of a knowledge bundle to its search peers
–  Therefore, indexers may receive nearly the entire contents of all the search head's apps
• If an app contains large files that do not need to be shared with the indexers, use the [replicationSettings:refineConf] stanza in distsearch.conf to reduce the size of the bundle
• But be careful not to eliminate needed KOs!
http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Limittheknowledgebundlesize
[Diagram: the search head sends knowledge bundles to each indexer]
Splunk Report Scheduler Behavior
• Splunk runs as many concurrent jobs as it can, based on limits.conf
–  On a 16 CPU search head, the default is 11 concurrent scheduled searches
• In the example, we use 4 concurrent scheduled searches

–  Jobs S1, S2, S3, and S4 take the first 4 slots
–  Job S5 must wait until another job finishes (S4)
[Diagram: timeline with 4 search slots. S1, S2, S3 and S4 run immediately; S5 is deferred until a slot frees up and then runs. Legend: ✔ run, x deferred]
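The figure of 11 concurrent scheduled searches on a 16 CPU search head can be reproduced with a little arithmetic. The sketch below assumes the limits.conf settings `max_searches_per_cpu`, `base_max_searches` and `max_searches_perc` with default values of 1, 6 and 50; those defaults are an assumption here and should be checked against the limits.conf of the deployed version. The function names are hypothetical.

```python
def max_concurrent_searches(cpu_count, max_searches_per_cpu=1, base_max_searches=6):
    """Overall limit on concurrent historical searches for one search head."""
    return max_searches_per_cpu * cpu_count + base_max_searches


def max_scheduled_searches(cpu_count, max_searches_perc=50,
                           max_searches_per_cpu=1, base_max_searches=6):
    """Share of the overall limit that the scheduler is allowed to use."""
    total = max_concurrent_searches(cpu_count, max_searches_per_cpu, base_max_searches)
    return total * max_searches_perc // 100
```

With these assumed defaults, 16 CPUs give an overall limit of 22 searches, half of which (11) are available to the scheduler.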
How a scheduled job gets skipped
[Diagram: two scheduling windows with 4 search slots. In Window 1, S5 is deferred four times and then runs. In Window 2, S5 is deferred for the whole window and is skipped. Legend: ✔ run, x deferred, X skipped]
• Every scheduled job has a window – a time period where it should run
• A job is deferred if it can’t start when scheduled
–  A deferred job is retried (repeatedly for the duration of its window)
–  Example: Job S5 is deferred 4 times, but then runs during Window 1
• A job is skipped if it cannot start in its window

–  Example: Job S5 never runs during Window 2, and so is skipped
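The difference between a deferred and a skipped job can be illustrated with a toy simulation. This is a simplified sketch, not the actual Splunk scheduler: time is modelled in discrete ticks, jobs are tried in the order given, and the function name and parameters are hypothetical.

```python
def schedule_window(job_runtimes, slots, window_length):
    """Simulate one scheduling window.

    job_runtimes maps job name -> runtime in ticks, in priority order.
    Returns job name -> (status, deferrals), status being "ran" or "skipped".
    """
    running = {}                      # job name -> tick at which it finishes
    waiting = list(job_runtimes)
    deferrals = {name: 0 for name in job_runtimes}
    result = {}
    for tick in range(window_length):
        # Release the slots of jobs that have finished by this tick.
        running = {name: end for name, end in running.items() if end > tick}
        still_waiting = []
        for name in waiting:
            if len(running) < slots:
                running[name] = tick + job_runtimes[name]
                result[name] = ("ran", deferrals[name])
            else:
                deferrals[name] += 1  # no free slot: deferred, retried next tick
                still_waiting.append(name)
        waiting = still_waiting
    for name in waiting:              # never started inside the window
        result[name] = ("skipped", deferrals[name])
    return result
```

Calling it with four long jobs and one short job on 4 slots shows the short job being deferred until a slot frees up; if no slot frees up before the window ends, the job is reported as skipped.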
Report Scheduler Priorities
• The scheduler calculates priorities so that a skipped job will have a higher
priority in its next scheduled window
• To identify if Splunk has skipped searches, run the following
index=_internal source=*scheduler.log status!=success
| stats count by user app savedsearch_name status
https://wiki.splunk.com/Community:TroubleshootingSearchQuotas
Priority = Next_runtime
+ Avg_runtime x priority_runtime_factor
- skipped_count x period x priority_skipped_factor
+ window_adjustment
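The priority formula can be written as a small function to see how a skipped run changes the score. This is a sketch of the formula as printed on the slide, not Splunk's implementation; the default factor values (10 and 1) are assumptions, and it assumes that a lower score means the job is scheduled earlier.

```python
def priority_score(next_runtime, avg_runtime, skipped_count, period,
                   window_adjustment=0.0,
                   priority_runtime_factor=10.0,   # assumed default
                   priority_skipped_factor=1.0):   # assumed default
    """Score from the slide's formula; lower is assumed to run first."""
    return (next_runtime
            + avg_runtime * priority_runtime_factor
            - skipped_count * period * priority_skipped_factor
            + window_adjustment)
```

Because the skipped term is subtracted, each skipped run lowers the score by one scheduling period (times the factor), which is how a skipped job moves ahead of others in its next window.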
Summarization and Time Value of Data
• The value of data can be broken into two curves
–  Value of an individual data item
–  Aggregate data value
• In most cases, the value of an individual data item will decline over time and the aggregate data value will increase
[Chart: data value plotted against age of data (time), with one curve for the value of an individual data item and one for the aggregate data value]
Acceleration
Report Acceleration
•  Accelerates individual reports
•  Uses automatically-created summaries to speed completion times for qualified reports
•  Easier to create than summary indexes and backfills automatically
•  Data is stored on the indexers, alongside the related buckets
•  Depending on the defined time span, periodically ages out data
•  Always ages data when the corresponding bucket ages out
•  Cannot create a data cube and report on smaller subsets
Summary Indexing
•  Accelerates reports that don't qualify for report acceleration
•  Uses manually created summary indexes that exist separate from main indexes
•  Data is stored on the search head, but can be forwarded to the indexer tier
•  Can persist after events have been frozen by controlling retention period or index size
•  Backfill is a manual (scripted) process
Data Model Acceleration
•  Accelerates all of the fields defined in a data model
•  Uses automatically-created summaries to speed completion times for pivots
•  Takes the form of time-series index (.tsidx) files, which are created on the indexer
How Acceleration Works
• Report acceleration and summary indexing speed up individual searches,
on a report-by-report basis
–  This is accomplished by building collections of pre-computed search result
aggregates
• Data model acceleration speeds up reporting for the specific set of attributes
(fields) that you define in a data model
–  Creates summaries for the specific set of fields, accelerating the dataset
represented by that collection of fields rather than a particular search
Data Model Acceleration
• Used to speed up retrieval of the events that underlie a data model
• Speeds up reporting for the entire set of attributes (fields)
• Updated every five minutes
• Affects only event object hierarchies
–  Hierarchies based on search or transaction objects cannot be accelerated
• Most efficient if the root event objects include the index(es) in their
initial constraint search
Data Model Acceleration
• The High Performance Analytics Store
–  Consists of time-series index files, with the .tsidx file extension
–  Exists on the indexer tier, parallel to the buckets that contain the events
referenced in the .tsidx files
–  Spans a "summary range," which is the time range selected for acceleration
• Location and size can be set in indexes.conf
–  Default location is $SPLUNK_HOME/var/lib/splunk/indexName/data_model_summary
Disk Usage: Data Model Acceleration
• Estimated storage for data model acceleration per year =
Data volume per day * 3.4
–  Example: if indexing 250 GB/day, the expected storage needed for acceleration is
250 * 3.4 = 850 GB across all indexers
• To see the size on disk of a particular data model acceleration,
go to the Data Models page and click Acceleration
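The sizing rule above is simple enough to script when comparing several ingest scenarios. A minimal sketch, assuming the 3.4 multiplier from the slide; the function name and the per-indexer split (which assumes an even distribution of data) are illustrative additions.

```python
def dma_storage_gb(daily_volume_gb, factor=3.4):
    """Estimated data model acceleration storage per year, across all indexers."""
    return daily_volume_gb * factor


def dma_storage_per_indexer_gb(daily_volume_gb, indexers, factor=3.4):
    """Same estimate split evenly over the indexers (an assumption)."""
    return dma_storage_gb(daily_volume_gb, factor) / indexers
```

For the slide's example, `dma_storage_gb(250)` corresponds to the 850 GB figure.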
Report Acceleration
• Report Acceleration
–  Report results are "pre-computed" at regular intervals and stored on the
indexer tier
–  All reports that use a similar set of data and computations automatically use
the same report acceleration summaries if they can
• Report Acceleration Summaries
–  Span a time range selected for acceleration
–  Update every 10 minutes by default
–  Are automatically rebuilt if the data in the underlying bucket changes
docs.splunk.com/Documentation/Splunk/latest/Knowledge/Manageacceleratedsearchsummaries
Report Acceleration
• Requirements for acceleration
–  The report was not created via Pivot
–  The underlying search qualifies for acceleration
• Splunk typically won't generate a summary if:
–  There are fewer than 100K events in the summary range
-  Faster to execute the search without a summary
–  Summary size is projected to be too large
-  Faster to execute the search using the normal index because it is smaller
• If a summary is defined but not created for the above reasons
–  Splunk continues to check periodically
–  Automatically creates a summary when/if the search meets the requirements
Disk Sizing: Report Acceleration
• Report Acceleration Summaries
–  Exist on the indexer tier, in parallel to the buckets that contain the events
• Location and maximum size can be set in indexes.conf
–  Default location is $SPLUNK_HOME/var/lib/splunk/indexName/summary
–  Location is specified with summaryHomePath
• Can be examined, managed and deleted from Settings > Report Acceleration Summaries
Summary indexing overview
• Efficiently report on large volumes of data
• Spread the cost of a computationally expensive report over time
–  Run reports over long time ranges for large datasets more efficiently
-  Example: Show the number of page views for each of your web sites over the past quarter, broken out by site
• Retain summarized data after original data sources have been frozen
–  Example: If you can only maintain original firewall logs for 30 days, but want reports over a year
How summary indexing works
• Data is collected and summarized
–  A populating search is scheduled to run periodically
–  The populating search stores its results in a summary index
• Reports are run against the summary index

–  Searches run faster over the smaller summary index
–  The summary index may be used to report when data ages out of the original
index
Summary indexing flow
Populating search: purchasedProducts
sourcetype=access_combined | sistats count by product_name
Run every hour, time range 1 hour
Populating search results are formatted and stored as index statistics in the summary index
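The idea behind this flow (store small hourly partial results, then report over them) can be sketched outside Splunk. The code below is a conceptual illustration only: it does not reproduce how sistats formats or stores its results, and the function names and the event dictionaries are hypothetical.

```python
from collections import Counter


def populate_hour(events):
    """Stand-in for the hourly populating search: count events per product."""
    return Counter(event["product_name"] for event in events)


def report(summary_rows):
    """Stand-in for the report: add up the stored hourly counts."""
    total = Counter()
    for row in summary_rows:
        total.update(row)
    return total
```

The report only touches one small row per hour instead of every raw event, which is why reports over long time ranges run faster against the summary.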
Summary Indexing
• Requires manual management and knowledgeable monitoring
–  Manual rebuild of summary statistics needed
-  If populating search fails to run on schedule
-  If data arrives late
-  If knowledge objects change
• Summary index configuration
–  Must be defined in indexes.conf, just like a regular index
–  Retention and size are controlled in indexes.conf
–  Summary indexes do not consume license (summary indexing is free)
–  Summary indexes are usually small, but allocate space on the indexer tier
-  Monitor the space used like any other index
Forwarding Summary Indexes
• Summary indexes are created by default on the search head
• Best practice: if you use summary indexes, forward them to the indexing layer
–  This allows all members to access them
-  Otherwise, they're only available on the search head that generates them
–  This is required for search head clustering
• Configure the search head as a forwarder
http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Forwardsearchheaddata
Time: 10 minutes
• Review the list of searches. Could any searches benefit from acceleration?
• What additional disk space considerations have you uncovered?
Module 9:
Integration
Module Objectives
• Describe integration methods
• Identify common integration points
Types of Integration
• Accessing other data or applications from within Splunk
• Accessing Splunk from another application
• Re-forwarding data to other applications after it is indexed in Splunk
• Integration with Hadoop – Hunk
Note that apps may also provide convenient integration points
Search Extensibility Review
• Custom search commands
–  Write new search commands in Python 2.7
–  http://dev.splunk.com/view/python-sdk/SP-CAAAEU2
• Scripted lookups
–  Programmatically script lookups in Python
• Workflow actions and custom navigation
–  Access additional resources outside Splunk with a single click
[Diagram: "Extend & Integrate Splunk" and "Build Splunk Apps", showing SDKs (Java, JavaScript, Python, Ruby, C#, PHP), Data Models, Search Extensibility, Modular Inputs, REST API, Web Framework, Simple XML and JavaScript]
Integration Using Alerts
• Custom alert actions
–  Allow developers to extend Splunk to build, package and publish new alert actions for Splunk
–  Some modular alerts have been published on Splunkbase (for example, Twilio SMS Alerting)
–  http://docs.splunk.com/Documentation/Splunk/latest/Alert/CreateCustomAlerts
–  docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModAlertsIntro
Splunk REST API
• REST stands for REpresentational State Transfer
• The most basic way to programmatically access Splunk is to use the REST API via HTTP requests
• With over 200 endpoints, the REST API lets you interact directly with a Splunk instance
• For more information
http://dev.splunk.com/restapi
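As an illustration of accessing Splunk through plain HTTP requests, the sketch below builds (but does not send) a request to the search export endpoint using only the Python standard library. The host name, credentials and helper name are hypothetical; it assumes the management port 8089, the `/services/search/jobs/export` endpoint and HTTP basic authentication are available on the target instance.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(base_url, username, password, search):
    """Build a POST request that streams search results as JSON."""
    url = base_url.rstrip("/") + "/services/search/jobs/export"
    body = urllib.parse.urlencode({
        "search": search,          # must start with a generating command, e.g. "search"
        "output_mode": "json",
    }).encode("utf-8")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", "Basic " + token)
    return request
```

The request would be sent with `urllib.request.urlopen(request)`. An instance using a self-signed certificate will typically need an explicit SSL context for the call to succeed.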
Software Development Kits (SDKs)
• SDKs provide language-specific libraries for accessing the Splunk REST API
–  Includes documentation, code samples, resources, and tools
–  Simplifies code development for
-  Python
-  JavaScript
-  Java
-  PHP
-  Ruby
-  C#
http://dev.splunk.com/sdks
Sending Data to Other Systems
• Forward all or a subset of data via TCP to other systems
–  Forward either raw text or syslog
–  Can be done centrally via the indexer
–  Does not increase license
• Use scheduled searches to insert events into other systems
–  Leverages alert functionality
–  For example, use correlation searches and configure the alerts to open tickets or insert events into a SIEM
[Diagram: forwarders send data to the indexer; the indexer and the search head (with alerts) feed a service desk, an event console and a SIEM]
Hunk – Splunk for Hadoop
• Hunk provides access to data stored in a
Hadoop Distributed File System (HDFS)
• Provides the familiar Splunk user interface
• Manages the MapReduce access to
HDFS
• Accesses both Splunk indexes and HDFS
from a single Splunk search head
–  Requires both a Hunk and a Splunk
Enterprise license
–  Available since Splunk 6.3
Module 10:
Operations and Management
Module Objectives
• Identify ongoing tasks in a Splunk deployment
• Define monitoring tools
• Identify backup and archiving methods
• Discuss onboarding processes
Operations and Management
• Operations
–  Monitoring
–  Backups and archiving
• Deployment and Distributed Management

• Change Control
–  Onboarding new data sources
–  Onboarding new users
–  Operations review
Distributed Management Console (DMC)
• Splunk environment monitor and
diagnostic tool
–  Only admins can access
–  Investigate server performance
and resource usage
–  Uses the _introspection index
populated by
introspection_generator_addon
–  Provides configurable alerts
Monitoring Splunk: S.o.S app
• The Splunk on Splunk (S.o.S) app
is another tool for monitoring or
troubleshooting
–  View, search and compare configs
–  Detect errors and inspect crash logs
–  Measure indexing performance & event
processing
–  View details of scheduler and user activity
–  Gradually being replaced by DMC
• S.o.S does consume Splunk license

• Prefer DMC when possible
Operations – Backups and Archiving
• Backups and archiving are discussed in Splunk Administration and in
docs.splunk.com/Documentation/Splunk/latest/Indexer/Backupindexeddata
• Choose a data backup strategy based on your tolerance for data loss &
recovery effort
–  Incremental backup
–  Full backup
–  Indexer replication
• Remember to back up configuration files across the environment, including

–  Search heads
–  Indexers and cluster master
–  Deployer, deployment server, license master and any other Splunk instances
Archiving Data to Hadoop
• If you have both a Hunk license and a Splunk Enterprise license, you can
use Hadoop to archive buckets
–  When buckets move from cold to frozen, the raw data will be moved into Hadoop
http://docs.splunk.com/Documentation/Hunk/latest/Hunk/Setanarchivescript
• Once in Hadoop, the data can still be searched using Hunk
Onboarding New Data Inputs
• Use the Deployment Methodology as a guide
• What process does a user follow to index new data into Splunk?
• Once data is indexed, what search time changes need to be made?
Are there new knowledge objects that should be created?
• Who determines the ownership of data?
• How are reports created to support the new data?
Onboarding New Users
• How is a user or user group given access to Splunk?
• What are the default apps?
• How is access to data controlled? What data do new users see?
• What training do new users need or receive?
• Are there limits on making objects (especially scheduling searches)?
Any quotas?
• Progressive privileges
–  As users are trained or gain advanced skills, give them additional privileges
-  Real-time search, create dashboards, etc.
Operations Review
• Periodically review scheduled searches
–  Monitor jobs & review search logs to identify the expensive searches
–  Check for inefficiencies and redundancies
• Periodically review acceleration summaries

–  Consider removing summaries with low usage and large size or heavy load
• Review disk consumption

• Perform a security audit on a regular basis
Operations – Security
• Check the Splunk Security Portal frequently
–  www.splunk.com/page/securityportal
• An important resource is the Securing Splunk manual
–  docs.splunk.com/Documentation/Splunk/latest/Security/WhatyoucansecurewithSplunk
Time: 5 minutes
• Email your lab document(s) to your instructor
Resources and Additional Study
• Splunk documentation for architects
–  Installation Manual (for hardware and capacity planning information)
–  Distributed Deployment Manual
–  Capacity Planning Manual
–  Other manuals for specific topics
• Splunk Answers answers.splunk.com
• Splunk's public wiki (watch dates for outdated topics)
www.splunk.com/wiki/Deploy:Deployment_topics
–  Deployment scenarios, performance info and other useful topics
Support Options
• Community Support - Documentation, the Community Wiki,
Splunk Answers, Splunk Blogs, and IRC are available to anyone for
answers to basic technical questions. These resources are always available
to provide you with an immediate answer
• Enterprise Support - Direct access to our Customer Support team by
phone and the ability to manage your cases online
• Global Support - 24x7x365 support for critical issues, a dedicated
resource to manage your account and quarterly review of your
deployments
.conf2016: The 7th Annual Splunk Worldwide Users’ Conference
•  September 26-29, 2016
•  The Disney Swan and Dolphin, Orlando
•  4400+ IT & Business Professionals
•  3 days of technical content
•  175+ sessions
•  3 days of Splunk University
•  Sept 24-26, 2016
•  Get Splunk Certified!
•  Get CPE credits for CISSP, CAP, SSCP
•  80+ Customer Speakers
•  40+ Apps in Splunk Apps Showcase
•  70+ Technology Partners
•  1:1 networking: Ask The Experts and Security Experts, Birds of a Feather and Chalk Talks
•  NEW hands-on labs!
•  Expanded show floor, Dashboards Control Room & Clinic, and MORE!
Visit conf.splunk.com for more information
Thank You
Complete the Class Evaluation to be in this month's drawing for a $100 Splunk
Store voucher
1.  Look for the invitation email, What did you think of your Splunk Education class, in your inbox
2.  Click the link or go to the specified URL in the email
Appendix:
Sample Topologies
Deployment – Centralized Topology
[Diagram components: Search Head Cluster; Indexers; Forwarders; Intermediate Forwarder; Syslog Devices. Auto-load balancing is configured in outputs.conf on the forwarders.]
Deployment – Decentralized Topology
[Diagram components: several groups of Indexers; Search Heads; Search Head Cluster; Forwarders; Intermediate Forwarder; Network Devices]
Deployment – Hybrid Topology
[Diagram components: several groups of Indexers; Search Heads; Search Head Cluster; Forwarders; Intermediate Forwarder; Network Devices]