## Splunk Search Processing Language (SPL) for Day-to-Day Development and Data Management
This document provides a comprehensive overview of essential Splunk Search Processing Language
(SPL) commands for daily development tasks, data management, and understanding core Splunk
functionalities like data bucketing and API interactions.
-----
### 1\. Common SPL Commands for Day-to-Day Development
These are fundamental commands used to retrieve, transform, and analyze data in Splunk.
* **`search`**: The fundamental command to retrieve events from indexes.
* *Example:* `index=web sourcetype=access_combined`
* **`table`**: Displays specified fields in a tabular format, useful for quick overviews.
* *Example:* `... | table _time, host, status`
* **`fields`**: Selects which fields to include or exclude from the results.
* *Example:* `... | fields + user, action, result` or `... | fields - _raw`
* **`dedup`**: Removes duplicate events based on specified fields.
* *Example:* `... | dedup user`
* **`sort`**: Sorts results based on one or more fields.
* *Example:* `... | sort -_time` or `... | sort user, action`
* **`stats`**: Calculates aggregate statistics for fields, like count, sum, avg, etc.
* *Example:* `... | stats count by status` or `... | stats avg(response_time) as avg_resp by host`
* **`top`**: Returns the most frequent values of a field.
* *Example:* `... | top 10 uri`
* **`rare`**: Returns the least frequent values of a field.
* *Example:* `... | rare limit=5 user`
* **`rename`**: Changes the name of a field.
* *Example:* `... | rename host AS server_name`
* **`eval`**: Creates new fields or modifies existing ones using expressions.
* *Example:* `... | eval response_time_ms = response_time * 1000` or `... | eval status_category
= if(status >= 200 AND status < 300, "Success", "Failure")`
* **`where`**: Filters events based on a boolean expression.
* *Example:* `... | where status="200"` or `... | where bytes_sent > 1000`
* **`rex`**: Extracts fields using regular expressions.
* *Example:* `... | rex "user=(?<username>\w+)"`
* **`transaction`**: Groups related events into a single transaction.
* *Example:* `... | transaction session_id startswith="login" endswith="logout"`
* **`join`**: Combines results from two different searches based on a common field.
* *Example:* `... | join user [search index=users | table user, department]`
* **`lookup`**: Enriches events with data from external lookup tables (CSV files, KVstore).
* *Example:* `... | lookup users.csv user OUTPUT department`
* **`chart`**: Aggregates statistics into a table suitable for charting, grouped over arbitrary fields (unlike `timechart`, which always groups over `_time`).
* *Example:* `... | chart count over host by status`
* **`timechart`**: Calculates statistics over time, breaking them down by a specified field.
* *Example:* `... | timechart count by host`
* **`streamstats`**: Calculates statistics over a streaming set of events.
* *Example:* `... | streamstats count as event_number by user`
* **`eventstats`**: Calculates statistics over all events and adds the result to each event.
* *Example:* `... | eventstats avg(duration) as overall_avg_duration`
* **`fieldsummary`**: Reports summary statistics (count, distinct count, min/max, sample values) for the fields in your results.
* *Example:* `... | fieldsummary maxvals=5`
* **`xmlkv`**: Extracts fields from XML formatted events.
* *Example:* `... | xmlkv`
* **`spath`**: Extracts fields from structured data (JSON or XML) in events.
* *Example:* `... | spath input=_raw` or `... | spath path=user.name output=username`
* **`multikv`**: Extracts fields from events that contain tabular, multi-line output (such as the results of `top` or `ps`).
* *Example:* `... | multikv`
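To see how these commands compose, here is a small end-to-end pipeline. This is a sketch: the `web` index, `access_combined` sourcetype, and fields such as `response_time` are illustrative, not part of any standard dataset.
```spl
index=web sourcetype=access_combined status>=500
| rex "user=(?<username>\w+)"
| eval response_time_ms = response_time * 1000
| stats count as errors, avg(response_time_ms) as avg_resp_ms by host, username
| sort -errors
| rename host AS server_name
| table server_name, username, errors, avg_resp_ms
```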
-----
### 2\. SPL Commands for Data Management (Null Values, Grouping, etc.)
These commands are crucial for cleaning, enriching, and summarizing your data.
#### Handling Null Values
* **`fillnull`**: Replaces null (missing) values in specified fields with a defined value (default is 0).
* *Example:* `... | fillnull value="N/A" user_id, transaction_status`
* **`filldown`**: Fills null values in a field with the last non-null value from a previous event.
(Useful for time-series data where values should persist).
* *Example:* `... | sort _time | filldown user_session_id`
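A combined sketch of both commands (the field names are illustrative). Because `filldown` copies the last non-null value *in result order*, sort by `_time` first:
```spl
index=web
| sort 0 _time
| filldown user_session_id
| fillnull value="N/A" user_id, transaction_status
| stats count by user_session_id, user_id
```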
#### Grouping and Aggregation
* **`stats`**: Calculates aggregate statistics (count, sum, avg, min, max, etc.) based on specified
grouping fields. This command reduces your events to summary rows.
* *Example:* `... | stats count by host, status`
* **`eventstats`**: Calculates aggregate statistics and adds the results back to *each original
event*. Unlike `stats`, it doesn't reduce the number of events.
* *Example:* `... | eventstats avg(response_time) as overall_avg_response_time`
* **`streamstats`**: Calculates statistics for each event as it is processed, based on the events that
came before it in the search results. Ideal for cumulative totals or running averages.
* *Example:* `... | sort _time | streamstats count as cumulative_events by user`
* **`transaction`**: Groups a series of related events into a single "transaction" based on a
common identifier and optional time boundaries. It adds fields like `duration` and `eventcount` to
the resulting transaction.
* *Example:* `... | transaction session_id startswith="login" endswith="logout"`
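The difference between `stats`, `eventstats`, and `streamstats` is easiest to see in a single pipeline. A sketch (the `duration` field is illustrative): `stats` would collapse events into summary rows, while the two commands below keep every event and attach the computed values to each one:
```spl
index=web
| sort 0 _time
| eventstats avg(duration) as overall_avg_duration
| streamstats count as event_number by user
| eval slower_than_avg = if(duration > overall_avg_duration, "yes", "no")
| table _time, user, event_number, duration, overall_avg_duration, slower_than_avg
```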
#### Other Data Management Commands
* **`collect`**: Saves search results to a summary index for faster retrieval and reduced processing
load on future searches.
* *Example:* `index=web status=200 | stats count by uri | collect index=daily_uri_counts`
* **`addtotals`**: Sums the numeric fields in each result row and writes the total to a new field (row totals). Use `col=true` to also append a column-totals row.
* *Example:* `... | chart count by host, status | addtotals`
* **`addcoltotals`**: Appends a summary row containing the column totals of the specified numeric fields.
* *Example:* `... | stats sum(bytes) as total_bytes by host | addcoltotals`
* **`accum`**: Calculates a cumulative sum of a specified field.
* *Example:* `... | sort _time | accum sales as cumulative_sales`
* **`cluster`**: Groups events that are similar in structure or content, even if they don't share
exact field values.
* *Example:* `... | cluster showcount=true`
* **`collapse`**: An experimental, internal command that condenses multi-file search result sets; it is not supported for general use. To merge duplicate events, use `dedup` or `stats` instead.
* *Example:* `... | stats count by _raw` (a supported alternative for de-duplicating events)
-----
### 3\. Difference Between `where` and `search` in SPL
Both `search` and `where` are used for filtering data, but they operate at different stages of the
search pipeline, impacting performance.
| Feature | `search` Command | `where` Command |
| :--- | :--- | :--- |
| **Function** | Initial filtering of events from indexes. | Filters results based on complex conditional expressions. |
| **Placement** | Primarily at the **beginning** of a search. | Always **after a pipe** in the search pipeline. |
| **Capabilities** | Keyword searches, field-value pairs, simple wildcards. | Complex logical expressions, field-to-field comparisons, `eval` functions, advanced string functions. |
| **Case sensitivity** | Case-insensitive by default. | Case-sensitive for string comparisons by default (unless functions like `lower()` are used). |
| **Performance** | **Highly optimized**; uses index structures to discard events before they enter the pipeline. Crucial for performance. | Filters *after* events have been retrieved and processed by preceding commands; less efficient for initial data reduction. |
| **When to use** | To **reduce the initial volume of data** by filtering on indexed fields or raw event content. | When you need **complex logic, field-to-field comparisons, or filters on calculated fields** created by earlier commands (e.g., `eval`). |
| **Example** | `index=firewall action=DENY` | `... \| where bytes_in > bytes_out AND status != 404` |
**Performance Best Practice:** Always apply as much filtering as possible with the initial `search`
command to reduce the data volume processed by subsequent commands.
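For example, the following two searches return near-identical results, but the first pushes the simple filter into the initial `search` so that far fewer events ever enter the pipeline (index and field names are illustrative):
```spl
index=firewall action=DENY
| where bytes_in > bytes_out
```
The inefficient form retrieves every `firewall` event first and filters afterward:
```spl
index=firewall
| where action="DENY" AND bytes_in > bytes_out
```
(The results are "near-identical" rather than identical because the `where` string comparison is case-sensitive, while the `search` filter is not.)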
-----
### 4\. How Data Bucketing Works in Splunk
Data bucketing is how Splunk efficiently stores, manages, and retains time-series machine data on
disk. It's a lifecycle where data moves through different storage tiers as it ages.
1. **Hot Buckets:**
* **Actively written to** with new incoming data.
* Immediately searchable.
* Reside on **fast storage** (e.g., SSDs).
2. **Warm Buckets:**
* A hot bucket "rolls" to warm when it reaches a configured size or age.
* **No new data written**; read-only for indexing.
* Still on **fast storage**.
* Renamed to reflect their time range (e.g., `db_<newest_time>_<oldest_time>_<id>`).
3. **Cold Buckets:**
* A warm bucket "rolls" to cold when the number of warm buckets exceeds `maxWarmDBCount` or the home path reaches its size limit; the oldest warm buckets roll first.
* Moved to **cheaper, slower storage** (e.g., NAS, spinning disks).
* Still fully searchable.
4. **Frozen Buckets:**
* A cold bucket "rolls" to frozen when it reaches its configured retention period
(`frozenTimePeriodInSecs`).
* **Default action is deletion**.
* Can be configured to be **archived** instead of deleted (not searchable unless thawed).
5. **Thawed Buckets:**
* Archived frozen buckets can be manually "thawed" (moved back into a searchable Splunk-
managed location) for historical retrieval.
**Why Bucketing is Important:**
* **Performance Optimization:** Time-based partitioning allows quick narrowing of searches;
storage tiering optimizes costs.
* **Efficient Data Management:** Automates data retention and deletion policies.
* **Scalability:** Buckets are the units of replication and search distribution in clusters.
* **Disaster Recovery:** Simplifies backup and recovery due to organized data units.
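The bucket lifecycle described above is controlled per index in `indexes.conf`. A minimal sketch — the index name, paths, and values are illustrative, not sizing recommendations:
```ini
[web]
homePath   = $SPLUNK_DB/web/db          # hot and warm buckets (fast storage)
coldPath   = $SPLUNK_DB/web/colddb      # cold buckets (cheaper, slower storage)
thawedPath = $SPLUNK_DB/web/thaweddb    # manually restored (thawed) buckets
maxWarmDBCount = 300                    # oldest warm buckets roll to cold beyond this count
frozenTimePeriodInSecs = 15552000       # ~180 days; cold buckets roll to frozen after this
# coldToFrozenDir = /archive/web        # uncomment to archive frozen buckets instead of deleting them
```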
-----
### 5\. How to Check for Bucket Details Using SPL Query
The `| dbinspect` command is used to inspect the state and properties of buckets within your
indexes.
**Basic Syntax:**
`| dbinspect index=<your_index_name>` or `| dbinspect index=*` (for all indexes)
**Key Fields in `dbinspect` Output:**
* **`bucketId`**: Unique identifier for the bucket.
* **`index`**: Name of the index.
* **`state`**: Current state of the bucket (`hot`, `warm`, `cold`, or `thawed`).
* **`path`**: File system path to the bucket directory.
* **`sizeOnDiskMB`**: **The actual size of the bucket on disk in megabytes.**
* **`rawSize`**: Size of uncompressed raw data in bytes.
* **`startEpoch`**: Earliest timestamp of an event in the bucket (Unix epoch).
* **`endEpoch`**: Latest timestamp of an event in the bucket (Unix epoch).
* **`splunk_server`**: The Splunk server (indexer) where the bucket resides.
* **`isSearchable`**: Boolean indicating if the bucket is currently searchable.
**Example SPL Queries for Bucket Details:**
* **List all buckets for a specific index and their sizes:**
```spl
| dbinspect index=web
| table index, bucketId, state, path, sizeOnDiskMB, startEpoch, endEpoch
| sort index, state, startEpoch
```
* **Find all hot buckets and their sizes:**
```spl
| dbinspect index=*
| where state="hot"
| table splunk_server, index, bucketId, sizeOnDiskMB
| sort -sizeOnDiskMB
```
* **Calculate total size of data per index and state:**
```spl
| dbinspect index=*
| stats sum(sizeOnDiskMB) as total_size_MB by index, state
| sort index, state
```
* **Convert `sizeOnDiskMB` to GB:**
```spl
| dbinspect index=my_index
| eval size_GB = round(sizeOnDiskMB / 1024, 2)
| table index, bucketId, state, splunk_server, size_GB, path
| sort -size_GB
```
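`dbinspect` output can also be combined with `stats` to estimate the compression ratio per index. A sketch, assuming `rawSize` is reported in bytes:
```spl
| dbinspect index=*
| eval raw_MB = rawSize / 1024 / 1024
| stats sum(raw_MB) as raw_MB, sum(sizeOnDiskMB) as disk_MB by index
| eval compression_ratio = round(disk_MB / raw_MB, 2)
| sort compression_ratio
```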
-----
### 6\. Splunk REST Commands and Their Uses
"REST commands" in Splunk refer to two related concepts: the `| rest` SPL command and the
broader Splunk REST API.
#### 6.1. The `| rest` SPL Command
This is an SPL command that allows you to query the Splunk REST API directly from within a Splunk
search. It treats the API's response as search results.
**Syntax:** `| rest <endpoint_path> [optional arguments]`
**Key Uses (for introspection and management within Splunk Web):**
* **Monitoring Splunk Health and Performance:** Check server status (`/services/server/info`),
resource usage.
* **Managing Knowledge Objects:** List saved searches (`/services/saved/searches`), dashboards,
lookups.
* **Auditing and Troubleshooting:** View active search jobs (`/services/search/jobs`), input
configurations, user details.
**Example:**
```spl
| rest /services/saved/searches splunk_server=local
| search disabled=0 author="johndoe"
| table title, app, author, search
```
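Another common introspection call checks the version and OS of each connected Splunk instance (the `serverName`, `version`, and `os_name` fields below are typical of the `/services/server/info` endpoint's output):
```spl
| rest /services/server/info
| table splunk_server, serverName, version, os_name
```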
#### 6.2. The Splunk REST API (Programmatic Interface)
This is a comprehensive set of HTTP endpoints allowing external applications, scripts, or systems to
interact programmatically with Splunk. It uses standard HTTP methods (GET, POST, DELETE) over
HTTPS (default port 8089).
**How it Works:**
* **HTTP Methods:** GET (retrieve), POST (create/update), DELETE (remove).
* **Authentication:** Requires Splunk credentials (username/password for session key) or HEC
tokens.
* **Response Formats:** Typically XML or JSON.
**Common Uses (for automation, integration, and custom development):**
* **Automation of Administration Tasks:** Programmatically create/manage indexes, users, roles,
apps.
* **Integration with External Systems:** Pull search results into external reporting tools, trigger
searches, integrate alerts with ticketing systems.
* **Custom Application Development:** Build custom Splunk apps, data inputs, or modular alerts.
* **Running Searches and Managing Search Jobs:** Programmatically initiate searches, monitor
status, retrieve results, and manage jobs.
**Example (using `curl` for a direct API call - Login to get session key):**
```bash
curl -k https://localhost:8089/services/auth/login \
  -d username=admin \
  -d password=your_splunk_password
```
(This returns an XML response containing the `<sessionKey>`)
**Example (using `curl` - Run a One-Shot Search using obtained session key):**
```bash
curl -k \
-H "Authorization: Splunk <your_session_key>" \
-X POST \
https://localhost:8089/services/search/jobs/oneshot \
-d search="search index=_internal | head 5 | fields _time, host, sourcetype" \
-d output_mode=json
```
-----
### 7\. Splunk HTTP Event Collector (HEC)
Splunk HEC is a secure and efficient way to send data directly to Splunk Enterprise or Splunk Cloud
over HTTP or HTTPS, specifically designed for applications, cloud services, and custom scripts.
**How HEC Works:**
1. **Token-Based Authentication:** Uses unique, long-lived tokens instead of user credentials for
security.
2. **HTTP/HTTPS Endpoints:** Splunk listens on dedicated HEC endpoints (default port 8088).
3. **Data Format:** Supports JSON-formatted events (recommended, for structured metadata) and
raw text.
4. **No Forwarder Needed:** Eliminates the need to install and manage Splunk Universal
Forwarders on sending systems.
5. **Direct Ingestion:** Data is ingested directly by indexers for near real-time availability.
6. **Load Balancing:** Supports external load balancers for scalability in distributed environments.
**Key Benefits and Use Cases:**
* **Simplified Data Ingestion:** For application logging, cloud services (AWS Lambda, Azure
Functions), and IoT devices.
* **Enhanced Security:** Token-based authentication, HTTPS, granular permissions per token.
* **Scalability and Performance:** Designed for high-volume data streams, easily scales
horizontally.
* **Reduced Overhead:** Less management of forwarder deployments.
* **Flexibility:** Supports various data types.
**How to Configure Splunk HEC:**
1. **Enable HEC Globally** in Splunk Web (Settings \> Data Inputs \> HTTP Event Collector \> Global
Settings).
2. **Create an HEC Token** (Settings \> Data Inputs \> HTTP Event Collector \> New Token),
defining its name, source type, and index permissions. Copy the generated Token Value.
3. **Configure Your Application/Client** to send HTTP POST requests to the HEC URL (e.g.,
`https://your_splunk_hostname:8088/services/collector`) with the HEC token in the `Authorization`
header (`Authorization: Splunk <your_token_value>`) and the data payload.
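For example, a minimal `curl` sketch that sends one JSON event to HEC (the hostname, token value, and index are placeholders for your own configuration):
```bash
# Send a single JSON-formatted event to the HEC endpoint
curl -k https://your_splunk_hostname:8088/services/collector \
  -H "Authorization: Splunk <your_token_value>" \
  -d '{"event": "user login succeeded", "sourcetype": "my_app", "index": "main", "host": "app01"}'
```
A successful request returns `{"text":"Success","code":0}`.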
**Security Best Practices for HEC:**
* Always use HTTPS/SSL.
* Restrict HEC Token Permissions (least privilege).
* Use Strong Tokens.
* Implement Network Segmentation.
* Monitor HEC Activity in `_internal` index.
* Rotate HEC tokens periodically.
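To review the HEC tokens configured on an instance directly from SPL, you can query the HEC input endpoint (a sketch; the exact output fields vary by Splunk version):
```spl
| rest /services/data/inputs/http splunk_server=local
| table title, disabled, index, sourcetype
```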
-----
### 8\. Splunk Search Head Cluster (SHC) Overview
Splunk Search Head Clusters (SHC) are a crucial component for ensuring high availability, scalability,
and a consistent user experience in a distributed Splunk environment. They allow multiple search
heads to share configurations, coordinate search activities, and provide fault tolerance.
A Splunk Search Head Cluster is a group of Splunk Enterprise search heads that work together to
provide a highly available and scalable search and reporting environment. Key functionalities
include:
* **High Availability:** If one search head member fails, others in the cluster can continue to serve
search requests, ensuring minimal disruption to users.
* **Load Balancing:** Search requests can be distributed across multiple search head members,
improving overall performance and responsiveness.
* **Configuration Consistency:** Knowledge objects (saved searches, dashboards, reports, field
extractions, etc.) are replicated across all cluster members, ensuring a consistent user experience
regardless of which search head a user accesses.
* **Scalability:** You can add more search head members to handle increased search load and
concurrent users.
#### Key Components of an SHC:
* **Search Head Cluster Members:** These are the individual Splunk Enterprise instances that make
up the cluster.
* **Search Head Cluster Captain:** One member of the cluster is designated as the "captain." This is
a crucial role for coordinating cluster activities.
* **Deployer:** A separate, standalone Splunk instance (not part of the SHC itself) used to
distribute apps and configuration bundles to the SHC members. It acts as a central repository for SHC
configurations.
### 9\. Dynamic Election in a Splunk Search Head Cluster
The concept of "dynamic election" refers to how the **Search Head Cluster Captain** is chosen and
maintained.
**The Role of the Captain:**
The captain is responsible for critical cluster-wide operations, including:
* **Coordinating Scheduled Searches:** Ensuring scheduled reports and alerts run reliably and on
time across the cluster.
* **Managing Search Artifacts:** Orchestrating the replication of search artifacts (results of ad-hoc
and scheduled searches) among cluster members to ensure their availability.
* **Replicating Knowledge Objects:** Distributing configuration changes (e.g., new dashboards,
updated field extractions) from one member to all other members.
* **Maintaining Cluster State:** Keeping track of the health and status of all cluster members.
* **KV Store Coordination:** Managing the KV Store for the cluster.
**Dynamic Captaincy:**
* A Splunk Search Head Cluster normally uses a **dynamic captain**. This means that the member
serving as captain **can change over time**. There isn't a fixed "primary" search head; any healthy
member can potentially become the captain.
* This dynamic nature is critical for high availability. If the current captain fails or becomes
unresponsive, the cluster can elect a new captain to take over its duties, preventing a single point of
failure for cluster-wide operations.
**Captain Election Process:**
The election process is based on a consensus algorithm; Splunk uses its own implementation, which
shares core principles with protocols such as Raft.
1. **Triggering Events:** A captain election is triggered when:
* The current captain fails or restarts.
* A network partition occurs, causing a disruption in communication between cluster members.
* The current captain detects that it no longer has a **majority** of cluster members participating
(it steps down).
* Explicit administrative intervention (e.g., manual captain transfer).
2. **Majority Requirement:** For a successful election to occur, a **majority of *all* cluster
members (not just those currently running)** must be online and able to communicate.
* **Example:** In a 5-member cluster, at least 3 members must be healthy and communicating
for an election to succeed (a majority of 5 is 3).
* **Minimum Size:** This is why Splunk strongly recommends a **minimum of three search head
cluster members**.
* A 2-member cluster cannot tolerate any single node failure and still elect a captain (if one fails,
only 1 remains, which is not a majority of 2).
* A 1-member cluster provides no high availability benefits.
3. **Election Process:**
* When an election is triggered, all non-captain members (or remaining members if the captain
failed) become aware there's no active captain or that the existing captain has stepped down.
* Members start randomized election timers. The member whose timer expires first initiates the
election by proposing itself as captain.
* Other members vote for the proposed captain. If a candidate receives a majority of votes from
all configured members, it becomes the new captain.
* The election typically takes 1-2 minutes. During this time, there is no functioning captain, and
certain cluster-wide operations (like scheduled search dispatch, knowledge object replication) might
be temporarily impacted until a new captain is elected. Ad-hoc searches launched by users might still
run on individual members.
4. **No Bias:** The election process has no inherent bias towards electing the previous captain or
any specific member. Any eligible member can win the election if it secures the majority vote.
**Static Captaincy (For Recovery/Specific Scenarios):**
While dynamic captaincy is the default and recommended for high availability, Splunk provides an
option for **static captaincy**. This is typically used for:
* **Disaster Recovery:** If your cluster loses its majority (e.g., due to a major site outage), it cannot
elect a dynamic captain. In such a scenario, you can temporarily designate a specific member as a
"static captain" to bring the cluster back to a functional state.
* **Specific Maintenance:** For very specific maintenance procedures that require a stable captain,
although this is rare.
**Important Note:** Static captaincy removes the high availability benefit of dynamic election. Once
the precipitating issue is resolved, it's a best practice to revert to dynamic captaincy.
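Converting a surviving member to a static captain uses the `splunk edit shcluster-config` CLI command. A sketch of the standard recovery procedure (the captain URI is a placeholder):
```bash
# On the member that should become the static captain:
splunk edit shcluster-config -election false -mode captain \
  -captain_uri https://sh1.example.com:8089

# On each remaining member, point it at the static captain:
splunk edit shcluster-config -election false -mode member \
  -captain_uri https://sh1.example.com:8089

# Reverting to dynamic captaincy later requires -election true on every member.
```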
### 10\. Architecture Best Practices for SHC Design
Designing a robust Search Head Cluster involves several key considerations:
1. **Minimum of Three Members:** As discussed, this is crucial for ensuring that the cluster can
tolerate at least one member failure and still elect a captain, maintaining high availability.
2. **Dedicated Deployer:** Always have a separate, standalone Splunk instance acting as the
deployer. It should not be a search head cluster member itself. The deployer pushes apps and
configurations to the SHC members.
3. **Homogeneous Hardware:** All search head cluster members should have similar hardware
specifications (CPU, RAM, disk I/O). This ensures consistent performance and avoids bottlenecks.
4. **Dedicated Resources:** Search heads should be dedicated servers. Avoid co-locating them
with indexers or other heavy Splunk components.
5. **Load Balancer:** Implement a load balancer (e.g., F5, HAProxy, AWS ELB) in front of the SHC
members. This distributes user login requests and ad-hoc searches evenly across the cluster,
improving user experience and preventing single points of entry.
6. **Network Considerations:**
* Ensure robust, low-latency network connectivity between all SHC members and to the indexers.
* For multi-site SHCs, ensure the **majority of SHC members reside in the primary site** to
maintain captain election capability during a site-to-site network disruption. Splunk SHC itself is
**not site-aware** in the same way an indexer cluster is for data replication.
7. **Knowledge Object Management:**
* All user-created and administrator-managed knowledge objects (dashboards, reports, alerts,
field extractions, etc.) should be managed centrally via the deployer. This ensures consistent
replication across all SHC members.
* Avoid making direct configuration changes on individual SHC members (except for initial setup or
specific troubleshooting steps) as these changes might not propagate correctly.
8. **Indexer Communication:** The SHC members need to be configured to communicate with
your search peers (indexers or indexer cluster). This is typically done via the deployer.
9. **Replication Factor:** Configure the `replication_factor` for search artifacts (default 3). This
determines how many copies of search results (especially for scheduled searches) are maintained
across the SHC members, providing resilience against member failures.
10. **Monitoring:** Actively monitor the health of your SHC using the Monitoring Console (formerly
the Distributed Management Console, DMC), the `| rest /services/shcluster/status` search, or the
`splunk show shcluster-status` CLI command (see the sketch after this list). Pay close attention to
replication status, member health, and captaincy.
11. **KV Store Backup:** The KV Store is clustered within the SHC. Ensure you have a strategy for
backing up the KV Store data.
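As referenced in item 10, a quick health check can be run from any member (a sketch; the exact output fields vary by version):
```spl
| rest /services/shcluster/status splunk_server=local
```
or from the command line:
```bash
splunk show shcluster-status -auth admin:your_password
```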
By following these design principles, you can build a resilient, scalable, and high-performing Splunk
Search Head Cluster that provides a consistent and reliable search experience for your users.