## Splunk Search Processing Language (SPL) for Day-to-Day Development and Data Management
This document provides a comprehensive overview of essential Splunk Search Processing Language
(SPL) commands for daily development tasks, data management, and understanding core Splunk
functionalities like data bucketing and API interactions.
-----
### 1\. Common SPL Commands for Day-to-Day Development
These are fundamental commands used to retrieve, transform, and analyze data in Splunk.
* **`search`**: The fundamental command to retrieve events from indexes.
* *Example:* `index=web sourcetype=access_combined`
* **`table`**: Displays specified fields in a tabular format, useful for quick overviews.
* *Example:* `... | table _time, host, status`
* **`fields`**: Selects which fields to include or exclude from the results.
* *Example:* `... | fields + user, action, result` or `... | fields - _raw`
* **`dedup`**: Removes duplicate events based on specified fields.
* *Example:* `... | dedup user`
* **`sort`**: Sorts results based on one or more fields.
* *Example:* `... | sort -_time` or `... | sort user, action`
* **`stats`**: Calculates aggregate statistics for fields, like count, sum, avg, etc.
* *Example:* `... | stats count by status` or `... | stats avg(response_time) as avg_resp by host`
* **`top`**: Returns the most frequent values of a field.
* *Example:* `... | top 10 uri`
* **`rare`**: Returns the least frequent values of a field.
* *Example:* `... | rare limit=5 user`
* **`rename`**: Changes the name of a field.
* *Example:* `... | rename host AS server_name`
* **`eval`**: Creates new fields or modifies existing ones using expressions.
* *Example:* `... | eval response_time_ms = response_time * 1000` or `... | eval status_category
= if(status >= 200 AND status < 300, "Success", "Failure")`
* **`where`**: Filters events based on a boolean expression.
* *Example:* `... | where status="200"` or `... | where bytes_sent > 1000`
* **`rex`**: Extracts fields using regular expressions.
* *Example:* `... | rex "user=(?<username>\w+)"`
* **`transaction`**: Groups related events into a single transaction.
* *Example:* `... | transaction session_id startswith="login" endswith="logout"`
* **`join`**: Combines results from two different searches based on a common field.
* *Example:* `... | join user [search index=users | table user, department]`
* **`lookup`**: Enriches events with data from external lookup tables (CSV files, KVstore).
* *Example:* `... | lookup users.csv user OUTPUT department`
* **`chart`**: Aggregates statistics into a table suitable for charting, grouped over arbitrary fields (unlike `timechart`, which always groups over `_time`).
* *Example:* `... | chart count over host by status`
* **`timechart`**: Calculates statistics over time, breaking them down by a specified field.
* *Example:* `... | timechart count by host`
* **`streamstats`**: Calculates statistics over a streaming set of events.
* *Example:* `... | streamstats count as event_number by user`
* **`eventstats`**: Calculates statistics over all events and adds the result to each event.
* *Example:* `... | eventstats avg(duration) as overall_avg_duration`
* **`fieldsummary`**: Reports summary statistics (count, distinct count, min/max, sample values) for the fields in your results.
* *Example:* `... | fieldsummary maxvals=5`
* **`xmlkv`**: Extracts fields from XML formatted events.
* *Example:* `... | xmlkv`
* **`spath`**: Extracts fields from structured data (JSON or XML) in events.
* *Example:* `... | spath input=_raw` or `... | spath path=user.name output=username`
* **`multikv`**: Extracts fields from events that contain tabular, multi-line output (such as the results of `top` or `ps`).
* *Example:* `... | multikv`
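To see how these commands compose, here is a small end-to-end pipeline. This is a sketch: the `web` index, `access_combined` sourcetype, and fields such as `response_time` are illustrative, not part of any standard dataset.
```spl
index=web sourcetype=access_combined status>=500
| rex "user=(?<username>\w+)"
| eval response_time_ms = response_time * 1000
| stats count as errors, avg(response_time_ms) as avg_resp_ms by host, username
| sort -errors
| rename host AS server_name
| table server_name, username, errors, avg_resp_ms
```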
-----
### 2\. SPL Commands for Data Management (Null Values, Grouping, etc.)
These commands are crucial for cleaning, enriching, and summarizing your data.
#### Handling Null Values
* **`fillnull`**: Replaces null (missing) values in specified fields with a defined value (default is 0).
* *Example:* `... | fillnull value="N/A" user_id, transaction_status`
* **`filldown`**: Fills null values in a field with the last non-null value from a previous event.
(Useful for time-series data where values should persist).
* *Example:* `... | sort _time | filldown user_session_id`
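A combined sketch of both commands (the field names are illustrative). Because `filldown` copies the last non-null value *in result order*, sort by `_time` first:
```spl
index=web
| sort 0 _time
| filldown user_session_id
| fillnull value="N/A" user_id, transaction_status
| stats count by user_session_id, user_id
```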
#### Grouping and Aggregation
* **`stats`**: Calculates aggregate statistics (count, sum, avg, min, max, etc.) based on specified
grouping fields. This command reduces your events to summary rows.
* *Example:* `... | stats count by host, status`
* **`eventstats`**: Calculates aggregate statistics and adds the results back to *each original
event*. Unlike `stats`, it doesn't reduce the number of events.
* *Example:* `... | eventstats avg(response_time) as overall_avg_response_time`
* **`streamstats`**: Calculates statistics for each event as it is processed, based on the events that
came before it in the search results. Ideal for cumulative totals or running averages.
* *Example:* `... | sort _time | streamstats count as cumulative_events by user`
* **`transaction`**: Groups a series of related events into a single "transaction" based on a
common identifier and optional time boundaries. It adds fields like `duration` and `eventcount` to
the resulting transaction.
* *Example:* `... | transaction session_id startswith="login" endswith="logout"`
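The difference between `stats`, `eventstats`, and `streamstats` is easiest to see in a single pipeline. A sketch (the `duration` field is illustrative): `stats` would collapse events into summary rows, while the two commands below keep every event and attach the computed values to each one:
```spl
index=web
| sort 0 _time
| eventstats avg(duration) as overall_avg_duration
| streamstats count as event_number by user
| eval slower_than_avg = if(duration > overall_avg_duration, "yes", "no")
| table _time, user, event_number, duration, overall_avg_duration, slower_than_avg
```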
#### Other Data Management Commands
* **`collect`**: Saves search results to a summary index for faster retrieval and reduced processing
load on future searches.
* *Example:* `index=web status=200 | stats count by uri | collect index=daily_uri_counts`
* **`addtotals`**: Sums the numeric fields in each result row and writes the total to a new field (row totals). Use `col=true` to also append a column-totals row.
* *Example:* `... | chart count by host, status | addtotals`
* **`addcoltotals`**: Appends a summary row containing the column totals of the specified numeric fields.
* *Example:* `... | stats sum(bytes) as total_bytes by host | addcoltotals`
* **`accum`**: Calculates a cumulative sum of a specified field.
* *Example:* `... | sort _time | accum sales as cumulative_sales`
* **`cluster`**: Groups events that are similar in structure or content, even if they don't share
exact field values.
* *Example:* `... | cluster showcount=true`
* **`collapse`**: An experimental, internal command that condenses multi-file search result sets; it is not supported for general use. To merge duplicate events, use `dedup` or `stats` instead.
* *Example:* `... | stats count by _raw` (a supported alternative for de-duplicating events)
-----
### 3\. Difference Between `where` and `search` in SPL
Both `search` and `where` are used for filtering data, but they operate at different stages of the
search pipeline, impacting performance.
| Feature | `search` Command | `where` Command |
| :--- | :--- | :--- |
| **Function** | Initial filtering of events from indexes. | Filters results based on complex conditional expressions. |
| **Placement** | Primarily at the **beginning** of a search. | Always **after a pipe** in the search pipeline. |
| **Capabilities** | Keyword searches, field-value pairs, simple wildcards. | Complex logical expressions, field-to-field comparisons, `eval` functions, advanced string functions. |
| **Case sensitivity** | Case-insensitive by default. | Case-sensitive for string comparisons by default (unless functions like `lower()` are used). |
| **Performance** | **Highly optimized**; uses index structures to discard events before they enter the pipeline. Crucial for performance. | Filters *after* events have been retrieved and processed by preceding commands; less efficient for initial data reduction. |
| **When to use** | To **reduce the initial volume of data** by filtering on indexed fields or raw event content. | When you need **complex logic, field-to-field comparisons, or filters on calculated fields** created by earlier commands (e.g., `eval`). |
| **Example** | `index=firewall action=DENY` | `... \| where bytes_in > bytes_out AND status != 404` |
**Performance Best Practice:** Always apply as much filtering as possible with the initial `search`
command to reduce the data volume processed by subsequent commands.
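For example, the following two searches return near-identical results, but the first pushes the simple filter into the initial `search` so that far fewer events ever enter the pipeline (index and field names are illustrative):
```spl
index=firewall action=DENY
| where bytes_in > bytes_out
```
The inefficient form retrieves every `firewall` event first and filters afterward:
```spl
index=firewall
| where action="DENY" AND bytes_in > bytes_out
```
(The results are "near-identical" rather than identical because the `where` string comparison is case-sensitive, while the `search` filter is not.)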
-----
### 4\. How Data Bucketing Works in Splunk
Data bucketing is how Splunk efficiently stores, manages, and retains time-series machine data on
disk. It's a lifecycle where data moves through different storage tiers as it ages.
1. **Hot Buckets:**
* **Actively written to** with new incoming data.
* Immediately searchable.
* Reside on **fast storage** (e.g., SSDs).
2. **Warm Buckets:**
* A hot bucket "rolls" to warm when it reaches a configured size or age.
* **No new data written**; read-only for indexing.
* Still on **fast storage**.
* Renamed to reflect their time range (e.g., `db_<newest_time>_<oldest_time>_<id>`).
3. **Cold Buckets:**
* A warm bucket "rolls" to cold when the number of warm buckets exceeds `maxWarmDBCount` or the home path reaches its size limit; the oldest warm buckets roll first.
* Moved to **cheaper, slower storage** (e.g., NAS, spinning disks).
* Still fully searchable.
4. **Frozen Buckets:**
* A cold bucket "rolls" to frozen when it reaches its configured retention period
(`frozenTimePeriodInSecs`).
* **Default action is deletion**.
* Can be configured to be **archived** instead of deleted (not searchable unless thawed).
5. **Thawed Buckets:**
* Archived frozen buckets can be manually "thawed" (moved back into a searchable Splunk-
managed location) for historical retrieval.
**Why Bucketing is Important:**
* **Performance Optimization:** Time-based partitioning allows quick narrowing of searches;
storage tiering optimizes costs.
* **Efficient Data Management:** Automates data retention and deletion policies.
* **Scalability:** Buckets are the units of replication and search distribution in clusters.
* **Disaster Recovery:** Simplifies backup and recovery due to organized data units.
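The bucket lifecycle described above is controlled per index in `indexes.conf`. A minimal sketch — the index name, paths, and values are illustrative, not sizing recommendations:
```ini
[web]
homePath   = $SPLUNK_DB/web/db          # hot and warm buckets (fast storage)
coldPath   = $SPLUNK_DB/web/colddb      # cold buckets (cheaper, slower storage)
thawedPath = $SPLUNK_DB/web/thaweddb    # manually restored (thawed) buckets
maxWarmDBCount = 300                    # oldest warm buckets roll to cold beyond this count
frozenTimePeriodInSecs = 15552000       # ~180 days; cold buckets roll to frozen after this
# coldToFrozenDir = /archive/web        # uncomment to archive frozen buckets instead of deleting them
```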
-----
### 5\. How to Check for Bucket Details Using SPL Query
The `| dbinspect` command is used to inspect the state and properties of buckets within your
indexes.
**Basic Syntax:**
`| dbinspect index=<your_index_name>` or `| dbinspect index=*` (for all indexes)
**Key Fields in `dbinspect` Output:**
* **`bucketId`**: Unique identifier for the bucket.
* **`index`**: Name of the index.
* **`state`**: Current state of the bucket (`hot`, `warm`, `cold`, or `thawed`).
* **`path`**: File system path to the bucket directory.
* **`sizeOnDiskMB`**: **The actual size of the bucket on disk in megabytes.**
* **`rawSize`**: Size of uncompressed raw data in bytes.
* **`startEpoch`**: Earliest timestamp of an event in the bucket (Unix epoch).
* **`endEpoch`**: Latest timestamp of an event in the bucket (Unix epoch).
* **`splunk_server`**: The Splunk server (indexer) where the bucket resides.
* **`isSearchable`**: Boolean indicating if the bucket is currently searchable.
**Example SPL Queries for Bucket Details:**
* **List all buckets for a specific index and their sizes:**
```spl
| dbinspect index=web
| table index, bucketId, state, path, sizeOnDiskMB, startEpoch, endEpoch
| sort index, state, startEpoch
```
* **Find all hot buckets and their sizes:**
```spl
| dbinspect index=*
| where state="hot"
| table splunk_server, index, bucketId, sizeOnDiskMB
| sort -sizeOnDiskMB
```
* **Calculate total size of data per index and state:**
```spl
| dbinspect index=*
| stats sum(sizeOnDiskMB) as total_size_MB by index, state
| sort index, state
```
* **Convert `sizeOnDiskMB` to GB:**
```spl
| dbinspect index=my_index
| eval size_GB = round(sizeOnDiskMB / 1024, 2)
| table index, bucketId, state, splunk_server, size_GB, path
| sort -size_GB
```
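`dbinspect` output can also be combined with `stats` to estimate the compression ratio per index. A sketch, assuming `rawSize` is reported in bytes:
```spl
| dbinspect index=*
| eval raw_MB = rawSize / 1024 / 1024
| stats sum(raw_MB) as raw_MB, sum(sizeOnDiskMB) as disk_MB by index
| eval compression_ratio = round(disk_MB / raw_MB, 2)
| sort compression_ratio
```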
-----
### 6\. Splunk REST Commands and Their Uses
"REST commands" in Splunk refer to two related concepts: the `| rest` SPL command and the
broader Splunk REST API.
#### 6.1. The `| rest` SPL Command
This is an SPL command that allows you to query the Splunk REST API directly from within a Splunk
search. It treats the API's response as search results.
**Syntax:** `| rest <endpoint_path> [optional arguments]`
**Key Uses (for introspection and management within Splunk Web):**
* **Monitoring Splunk Health and Performance:** Check server status (`/services/server/info`),
resource usage.
* **Managing Knowledge Objects:** List saved searches (`/services/saved/searches`), dashboards,
lookups.
* **Auditing and Troubleshooting:** View active search jobs (`/services/search/jobs`), input
configurations, user details.
**Example:**
```spl
| rest /services/saved/searches splunk_server=local
| search disabled=0 author="johndoe"
| table title, app, author, search
```
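Another common introspection call checks the version and OS of each connected Splunk instance (the `serverName`, `version`, and `os_name` fields below are typical of the `/services/server/info` endpoint's output):
```spl
| rest /services/server/info
| table splunk_server, serverName, version, os_name
```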
#### 6.2. The Splunk REST API (Programmatic Interface)
This is a comprehensive set of HTTP endpoints allowing external applications, scripts, or systems to
interact programmatically with Splunk. It uses standard HTTP methods (GET, POST, DELETE) over
HTTPS (default port 8089).
**How it Works:**
* **HTTP Methods:** GET (retrieve), POST (create/update), DELETE (remove).
* **Authentication:** Requires Splunk credentials (username/password for session key) or HEC
tokens.
* **Response Formats:** Typically XML or JSON.
**Common Uses (for automation, integration, and custom development):**
* **Automation of Administration Tasks:** Programmatically create/manage indexes, users, roles,
apps.
* **Integration with External Systems:** Pull search results into external reporting tools, trigger
searches, integrate alerts with ticketing systems.
* **Custom Application Development:** Build custom Splunk apps, data inputs, or modular alerts.
* **Running Searches and Managing Search Jobs:** Programmatically initiate searches, monitor
status, retrieve results, and manage jobs.
**Example (using `curl` for a direct API call - Login to get session key):**
```bash
curl -k https://localhost:8089/services/auth/login \
  -d username=admin \
  -d password=your_splunk_password
```
(This returns an XML response containing the `<sessionKey>`)
**Example (using `curl` - Run a One-Shot Search using obtained session key):**
```bash
curl -k \
-H "Authorization: Splunk <your_session_key>" \
-X POST \
https://localhost:8089/services/search/jobs/oneshot \
-d search="search index=_internal | head 5 | fields _time, host, sourcetype" \
-d output_mode=json
```
-----
### 7\. Splunk HTTP Event Collector (HEC)
Splunk HEC is a secure and efficient way to send data directly to Splunk Enterprise or Splunk Cloud
over HTTP or HTTPS, specifically designed for applications, cloud services, and custom scripts.
**How HEC Works:**
1. **Token-Based Authentication:** Uses unique, long-lived tokens instead of user credentials for
security.
2. **HTTP/HTTPS Endpoints:** Splunk listens on dedicated HEC endpoints (default port 8088).
3. **Data Format:** Supports JSON-formatted events (recommended, for structured metadata) and
raw text.
4. **No Forwarder Needed:** Eliminates the need to install and manage Splunk Universal
Forwarders on sending systems.
5. **Direct Ingestion:** Data is ingested directly by indexers for near real-time availability.
6. **Load Balancing:** Supports external load balancers for scalability in distributed environments.
**Key Benefits and Use Cases:**
* **Simplified Data Ingestion:** For application logging, cloud services (AWS Lambda, Azure
Functions), and IoT devices.
* **Enhanced Security:** Token-based authentication, HTTPS, granular permissions per token.
* **Scalability and Performance:** Designed for high-volume data streams, easily scales
horizontally.
* **Reduced Overhead:** Less management of forwarder deployments.
* **Flexibility:** Supports various data types.
**How to Configure Splunk HEC:**
1. **Enable HEC Globally** in Splunk Web (Settings \> Data Inputs \> HTTP Event Collector \> Global
Settings).
2. **Create an HEC Token** (Settings \> Data Inputs \> HTTP Event Collector \> New Token),
defining its name, source type, and index permissions. Copy the generated Token Value.
3. **Configure Your Application/Client** to send HTTP POST requests to the HEC URL (e.g.,
`https://your_splunk_hostname:8088/services/collector`) with the HEC token in the `Authorization`
header (`Authorization: Splunk <your_token_value>`) and the data payload.
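For example, a minimal `curl` sketch that sends one JSON event to HEC (the hostname, token value, and index are placeholders for your own configuration):
```bash
# Send a single JSON-formatted event to the HEC endpoint
curl -k https://your_splunk_hostname:8088/services/collector \
  -H "Authorization: Splunk <your_token_value>" \
  -d '{"event": "user login succeeded", "sourcetype": "my_app", "index": "main", "host": "app01"}'
```
A successful request returns `{"text":"Success","code":0}`.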
**Security Best Practices for HEC:**
* Always use HTTPS/SSL.
* Restrict HEC Token Permissions (least privilege).
* Use Strong Tokens.
* Implement Network Segmentation.
* Monitor HEC Activity in `_internal` index.
* Rotate HEC tokens periodically.
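To review the HEC tokens configured on an instance directly from SPL, you can query the HEC input endpoint (a sketch; the exact output fields vary by Splunk version):
```spl
| rest /services/data/inputs/http splunk_server=local
| table title, disabled, index, sourcetype
```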
-----
### 8\. Splunk Search Head Cluster (SHC) Overview
Splunk Search Head Clusters (SHC) are a crucial component for ensuring high availability, scalability,
and a consistent user experience in a distributed Splunk environment. They allow multiple search
heads to share configurations, coordinate search activities, and provide fault tolerance.
A Splunk Search Head Cluster is a group of Splunk Enterprise search heads that work together to
provide a highly available and scalable search and reporting environment. Key functionalities
include:
* **High Availability:** If one search head member fails, others in the cluster can continue to serve
search requests, ensuring minimal disruption to users.
* **Load Balancing:** Search requests can be distributed across multiple search head members,
improving overall performance and responsiveness.
* **Configuration Consistency:** Knowledge objects (saved searches, dashboards, reports, field
extractions, etc.) are replicated across all cluster members, ensuring a consistent user experience
regardless of which search head a user accesses.
* **Scalability:** You can add more search head members to handle increased search load and
concurrent users.
#### Key Components of an SHC:
* **Search Head Cluster Members:** These are the individual Splunk Enterprise instances that make
up the cluster.
* **Search Head Cluster Captain:** One member of the cluster is designated as the "captain." This is
a crucial role for coordinating cluster activities.
* **Deployer:** A separate, standalone Splunk instance (not part of the SHC itself) used to
distribute apps and configuration bundles to the SHC members. It acts as a central repository for SHC
configurations.
### 9\. Dynamic Election in a Splunk Search Head Cluster
The concept of "dynamic election" refers to how the **Search Head Cluster Captain** is chosen and
maintained.
**The Role of the Captain:**
The captain is responsible for critical cluster-wide operations, including:
* **Coordinating Scheduled Searches:** Ensuring scheduled reports and alerts run reliably and on
time across the cluster.
* **Managing Search Artifacts:** Orchestrating the replication of search artifacts (results of ad-hoc
and scheduled searches) among cluster members to ensure their availability.
* **Replicating Knowledge Objects:** Distributing configuration changes (e.g., new dashboards,
updated field extractions) from one member to all other members.
* **Maintaining Cluster State:** Keeping track of the health and status of all cluster members.
* **KV Store Coordination:** Managing the KV Store for the cluster.
**Dynamic Captaincy:**
* A Splunk Search Head Cluster normally uses a **dynamic captain**. This means that the member
serving as captain **can change over time**. There isn't a fixed "primary" search head; any healthy
member can potentially become the captain.
* This dynamic nature is critical for high availability. If the current captain fails or becomes
unresponsive, the cluster can elect a new captain to take over its duties, preventing a single point of
failure for cluster-wide operations.
**Captain Election Process:**
The election process is based on a consensus algorithm; Splunk uses its own implementation, which
shares core principles with protocols such as Raft.
1. **Triggering Events:** A captain election is triggered when:
* The current captain fails or restarts.
* A network partition occurs, causing a disruption in communication between cluster members.
* The current captain detects that it no longer has a **majority** of cluster members participating
(it steps down).
* Explicit administrative intervention (e.g., manual captain transfer).
2. **Majority Requirement:** For a successful election to occur, a **majority of *all* cluster
members (not just those currently running)** must be online and able to communicate.
* **Example:** In a 5-member cluster, at least 3 members must be healthy and communicating
for an election to succeed (a majority of 5 is 3).
* **Minimum Size:** This is why Splunk strongly recommends a **minimum of three search head
cluster members**.
* A 2-member cluster cannot tolerate any single node failure and still elect a captain (if one fails,
only 1 remains, which is not a majority of 2).
* A 1-member cluster provides no high availability benefits.
3. **Election Process:**
* When an election is triggered, all non-captain members (or remaining members if the captain
failed) become aware there's no active captain or that the existing captain has stepped down.
* Members start randomized election timers. The member whose timer expires first initiates the
election by proposing itself as captain.
* Other members vote for the proposed captain. If a candidate receives a majority of votes from
all configured members, it becomes the new captain.
* The election typically takes 1-2 minutes. During this time, there is no functioning captain, and
certain cluster-wide operations (like scheduled search dispatch, knowledge object replication) might
be temporarily impacted until a new captain is elected. Ad-hoc searches launched by users might still
run on individual members.
4. **No Bias:** The election process has no inherent bias towards electing the previous captain or
any specific member. Any eligible member can win the election if it secures the majority vote.
**Static Captaincy (For Recovery/Specific Scenarios):**
While dynamic captaincy is the default and recommended for high availability, Splunk provides an
option for **static captaincy**. This is typically used for:
* **Disaster Recovery:** If your cluster loses its majority (e.g., due to a major site outage), it cannot
elect a dynamic captain. In such a scenario, you can temporarily designate a specific member as a
"static captain" to bring the cluster back to a functional state.
* **Specific Maintenance:** For very specific maintenance procedures that require a stable captain,
although this is rare.
**Important Note:** Static captaincy removes the high availability benefit of dynamic election. Once
the precipitating issue is resolved, it's a best practice to revert to dynamic captaincy.
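Converting a surviving member to a static captain uses the `splunk edit shcluster-config` CLI command. A sketch of the standard recovery procedure (the captain URI is a placeholder):
```bash
# On the member that should become the static captain:
splunk edit shcluster-config -election false -mode captain \
  -captain_uri https://sh1.example.com:8089

# On each remaining member, point it at the static captain:
splunk edit shcluster-config -election false -mode member \
  -captain_uri https://sh1.example.com:8089

# Reverting to dynamic captaincy later requires -election true on every member.
```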
### 10\. Architecture Best Practices for SHC Design
Designing a robust Search Head Cluster involves several key considerations:
1. **Minimum of Three Members:** As discussed, this is crucial for ensuring that the cluster can
tolerate at least one member failure and still elect a captain, maintaining high availability.
2. **Dedicated Deployer:** Always have a separate, standalone Splunk instance acting as the
deployer. It should not be a search head cluster member itself. The deployer pushes apps and
configurations to the SHC members.
3. **Homogeneous Hardware:** All search head cluster members should have similar hardware
specifications (CPU, RAM, disk I/O). This ensures consistent performance and avoids bottlenecks.
4. **Dedicated Resources:** Search heads should be dedicated servers. Avoid co-locating them
with indexers or other heavy Splunk components.
5. **Load Balancer:** Implement a load balancer (e.g., F5, HAProxy, AWS ELB) in front of the SHC
members. This distributes user login requests and ad-hoc searches evenly across the cluster,
improving user experience and preventing single points of entry.
6. **Network Considerations:**
* Ensure robust, low-latency network connectivity between all SHC members and to the indexers.
* For multi-site SHCs, ensure the **majority of SHC members reside in the primary site** to
maintain captain election capability during a site-to-site network disruption. Splunk SHC itself is
**not site-aware** in the same way an indexer cluster is for data replication.
7. **Knowledge Object Management:**
* All user-created and administrator-managed knowledge objects (dashboards, reports, alerts,
field extractions, etc.) should be managed centrally via the deployer. This ensures consistent
replication across all SHC members.
* Avoid making direct configuration changes on individual SHC members (except for initial setup or
specific troubleshooting steps) as these changes might not propagate correctly.
8. **Indexer Communication:** The SHC members need to be configured to communicate with
your search peers (indexers or indexer cluster). This is typically done via the deployer.
9. **Replication Factor:** Configure the `replication_factor` for search artifacts (default 3). This
determines how many copies of search results (especially for scheduled searches) are maintained
across the SHC members, providing resilience against member failures.
10. **Monitoring:** Actively monitor the health of your SHC using the Monitoring Console (formerly
the Distributed Management Console, DMC), the `| rest /services/shcluster/status` search, or the
`splunk show shcluster-status` CLI command (see the sketch after this list). Pay close attention to
replication status, member health, and captaincy.
11. **KV Store Backup:** The KV Store is clustered within the SHC. Ensure you have a strategy for
backing up the KV Store data.
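As referenced in item 10, a quick health check can be run from any member (a sketch; the exact output fields vary by version):
```spl
| rest /services/shcluster/status splunk_server=local
```
or from the command line:
```bash
splunk show shcluster-status -auth admin:your_password
```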
By following these design principles, you can build a resilient, scalable, and high-performing Splunk
Search Head Cluster that provides a consistent and reliable search experience for your users.