
## Splunk Search Processing Language (SPL) for Day-to-Day Development and Data Management

This document provides a comprehensive overview of essential Splunk Search Processing Language
(SPL) commands for daily development tasks, data management, and understanding core Splunk
functionalities like data bucketing and API interactions.

-----

### 1\. Common SPL Commands for Day-to-Day Development

These are fundamental commands used to retrieve, transform, and analyze data in Splunk.

* **`search`**: The fundamental command to retrieve events from indexes.

* *Example:* `index=web sourcetype=access_combined`

* **`table`**: Displays specified fields in a tabular format, useful for quick overviews.

* *Example:* `... | table _time, host, status`

* **`fields`**: Selects which fields to include or exclude from the results.

* *Example:* `... | fields + user, action, result` or `... | fields - _raw`

* **`dedup`**: Removes duplicate events based on specified fields.

* *Example:* `... | dedup user`

* **`sort`**: Sorts results based on one or more fields.

* *Example:* `... | sort -_time` or `... | sort user, action`

* **`stats`**: Calculates aggregate statistics for fields, like count, sum, avg, etc.

* *Example:* `... | stats count by status` or `... | stats avg(response_time) as avg_resp by host`

* **`top`**: Returns the most frequent values of a field.

* *Example:* `... | top 10 uri`

* **`rare`**: Returns the least frequent values of a field.

* *Example:* `... | rare limit=5 user`

* **`rename`**: Changes the name of a field.

* *Example:* `... | rename host AS server_name`

* **`eval`**: Creates new fields or modifies existing ones using expressions.


* *Example:* `... | eval response_time_ms = response_time * 1000` or `... | eval status_category = if(status >= 200 AND status < 300, "Success", "Failure")`

* **`where`**: Filters events based on a boolean expression.

* *Example:* `... | where status="200"` or `... | where bytes_sent > 1000`

* **`rex`**: Extracts fields using regular expressions.

* *Example:* `... | rex "user=(?<username>\w+)"`

* **`transaction`**: Groups related events into a single transaction.

* *Example:* `... | transaction session_id startswith="login" endswith="logout"`

* **`join`**: Combines results from two different searches based on a common field.

* *Example:* `... | join user [search index=users | table user, department]`

* **`lookup`**: Enriches events with data from external lookup tables (CSV files, KVstore).

* *Example:* `... | lookup users.csv user OUTPUT new_field`

* **`chart`**: Aggregates results into a chartable table, splitting values over one or two fields (use `timechart` for time-based charts).

* *Example:* `... | chart count by host`

* **`timechart`**: Calculates statistics over time, breaking them down by a specified field.

* *Example:* `... | timechart count by host`

* **`streamstats`**: Calculates statistics over a streaming set of events.

* *Example:* `... | streamstats count as event_number by user`

* **`eventstats`**: Calculates statistics over all events and adds the result to each event.

* *Example:* `... | eventstats avg(duration) as overall_avg_duration`

* **`fieldsummary`**: Reports summary statistics (count, distinct values, numeric min/max/mean) for each field in the results.

* *Example:* `... | fieldsummary maxvals=10`

* **`xmlkv`**: Extracts fields from XML formatted events.

* *Example:* `... | xmlkv`

* **`spath`**: Extracts fields from structured data such as JSON (or XML) events.

* *Example:* `... | spath`

* **`multikv`**: Extracts fields from events that contain multi-line tabular data (e.g., the output of `top` or `ps`).

* *Example:* `... | multikv`
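
In practice these commands are chained into a single pipeline. The following is a minimal sketch, assuming a `web` index with an `access_combined` sourcetype and fields named `status`, `uri`, `response_time`, plus a raw `user=` token (adjust the names to your data):

```spl
index=web sourcetype=access_combined
| rex "user=(?<username>\w+)"
| eval response_time_ms = response_time * 1000
| where status >= 500
| stats count AS error_count, avg(response_time_ms) AS avg_resp_ms BY uri, username
| sort -error_count
| head 10
```

Each command only sees the output of the command before it, so filters such as `where` should appear as early as the fields they need exist.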

-----
### 2\. SPL Commands for Data Management (Null Values, Grouping, etc.)

These commands are crucial for cleaning, enriching, and summarizing your data.

#### Handling Null Values

* **`fillnull`**: Replaces null (missing) values in specified fields with a defined value (default is 0).

* *Example:* `... | fillnull value="N/A" user_id, transaction_status`

* **`filldown`**: Fills null values in a field with the last non-null value from a previous event.
(Useful for time-series data where values should persist).

* *Example:* `... | sort _time | filldown user_session_id`
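
A short sketch contrasting the two commands, assuming events that intermittently carry a `user_session_id`, a numeric `bytes` field, and a string `username` field (illustrative names):

```spl
index=web sourcetype=access_combined
| sort 0 _time
| filldown user_session_id
| fillnull value=0 bytes
| fillnull value="unknown" username
```

`filldown` carries the last observed `user_session_id` forward in time order, while the two `fillnull` calls supply numeric and string defaults for fields that are simply missing.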

#### Grouping and Aggregation

* **`stats`**: Calculates aggregate statistics (count, sum, avg, min, max, etc.) based on specified
grouping fields. This command reduces your events to summary rows.

* *Example:* `... | stats count by host, status`

* **`eventstats`**: Calculates aggregate statistics and adds the results back to *each original
event*. Unlike `stats`, it doesn't reduce the number of events.

* *Example:* `... | eventstats avg(response_time) as overall_avg_response_time`

* **`streamstats`**: Calculates statistics for each event as it is processed, based on the events that
came before it in the search results. Ideal for cumulative totals or running averages.

* *Example:* `... | sort _time | streamstats count as cumulative_events by user`

* **`transaction`**: Groups a series of related events into a single "transaction" based on a common identifier and optional time boundaries. It adds fields like `duration` and `eventcount` to the resulting transaction.

* *Example:* `... | transaction session_id startswith="login" endswith="logout"`
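
To contrast `stats`, `eventstats`, and `streamstats` on the same data, here is a hedged sketch assuming web events with `user` and `response_time` fields (illustrative names):

```spl
index=web sourcetype=access_combined
| sort 0 _time
| streamstats count AS request_number BY user
| eventstats avg(response_time) AS overall_avg_response_time
| stats count AS requests, avg(response_time) AS avg_response_time BY user
```

`streamstats` numbers each user's requests as the events stream through, `eventstats` stamps every event with the overall average without reducing the result set, and the final `stats` collapses everything into one summary row per user.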

#### Other Data Management Commands

* **`collect`**: Saves search results to a summary index for faster retrieval and reduced processing
load on future searches.

* *Example:* `index=web status=200 | stats count by uri | collect index=daily_uri_counts`


* **`addtotals`**: By default, computes the sum of the numeric fields in each result row and adds it as a new `Total` column (use `col=true` to also append a summary row).

* *Example:* `... | chart count by host, status | addtotals`

* **`addcoltotals`**: Appends a summary row to the results containing the column totals of all (or specified) numeric fields.

* *Example:* `... | stats sum(bytes) as total_bytes by host | addcoltotals`

* **`accum`**: Calculates a cumulative sum of a specified field.

* *Example:* `... | sort _time | accum sales as cumulative_sales`

* **`cluster`**: Groups events that are similar in structure or content, even if they don't share
exact field values.

* *Example:* `... | cluster showcount=true`

* **`collapse`**: An internal, experimental command that condenses multi-file search results into as few files as possible; it is rarely used directly in day-to-day searches.

* *Example:* `... | collapse`
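
Summary indexing with `collect` is usually a two-step pattern: a scheduled search writes condensed results, and later reports read them back. A minimal sketch, assuming a summary index named `daily_uri_counts` has already been created:

```spl
index=web status=200 earliest=-1d@d latest=@d
| stats count AS daily_count BY uri
| collect index=daily_uri_counts
```

Subsequent reports then query the much smaller summary, for example `index=daily_uri_counts | stats sum(daily_count) AS total BY uri`.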

-----

### 3\. Difference Between `where` and `search` in SPL

Both `search` and `where` are used for filtering data, but they operate at different stages of the
search pipeline, impacting performance.

| Feature | `search` Command | `where` Command |
| :--- | :--- | :--- |
| **Function** | Initial filtering of events from indexes. | Filters results based on complex conditional expressions. |
| **Placement** | Primarily at the **beginning** of a search. | Always **after a pipe (`\|`)** in the search pipeline. |
| **Capabilities** | Keyword searches, field-value pairs, simple wildcards. | Complex logical expressions, field-to-field comparisons, `eval` expressions, advanced string functions. |
| **Case-Sensitivity** | Defaults to case-insensitive. | Defaults to case-sensitive for string comparisons (unless functions like `lower()` are used). |
| **Performance** | **Highly optimized**; filters data at the earliest stage (index time). Crucial for performance. | Filters data *after* retrieval and processing by preceding commands. Less efficient for initial data reduction. |
| **When to Use** | To **reduce the initial volume of data** by filtering on indexed fields or raw event content. | When you need **complex logic, field-to-field comparisons, or filtering on calculated fields** created by earlier commands (e.g., `eval`). |
| **Example** | `index=firewall action=DENY` | `... \| where bytes_in > bytes_out AND status != 404` |

**Performance Best Practice:** Always apply as much filtering as possible with the initial `search`
command to reduce the data volume processed by subsequent commands.
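
The same filtering intent expressed both ways, as a sketch against a hypothetical `firewall` index (field names are illustrative):

```spl
index=firewall action=DENY src_zone=external
| eval deny_ratio = bytes_in / (bytes_out + 1)
| where bytes_in > bytes_out AND deny_ratio > 10
```

The first line lets the indexers discard non-matching events as early as possible; `where` then applies logic the initial `search` cannot express, namely a field-to-field comparison and a filter on a calculated field.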

-----

### 4\. How Data Bucketing Works in Splunk

Data bucketing is how Splunk efficiently stores, manages, and retains time-series machine data on
disk. It's a lifecycle where data moves through different storage tiers as it ages.

1. **Hot Buckets:**

* **Actively written to** with new incoming data.

* Immediately searchable.

* Reside on **fast storage** (e.g., SSDs).

2. **Warm Buckets:**

* A hot bucket "rolls" to warm when it reaches a configured size or age.

* **No new data written**; read-only for indexing.

* Still on **fast storage**.

* Renamed to reflect their time range (e.g., `db_<newest_time>_<oldest_time>_<id>`).

3. **Cold Buckets:**

* A warm bucket "rolls" to cold when it reaches a configured age or `maxWarmDBCount` is exceeded.

* Moved to **cheaper, slower storage** (e.g., NAS, spinning disks).

* Still fully searchable.

4. **Frozen Buckets:**

* A cold bucket "rolls" to frozen when it reaches its configured retention period
(`frozenTimePeriodInSecs`).
* **Default action is deletion**.

* Can be configured to be **archived** instead of deleted (not searchable unless thawed).

5. **Thawed Buckets:**

* Archived frozen buckets can be manually "thawed" (moved back into a searchable Splunk-
managed location) for historical retrieval.
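
The rolling behavior described above is driven by per-index settings in `indexes.conf`. A hedged sketch of a hypothetical `web` index stanza (paths and values are illustrative, not sizing recommendations):

```ini
[web]
homePath   = $SPLUNK_DB/web/db
coldPath   = /mnt/cold_storage/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb
# Roll hot -> warm based on bucket size
maxDataSize = auto_high_volume
# Roll warm -> cold once this many warm buckets exist
maxWarmDBCount = 300
# Roll cold -> frozen after ~90 days of data age
frozenTimePeriodInSecs = 7776000
# Archive instead of delete when freezing (omit this to delete)
coldToFrozenDir = /mnt/archive/web_frozen
```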

**Why Bucketing is Important:**

* **Performance Optimization:** Time-based partitioning allows quick narrowing of searches; storage tiering optimizes costs.

* **Efficient Data Management:** Automates data retention and deletion policies.

* **Scalability:** Buckets are the units of replication and search distribution in clusters.

* **Disaster Recovery:** Simplifies backup and recovery due to organized data units.

-----

### 5\. How to Check for Bucket Details Using SPL Query

The `| dbinspect` command is used to inspect the state and properties of buckets within your
indexes.

**Basic Syntax:**

`| dbinspect index=<your_index_name>` or `| dbinspect index=*` (for all indexes)

**Key Fields in `dbinspect` Output:**

* **`bucketId`**: Unique identifier for the bucket.

* **`index`**: Name of the index.

* **`state`**: Current state (`hot`, `warm`, `cold`, `frozen`).

* **`path`**: File system path to the bucket directory.

* **`sizeOnDiskMB`**: **The actual size of the bucket on disk in megabytes.**

* **`rawSize`**: Size of uncompressed raw data in bytes.


* **`startEpoch`**: Earliest timestamp of an event in the bucket (Unix epoch).

* **`endEpoch`**: Latest timestamp of an event in the bucket (Unix epoch).

* **`splunk_server`**: The Splunk server (indexer) where the bucket resides.

* **`isSearchable`**: Boolean indicating if the bucket is currently searchable.

**Example SPL Queries for Bucket Details:**

* **List all buckets for a specific index and their sizes:**

```spl

| dbinspect index=web

| table index, bucketId, state, path, sizeOnDiskMB, startEpoch, endEpoch

| sort index, state, startEpoch

```

* **Find all hot buckets and their sizes:**

```spl

| dbinspect index=*

| where state="hot"

| table splunk_server, index, bucketId, sizeOnDiskMB

| sort -sizeOnDiskMB

```

* **Calculate total size of data per index and state:**

```spl

| dbinspect index=*

| stats sum(sizeOnDiskMB) as total_size_MB by index, state

| sort index, state

```

* **Convert `sizeOnDiskMB` to GB:**

```spl

| dbinspect index=my_index

| eval size_GB = round(sizeOnDiskMB / 1024, 2)

| table index, bucketId, state, splunk_server, size_GB, path


| sort -size_GB

```

-----

### 6\. Splunk REST Commands and Their Uses

"REST commands" in Splunk refer to two related concepts: the `| rest` SPL command and the
broader Splunk REST API.

#### 6.1. The `| rest` SPL Command

This is an SPL command that allows you to query the Splunk REST API directly from within a Splunk
search. It treats the API's response as search results.

**Syntax:** `| rest <endpoint_path> [optional arguments]`

**Key Uses (for introspection and management within Splunk Web):**

* **Monitoring Splunk Health and Performance:** Check server status (`/services/server/info`) and resource usage.

* **Managing Knowledge Objects:** List saved searches (`/services/saved/searches`), dashboards, lookups.

* **Auditing and Troubleshooting:** View active search jobs (`/services/search/jobs`), input configurations, user details.

**Example:**

```spl

| rest /services/saved/searches splunk_server=local

| search disabled=0 author="johndoe"

| table title, app, author, search

```
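
Another common introspection pattern, sketched here, is listing currently running search jobs; the field names follow the `/services/search/jobs` endpoint and may vary slightly by version:

```spl
| rest /services/search/jobs splunk_server=local
| search dispatchState="RUNNING"
| table sid, author, label, runDuration, dispatchState
| sort -runDuration
```
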
#### 6.2. The Splunk REST API (Programmatic Interface)

This is a comprehensive set of HTTP endpoints allowing external applications, scripts, or systems to
interact programmatically with Splunk. It uses standard HTTP methods (GET, POST, DELETE) over
HTTPS (default port 8089).

**How it Works:**

* **HTTP Methods:** GET (retrieve), POST (create/update), DELETE (remove).

* **Authentication:** Requires Splunk credentials (username/password for a session key) or HEC tokens.

* **Response Formats:** Typically XML or JSON.

**Common Uses (for automation, integration, and custom development):**

* **Automation of Administration Tasks:** Programmatically create/manage indexes, users, roles, apps.

* **Integration with External Systems:** Pull search results into external reporting tools, trigger
searches, integrate alerts with ticketing systems.

* **Custom Application Development:** Build custom Splunk apps, data inputs, or modular alerts.

* **Running Searches and Managing Search Jobs:** Programmatically initiate searches, monitor
status, retrieve results, and manage jobs.

**Example (using `curl` for a direct API call - Login to get session key):**

```bash

curl -k -u admin:your_splunk_password https://localhost:8089/services/auth/login -d username=admin -d password=your_splunk_password

```

(This returns an XML response containing the `<sessionKey>`)


**Example (using `curl` - Run a One-Shot Search using obtained session key):**

```bash

curl -k \
  -H "Authorization: Splunk <your_session_key>" \
  -X POST \
  https://localhost:8089/services/search/jobs \
  -d exec_mode=oneshot \
  -d search="search index=_internal | head 5 | fields _time, host, sourcetype" \
  -d output_mode=json

```

-----

### 7\. Splunk HTTP Event Collector (HEC)

Splunk HEC is a secure and efficient way to send data directly to Splunk Enterprise or Splunk Cloud
over HTTP or HTTPS, specifically designed for applications, cloud services, and custom scripts.

**How HEC Works:**

1. **Token-Based Authentication:** Uses unique, long-lived tokens instead of user credentials for
security.

2. **HTTP/HTTPS Endpoints:** Splunk listens on dedicated HEC endpoints (default port 8088).

3. **Data Format:** Supports JSON-formatted events (recommended, for structured metadata) and
raw text.

4. **No Forwarder Needed:** Eliminates the need to install and manage Splunk Universal
Forwarders on sending systems.

5. **Direct Ingestion:** Data is ingested directly by indexers for near real-time availability.

6. **Load Balancing:** Supports external load balancers for scalability in distributed environments.

**Key Benefits and Use Cases:**


* **Simplified Data Ingestion:** For application logging, cloud services (AWS Lambda, Azure
Functions), and IoT devices.

* **Enhanced Security:** Token-based authentication, HTTPS, granular permissions per token.

* **Scalability and Performance:** Designed for high-volume data streams, easily scales
horizontally.

* **Reduced Overhead:** Less management of forwarder deployments.

* **Flexibility:** Supports various data types.

**How to Configure Splunk HEC:**

1. **Enable HEC Globally** in Splunk Web (Settings \> Data Inputs \> HTTP Event Collector \> Global
Settings).

2. **Create an HEC Token** (Settings \> Data Inputs \> HTTP Event Collector \> New Token),
defining its name, source type, and index permissions. Copy the generated Token Value.

3. **Configure Your Application/Client** to send HTTP POST requests to the HEC URL (e.g.,
`https://your_splunk_hostname:8088/services/collector`) with the HEC token in the `Authorization`
header (`Authorization: Splunk <your_token_value>`) and the data payload.

**Security Best Practices for HEC:**

* Always use HTTPS/SSL.

* Restrict HEC Token Permissions (least privilege).

* Use Strong Tokens.

* Implement Network Segmentation.

* Monitor HEC Activity in `_internal` index.

* Rotate HEC tokens periodically.
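
As a starting point for the monitoring recommendation above, a hedged sketch that summarizes HEC-related log entries in `_internal` (the `HttpInputDataHandler` component name reflects current Splunk versions and may differ in yours):

```spl
index=_internal sourcetype=splunkd component=HttpInputDataHandler
| stats count BY host, log_level
| sort -count
```
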

-----

### 8\. Splunk Search Head Cluster (SHC) Overview

Splunk Search Head Clusters (SHC) are a crucial component for ensuring high availability, scalability, and a consistent user experience in a distributed Splunk environment. They allow multiple search heads to share configurations, coordinate search activities, and provide fault tolerance.


A Splunk Search Head Cluster is a group of Splunk Enterprise search heads that work together to
provide a highly available and scalable search and reporting environment. Key functionalities
include:

* **High Availability:** If one search head member fails, others in the cluster can continue to serve
search requests, ensuring minimal disruption to users.

* **Load Balancing:** Search requests can be distributed across multiple search head members,
improving overall performance and responsiveness.

* **Configuration Consistency:** Knowledge objects (saved searches, dashboards, reports, field extractions, etc.) are replicated across all cluster members, ensuring a consistent user experience regardless of which search head a user accesses.

* **Scalability:** You can add more search head members to handle increased search load and
concurrent users.

#### Key Components of an SHC:

* **Search Head Cluster Members:** These are the individual Splunk Enterprise instances that make
up the cluster.

* **Search Head Cluster Captain:** One member of the cluster is designated as the "captain." This is
a crucial role for coordinating cluster activities.

* **Deployer:** A separate, standalone Splunk instance (not part of the SHC itself) used to
distribute apps and configuration bundles to the SHC members. It acts as a central repository for SHC
configurations.

-----

### 9\. Dynamic Election in a Splunk Search Head Cluster

The concept of "dynamic election" refers to how the **Search Head Cluster Captain** is chosen and
maintained.

**The Role of the Captain:**

The captain is responsible for critical cluster-wide operations, including:

* **Coordinating Scheduled Searches:** Ensuring scheduled reports and alerts run reliably and on
time across the cluster.
* **Managing Search Artifacts:** Orchestrating the replication of search artifacts (results of ad-hoc
and scheduled searches) among cluster members to ensure their availability.

* **Replicating Knowledge Objects:** Distributing configuration changes (e.g., new dashboards, updated field extractions) from one member to all other members.

* **Maintaining Cluster State:** Keeping track of the health and status of all cluster members.

* **KV Store Coordination:** Managing the KV Store for the cluster.

**Dynamic Captaincy:**

* A Splunk Search Head Cluster normally uses a **dynamic captain**. This means that the member
serving as captain **can change over time**. There isn't a fixed "primary" search head; any healthy
member can potentially become the captain.

* This dynamic nature is critical for high availability. If the current captain fails or becomes
unresponsive, the cluster can elect a new captain to take over its duties, preventing a single point of
failure for cluster-wide operations.

**Captain Election Process:**

The election process is based on a consensus protocol; Splunk uses its own variant of the Raft algorithm.

1. **Triggering Events:** A captain election is triggered when:

* The current captain fails or restarts.

* A network partition occurs, causing a disruption in communication between cluster members.

* The current captain detects that it no longer has a **majority** of cluster members participating
(it steps down).

* Explicit administrative intervention (e.g., manual captain transfer).

2. **Majority Requirement:** For a successful election to occur, a **majority of *all* cluster members (not just those currently running)** must be online and able to communicate.

* **Example:** In a 5-member cluster, at least 3 members must be healthy and communicating for an election to succeed (a majority of 5 is 3).

* **Minimum Size:** This is why Splunk strongly recommends a **minimum of three search head
cluster members**.

* A 2-member cluster cannot tolerate any single node failure and still elect a captain (if one fails,
only 1 remains, which is not a majority of 2).
* A 1-member cluster provides no high availability benefits.

3. **Election Process:**

* When an election is triggered, all non-captain members (or remaining members if the captain
failed) become aware there's no active captain or that the existing captain has stepped down.

* Members will randomly set timers. The member whose timer runs out first will initiate the
election by proposing itself as captain.

* Other members vote for the proposed captain. If a candidate receives a majority of votes from
all configured members, it becomes the new captain.

* The election typically takes 1-2 minutes. During this time, there is no functioning captain, and
certain cluster-wide operations (like scheduled search dispatch, knowledge object replication) might
be temporarily impacted until a new captain is elected. Ad-hoc searches launched by users might still
run on individual members.

4. **No Bias:** The election process has no inherent bias towards electing the previous captain or
any specific member. Any eligible member can win the election if it secures the majority vote.
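
To see which member currently holds captaincy, a sketch using the SHC introspection endpoint (run from any cluster member; exact field names can vary by version):

```spl
| rest /services/shcluster/captain/info splunk_server=local
| table label, id, elected_captain, maintenance_mode
```
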

**Static Captaincy (For Recovery/Specific Scenarios):**

While dynamic captaincy is the default and recommended for high availability, Splunk provides an
option for **static captaincy**. This is typically used for:

* **Disaster Recovery:** If your cluster loses its majority (e.g., due to a major site outage), it cannot
elect a dynamic captain. In such a scenario, you can temporarily designate a specific member as a
"static captain" to bring the cluster back to a functional state.

* **Specific Maintenance:** For very specific maintenance procedures that require a stable captain,
although this is rare.

**Important Note:** Static captaincy removes the high availability benefit of dynamic election. Once
the precipitating issue is resolved, it's a best practice to revert to dynamic captaincy.

-----

### 10\. SHC Design (Architecture Best Practices)

Designing a robust Search Head Cluster involves several key considerations:

1. **Minimum of Three Members:** As discussed, this is crucial for ensuring that the cluster can
tolerate at least one member failure and still elect a captain, maintaining high availability.
2. **Dedicated Deployer:** Always have a separate, standalone Splunk instance acting as the
deployer. It should not be a search head cluster member itself. The deployer pushes apps and
configurations to the SHC members.

3. **Homogeneous Hardware:** All search head cluster members should have similar hardware
specifications (CPU, RAM, disk I/O). This ensures consistent performance and avoids bottlenecks.

4. **Dedicated Resources:** Search heads should be dedicated servers. Avoid co-locating them
with indexers or other heavy Splunk components.

5. **Load Balancer:** Implement a load balancer (e.g., F5, HAProxy, AWS ELB) in front of the SHC
members. This distributes user login requests and ad-hoc searches evenly across the cluster,
improving user experience and preventing single points of entry.

6. **Network Considerations:**

* Ensure robust, low-latency network connectivity between all SHC members and to the indexers.

* For multi-site SHCs, ensure the **majority of SHC members reside in the primary site** to
maintain captain election capability during a site-to-site network disruption. Splunk SHC itself is
**not site-aware** in the same way an indexer cluster is for data replication.

7. **Knowledge Object Management:**

* All user-created and administrator-managed knowledge objects (dashboards, reports, alerts, field extractions, etc.) should be managed centrally via the deployer. This ensures consistent replication across all SHC members.

* Avoid making direct configuration changes on individual SHC members (except for initial setup or
specific troubleshooting steps) as these changes might not propagate correctly.

8. **Indexer Communication:** The SHC members need to be configured to communicate with your search peers (indexers or an indexer cluster). This is typically done via the deployer.

9. **Replication Factor:** Configure the `replication_factor` for search artifacts (default 3). This
determines how many copies of search results (especially for scheduled searches) are maintained
across the SHC members, providing resilience against member failures.

10. **Monitoring:** Actively monitor the health of your SHC using the Monitoring Console (formerly the Distributed Management Console, DMC), the `| rest /services/shcluster/status` search, or the `splunk show shcluster-status` CLI command; see the sketch after this list. Pay close attention to replication status, member health, and captaincy.

11. **KV Store Backup:** The KV Store is clustered within the SHC. Ensure you have a strategy for
backing up the KV Store data.
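
For the monitoring recommendation in item 10 (referenced above), a hedged sketch that pulls captain and cluster health details from the status endpoint; the dotted field names mirror the endpoint's nested output and may vary by version:

```spl
| rest /services/shcluster/status splunk_server=local
| table captain.label, captain.dynamic_captain, captain.elected_captain, captain.service_ready_flag
```
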

By following these design principles, you can build a resilient, scalable, and high-performing Splunk
Search Head Cluster that provides a consistent and reliable search experience for your users.
