Professional Documents
Culture Documents
Hands-on Challenges
© 2019 SPLUNK INC.
Simple concept
Workshop Goals
1
2. Change to the
Search Tab
?
What’s the benefit of
renaming variables?
Use fieldsummary to explore your dataset
© 2019 SPLUNK INC.
1
Eliminate unwanted
fields with
| fields - values
?
What’s going on
with the engine
coolant
temperature?
Explore your dataset with visualizations
© 2019 SPLUNK INC.
1
Using Splunk MLTK’s Histogram Macro
© 2019 SPLUNK INC.
OR
1
| stats count by x_batteryVoltage
x_batteryVoltage count
| chart count over x_batteryVoltage by y_vehicleType
13.16-13.17 3 13 0 0 0 0 1
13.46-13.47 1 14 0 0 0 1 1
15 1 0 1 0 0
16 1 1 1 0 0
17 1 0 0 0 1
Working with the Boxplot Macro
© 2019 SPLUNK INC.
1
? How can this query be improved?
Hints:
> Scale numeric values using
the fit command with the
StandardScaler
Explore the Dataset with Box Plots
© 2019 SPLUNK INC.
15 minute break
© 2019 SPLUNK INC.
> Explore the Outlier > Start your own Outlier > Optionally try to compare
Detection Showcases Detection Experiment different outlier detection
approaches
Explore the Outlier Detection Showcases
© 2019 SPLUNK INC.
> Switch to the Experiments tab of the MLTK and create a new experiment
> Instead of an approach based on statistics we are now going to use the
density function to detect outliers.
Create Your Own Smart Outlier Experiment
© 2019 SPLUNK INC.
2
Click here to get to the
next step
3
Examples:
<your search> | fit <model name>
<your search> | apply <model name>
> The fit command produces a machine learning model based on the
behaviour of a set of events. It applies the model to the current
search results in the search pipeline
> The apply command applies the machine learning model
that was learned using the fit command
SPL for MLTK: Adapting fit and apply
© 2019 SPLUNK INC.
3
Examples:
<your search> | fit StandardScaler <fields> into <model name>
<your search> | apply <model name> | `<macro name>`
<your search> | fit SVM “X X X" from “XXX" “XXX" kfold_cv=3
Check out the confusion
matrix and classification
statistics macros!
> The StandardScaler algorithm uses the scikit-learn
StandardScaler algorithm to standardize data fields
> Splunk’s MLTK allows you to cross-validate your models
right from the search queries that train them. Simply
specify the number of cross-validation folds you want
by setting the fit command’s parameter kfold_cv
Use a Classification Model:
© 2019 SPLUNK INC.
> Explore the Classification > Put your Algorithm > Optionally find a way to deal
Assistant into Practice with model overfitting
Explore the Classification Assistant
© 2019 SPLUNK INC.
3
Option 1 – Create New Experiment (Predict Categorical Fields)
No need to add the y_vehicleType as a preprocess field, select only the x_* fields.
Now run the same query again using SVM, use the SS_* fields for predicting and you should see much better results!
Alternatively, you can use another algorithm, like ‘RandomForestClassifier’.
Save your Classification Model
© 2019 SPLUNK INC.
> Explore the Clustering > Cluster Analysis of the > Optionally try and detect
Assistant mytrackdata-Dataset outliers
Explore the Cluster Showcases
© 2019 SPLUNK INC.
4
Example:
<your search> | fit PCA k=<int> <fields>
| inputlookup mytrackdata.csv
| fit Imputer x_engineCoolantTemperature strategy="median"
| rename Imputed_* as *
| apply car_clustering_StandardScaler_0
| apply car_clustering
| table c* y_* SS_* *
| fit PCA k=3 SS_*
| rename y_vehicleType as clusterId, PC_1 as x, PC_2 as y, PC_3 as z