Splunk 4 Ninjas - ML: Hands On Intro To Splunk Machine Learning Toolkit
14 October 2020
© 2019 SPLUNK INC.
Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein.
In addition, any information about our roadmap outlines our general product direction and is
subject to change at any time without notice. It is for informational purposes only, and shall
not be incorporated into any contract or other commitment. Splunk undertakes no obligation
either to develop the features or functionalities described or to include any such feature or
functionality in a future release.
Splunk, Splunk>, Turn Data Into Doing, The Engine for Machine Data, Splunk Cloud, Splunk
Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States
and other countries. All other brand names, product names, or trademarks belong to their
respective owners. © 2019 Splunk Inc. All rights reserved.
Agenda
• Welcome / Introduction
• Intro to Machine Learning @ Splunk
• Demo: Machine Learning Toolkit with Q&A
• Intro to the Trackday Dataset
• Four Different Challenges (~30 min each)
• Challenge 1
– Explore the track_day.csv Dataset
• Challenge 2
– Detect Numeric Outliers
• Challenge 3
– Supervised Learning: Predict Categorical Fields
• Challenge 4
– Unsupervised Learning: Clustering
Disclaimer
What this session is not about and what it is about
► Deviation from past behavior
► Deviation from peers (aka Multivariate AD or Cohesive AD)
► Unusual change in features
► ITSI Metric Anomaly Detection

► Predict Service Health Score/Churn
► Predicting Events
► Trend Forecasting
► Detecting influencing entities
► Early warning of failure

► Identify peer groups
► Event Correlation
► Reduce alert noise
► Behavioral Analytics
► ITSI Event Analytics
• Core platform: Search
• Packaged premium solutions (ITSI, UBA): ML capabilities plus domain expertise (IT, Security…)
• Splunk ML Toolkit (MLTK): facilitates and simplifies ML via examples & guidance
The MLTK combines Splunk expertise (searching, reporting, alerting, workflow) with data science expertise (statistics/math background, algorithm selection, model building).
“Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes Mar 23, 2016
Collect Data → Clean & Munge → Search & Explore → Pre-processing & Feature Selection → Choose Algorithm → Build, Test, Improve Models → Operationalize (Monitor, Alert) → Visualize & Share
[Diagram: Splunk as a platform for IT and OT data]
• Inputs: industrial data and assets (SCADA, AMI, meter reads); native inputs (TCP, UDP, logs, scripts, wire, mobile); modular inputs (MQTT, AMQP, COAP, REST, JMS); HTTP Event Collector (token-authenticated events); technology partnerships (Kepware, AWS IoT, Cisco, Palo Alto); consumer and mobile devices
• Enrichment: external lookups (asset maintenance data, info stores)
• Real time: Search, Alert, Visualize, Predict, Develop
• Actions: send an email, file a ticket, trigger a process flow, third-party applications, send a text to smartphones and devices
• Layers: core platform (search), packaged premium solutions, Machine Learning Toolkit + Deep Learning Toolkit
Demo: Machine Learning Toolkit
> Follow along with the labs using the ‘handrail’ guide: https://bit.ly/Splunk_ML_Lab
> Username: admin
> Password: changeme
> Quick reference guide, if you want to use the MLTK on your own later: https://bit.ly/MLTK_guide
Hands-on Challenges
Workshop Goals
• Simple concept
2. Change to the Search tab.
Eliminate unwanted fields with:
| fields - values
? What’s going on with the engine coolant temperature?
Explore your Dataset with Visualizations
Using Splunk MLTK’s Histogram Macro
OR
| stats count by x_batteryVoltage
[Screenshot: count of events per x_batteryVoltage value]
| chart count over x_batteryVoltage by y_vehicleType
[Screenshot: counts per x_batteryVoltage value, one column per y_vehicleType]
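The two searches above are separate aggregations. As an illustration, here is a minimal Python sketch that computes the same aggregations over a list of event dictionaries. The sample events and vehicle type labels are hypothetical: only the field names come from the searches, and the functions are a sketch of the idea, not Splunk code.

```python
from collections import Counter, defaultdict

# Hypothetical sample events; only the field names come from the searches above.
events = [
    {"x_batteryVoltage": 13.2, "y_vehicleType": "A"},
    {"x_batteryVoltage": 13.2, "y_vehicleType": "B"},
    {"x_batteryVoltage": 13.5, "y_vehicleType": "A"},
]


def stats_count_by(events, field):
    """Like `| stats count by <field>`: number of events per distinct value."""
    return dict(sorted(Counter(e[field] for e in events).items()))


def chart_count_over_by(events, over, by):
    """Like `| chart count over <over> by <by>`: one row per `over` value,
    one column per `by` value, with 0 where a combination never occurs."""
    columns = sorted({e[by] for e in events})
    table = defaultdict(lambda: dict.fromkeys(columns, 0))
    for e in events:
        table[e[over]][e[by]] += 1
    return {row: table[row] for row in sorted(table)}


print(stats_count_by(events, "x_batteryVoltage"))  # {13.2: 2, 13.5: 1}
print(chart_count_over_by(events, "x_batteryVoltage", "y_vehicleType"))
```

The chart version fills in 0 for missing combinations so that every row has the same columns, which is what makes the result plottable as a stacked chart.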
Working with the Boxplot Macro
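A box plot summarises a numeric field by its quartiles. This Python sketch computes the five-number summary and flags values outside the common 1.5 × IQR whiskers. It illustrates the general technique only and makes no assumption about how the MLTK macro is implemented internally.

```python
import statistics


def five_number_summary(values):
    """Min, lower quartile, median, upper quartile and max of a numeric field."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def tukey_outliers(values, k=1.5):
    """Values more than k * IQR beyond the quartiles (the usual box plot whiskers)."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


print(five_number_summary(list(range(1, 10))))
print(tukey_outliers(list(range(1, 10)) + [100]))  # [100]
```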
? How can this query be improved?
Hints:
> Scale numeric values using the fit command with the StandardScaler algorithm
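StandardScaler rescales each field to zero mean and unit variance. Fields measured in very different units (degrees, volts) then become comparable on one box plot. The following is a minimal sketch of that transformation under two assumptions: it uses the population standard deviation (as scikit-learn's StandardScaler does), and it uses the SS_ output prefix seen in this workshop. The sample rows are made up.

```python
import statistics


def standard_scale(rows, fields):
    """Add SS_<field> = (value - mean) / std for each field (population std).
    A zero std is replaced by 1 so constant fields do not divide by zero."""
    params = {}
    for f in fields:
        column = [r[f] for r in rows]
        std = statistics.pstdev(column)
        params[f] = (statistics.fmean(column), std if std > 0 else 1.0)
    scaled = []
    for r in rows:
        new_row = dict(r)
        for f, (mean, std) in params.items():
            new_row["SS_" + f] = (r[f] - mean) / std
        scaled.append(new_row)
    return scaled, params


# Hypothetical readings in very different units.
rows = [
    {"x_engineCoolantTemperature": 90.0, "x_batteryVoltage": 13.1},
    {"x_engineCoolantTemperature": 110.0, "x_batteryVoltage": 13.5},
]
scaled, params = standard_scale(rows, ["x_engineCoolantTemperature", "x_batteryVoltage"])
for row in scaled:
    print(row["SS_x_engineCoolantTemperature"], row["SS_x_batteryVoltage"])
```

After scaling, both fields land on roughly the same numeric range, so neither one flattens the other on a shared axis.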
Explore the Dataset with Box Plots
15-minute break
> Explore the Outlier Detection Showcases
> Start your own Outlier Detection Experiment
> Optionally, try to compare different outlier detection approaches
Explore the Outlier Detection Showcases
> Switch to the Experiments tab of the MLTK and create a new experiment.
> Instead of the statistics-based approach used so far, we now use a density function (the DensityFunction algorithm) to detect outliers.
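Density-based detection works in two steps: fit a probability distribution to the field, then flag values that fall in its low-probability tails. This simplified Python version makes two assumptions. First, it fits a single normal distribution, whereas DensityFunction in the MLTK can also fit other distributions. Second, it treats `threshold` as the total tail probability, split evenly between both tails. Treat it as an approximation of the idea, not of DensityFunction itself.

```python
from statistics import NormalDist, fmean, pstdev


def density_outliers(values, threshold=0.01):
    """Flag values in the low-probability tails of a normal distribution fitted to the data.

    Assumption for this sketch: `threshold` is the total tail probability,
    split evenly between both tails. Requires non-constant data, because
    NormalDist.inv_cdf fails when the standard deviation is 0.
    """
    dist = NormalDist(fmean(values), pstdev(values))
    low = dist.inv_cdf(threshold / 2)
    high = dist.inv_cdf(1 - threshold / 2)
    return [v for v in values if v < low or v > high]


# Hypothetical coolant temperatures with one suspicious spike.
temperatures = [90.0] * 50 + [91.0] * 50 + [150.0]
print(density_outliers(temperatures))  # [150.0]
```

The fitted distribution includes the outlier itself, which widens it somewhat. This is one reason to train on clean historical data where possible.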
Create Your Own Smart Outlier Experiment
> The fit command produces a machine learning model based on the behaviour of a set of events. It also applies that model to the current search results in the search pipeline.
> The apply command applies a machine learning model that was previously learned with the fit command, for example to new search results.
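A toy model can illustrate the fit/apply split. `fit` learns parameters from training events, stores them under a name, and scores the training events. `apply` later reuses the stored parameters on other events without relearning anything. In this Python sketch, the model store, the z-score model and the `isOutlier` field are all hypothetical stand-ins, not MLTK internals.

```python
from statistics import fmean, pstdev

# Hypothetical in-memory stand-in for a saved-model store (not how MLTK stores models).
MODEL_STORE = {}


def fit(rows, field, into, k=3.0):
    """Learn mean/std of `field`, save them under the name `into`,
    and score the training rows with the freshly learned model."""
    values = [r[field] for r in rows]
    MODEL_STORE[into] = {"field": field, "mean": fmean(values),
                         "std": pstdev(values) or 1.0, "k": k}
    return apply(rows, into)


def apply(rows, model_name):
    """Score rows with a previously fitted model; nothing is relearned."""
    m = MODEL_STORE[model_name]
    return [{**r, "isOutlier": int(abs(r[m["field"]] - m["mean"]) / m["std"] > m["k"])}
            for r in rows]


train = [{"temp": 90.0}, {"temp": 92.0}]
print(fit(train, "temp", into="coolant_model"))    # both rows: isOutlier 0
print(apply([{"temp": 100.0}], "coolant_model"))  # isOutlier 1 (z = 9)
```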
> Explore the Classification Assistant
> Put your Algorithm into Practice
> Optionally, find a way to deal with model overfitting
Explore the Classification Assistant
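A common way to spot overfitting is to hold out part of the data. You train on one portion, evaluate on the rest, and compare the two accuracies; a large gap suggests the model memorised its training data. The Python sketch below uses a 1-nearest-neighbour classifier, chosen only because it always memorises its training set perfectly. The split function and the data are illustrative assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows reproducibly, then split off a holdout set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def one_nearest_neighbour(train, features, target):
    """A model that memorises the training rows and copies the closest row's label."""
    def predict(row):
        nearest = min(train, key=lambda t: sum((t[f] - row[f]) ** 2 for f in features))
        return nearest[target]
    return predict


def accuracy(predict, rows, target):
    return sum(predict(r) == r[target] for r in rows) / len(rows)


# Hypothetical data whose labels alternate with x, so memorising is easy but generalising is not.
rows = [{"x": float(i), "label": "A" if i % 3 else "B"} for i in range(30)]
train, test = train_test_split(rows)
model = one_nearest_neighbour(train, ["x"], "label")
print("train accuracy:", accuracy(model, train, "label"))  # 1.0
print("test accuracy:", accuracy(model, test, "label"))
```

A perfect training score says little on its own. The holdout score is the one to compare between algorithms and settings.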
Option 1 – Create New Experiment
Now run the same query again with the SVM algorithm, using the SS_* (scaled) fields as predictors. You should see much better results!
Alternatively, you can use the RandomForestClassifier algorithm.
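Why do the scaled SS_* fields help? SVMs are sensitive to feature scale, much like nearest-neighbour methods, so a field with a large numeric range can dominate the result. This sketch swaps in a 1-nearest-neighbour classifier (not an SVM) to show the effect on two made-up training points. Unscaled, feature `b` decides the neighbour on its own; after standard scaling, both features contribute.

```python
from statistics import fmean, pstdev


def fit_scaler(train, features):
    """Learn per-feature mean and population std from the training rows only."""
    params = {}
    for f in features:
        column = [r[f] for r in train]
        params[f] = (fmean(column), pstdev(column) or 1.0)
    return lambda row: {f: (row[f] - m) / s for f, (m, s) in params.items()}


def nearest_label(train, query, features, transform=None):
    """Label of the training row closest to `query` (squared Euclidean distance)."""
    transform = transform or (lambda row: row)
    q = transform(query)

    def distance(row):
        t = transform(row)
        return sum((t[f] - q[f]) ** 2 for f in features)

    return min(train, key=distance)["y"]


# Two made-up training points; feature b has a much larger range than a.
train = [{"a": 0.0, "b": 1000.0, "y": "A"}, {"a": 1.0, "b": 0.0, "y": "B"}]
query = {"a": 0.1, "b": 400.0}
features = ["a", "b"]
print(nearest_label(train, query, features))                              # B: b dominates
print(nearest_label(train, query, features, fit_scaler(train, features)))  # A
```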
Save your Classification Model
Which Car Gets Classified Worst?
? How can you find out where your model is off?
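One way to find where a model is off is to compare actual and predicted vehicle types class by class. This Python sketch builds a confusion matrix and per-class recall from two parallel lists. The labels and predictions are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_matrix(actual, predicted):
    """Nested counts: confusion[actual_label][predicted_label]."""
    counts = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return {a: dict(c) for a, c in counts.items()}


def per_class_recall(actual, predicted):
    """Share of each actual class that the model labelled correctly."""
    matrix = confusion_matrix(actual, predicted)
    return {a: row.get(a, 0) / sum(row.values()) for a, row in matrix.items()}


# Hypothetical vehicle-type labels and predictions.
actual = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "C", "A", "B"]
recall = per_class_recall(actual, predicted)
print(recall)                       # {'A': 1.0, 'B': 0.5, 'C': 0.0}
print(min(recall, key=recall.get))  # C: classified worst
```

The off-diagonal cells of the matrix also show *which* classes get confused with each other, which often points at the features that fail to separate them.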
> Explore the Clustering Assistant
> Cluster Analysis of the mytrackdata Dataset
> Optionally, try to detect outliers
Explore the Cluster Showcases
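Clustering groups events without using any labels. The stdlib-only Python sketch below illustrates the k-means idea with plain Lloyd's algorithm. The MLTK's KMeans may differ in details such as initialisation, so read this as an illustration of the technique only. The points are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then move each
    centroid to the mean of its points, until the centroids stop changing."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest_centroid(p, centroids) for p in points]
    return labels, centroids


# Two hypothetical, well-separated groups of (scaled) feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # the first three share one cluster id, the last three the other
```

Because distances drive the assignment, the same scaling concern as in classification applies: cluster on scaled fields such as SS_*.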
| inputlookup mytrackdata.csv
| fit Imputer x_engineCoolantTemperature strategy="median"
| rename Imputed_* as *
| apply car_clustering_StandardScaler_0
| apply car_clustering
| table c* y_* SS_* *
| fit PCA k=3 SS_*
| rename y_vehicleType as clusterId, PC_1 as x, PC_2 as y, PC_3 as z
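The search above does two preparatory things: it fills missing coolant temperatures with the median, and it projects the scaled fields onto principal components for a 3D plot. This Python sketch shows both ideas in simplified form: median imputation, and the first principal component found by power iteration. It is an assumption-level illustration only. The search asks PCA for three components (k=3), whereas this sketch computes just the first.

```python
from statistics import fmean, median


def impute_median(rows, field):
    """Replace missing (None) values of `field` with the median of the values present."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def first_principal_component(points, iterations=200):
    """Unit vector of maximum variance, via power iteration on the sample covariance matrix.
    Assumes non-constant data (a zero covariance matrix would divide by zero)."""
    dims = len(points[0])
    means = [fmean(p[d] for p in points) for d in range(dims)]
    centred = [[p[d] - means[d] for d in range(dims)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centred) / (len(points) - 1) for j in range(dims)]
           for i in range(dims)]
    v = [1.0] * dims  # starting vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def project(points, component):
    """PC score of each point: dot product of the centred point with the component."""
    dims = len(component)
    means = [fmean(p[d] for p in points) for d in range(dims)]
    return [sum((p[d] - means[d]) * component[d] for d in range(dims)) for p in points]


rows = [{"t": 90.0}, {"t": None}, {"t": 100.0}, {"t": 95.0}]
print(impute_median(rows, "t"))  # the None becomes 95.0

points = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0)]
pc1 = first_principal_component(points)
print(pc1)  # close to (1, 0): almost all variance lies along the first axis
print(project(points, pc1))
```

Further components can be found by removing the first component's contribution from the covariance matrix and repeating, which is roughly what asking for k=3 means.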
Wrap Up
• Don’t boil the ocean: start small, or modify existing showcase examples for some quick wins.
• Docs are your friend: if you need help, the documentation is pretty comprehensive. conf.splunk.com also has more than 100 sessions on ML.
Thank You
Additional Information
Login:
► Username: admin
► PW: changeme
We created a dashboard for each challenge, with example solutions, in the hidden app “Splunk 4 Ninjas Machine Learning”. Use this app for preparation, for debriefing after the challenges, or as assistance for inexperienced attendees.
► http://{your-host}:8000/en-GB/app/s4n_ml/splunk_4_ninjas_ml
► or click the button next to “Splunk 4 Ninjas Machine Learning” at the top of the Home dashboard