Detect anomalies in log data using Amazon Elasticsearch Service

Kapil Pendse
Sr. Solutions Architect, Amazon Web Services

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark.
Agenda

• Log Analytics and Elasticsearch


• Machine Learning for Log Analysis
• Anomaly Detection in Amazon Elasticsearch Service
• Demo
• Reference Architectures
• Getting Started
• Things to Remember

Definition

Log analytics involves searching, analyzing, and visualizing machine
data generated by your IT systems and technology infrastructure to
gain operational insights.

Where does this data come from?

IoT & Mobile
• Automotive
• Home
• Tools
• Manufacturing
• Mobile applications

IT & DevOps
• Databases
• Load balancers
• Networking
• Deployment tools
• Servers

Applications & Cloud
• Access tracking
• Environment change
• Web applications
• Business applications
• Container frameworks

Actionable insights come from proper tools

• Most databases can’t scale horizontally and have finite resources

• Data warehouses can scale horizontally but suffer from a lack of indexes

• Manual interrogation of text files bottlenecks at the user

Traditional data analytics tools are simply not built to handle the variety and volume of
rapidly proliferating machine data.

Elasticsearch for turning logs into insights

Elasticsearch stores and indexes application and log data in near real time, providing fast
retrieval, filtering, and analysis for monitoring. It's also great for search!

Simple to use – it’s a database

1. Send data as JSON via REST APIs
2. Data is indexed: all fields searchable, including nested JSON
3. REST APIs for field matching, Boolean expressions, sorting, and analysis

[Diagram labels: (1) server, application, network, AWS, and other logs; Elasticsearch cluster; (3) application data; application users, analysts, DevOps, security]
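
The three steps above can be sketched with request bodies alone. This is a minimal illustration, not the presenter's code: the index name `app-logs` and the fields `status` and `@timestamp` are hypothetical, and the helper names are invented here. The first function builds a newline-delimited JSON body of the kind the Elasticsearch `_bulk` endpoint accepts; the second builds a query body combining a Boolean filter with sorting.

```python
import json


def bulk_payload(index, docs):
    """Build an NDJSON body for the _bulk API: one action line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document source
    return "\n".join(lines) + "\n"  # a bulk body ends with a newline


def error_search(min_status=500, size=10):
    """Build a search body: Boolean filter on a numeric field, newest first."""
    return {
        "size": size,
        "query": {
            "bool": {"filter": [{"range": {"status": {"gte": min_status}}}]}
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


# Hypothetical log documents, for illustration only.
docs = [
    {"@timestamp": "2020-06-01T10:00:00Z", "status": 200, "path": "/home"},
    {"@timestamp": "2020-06-01T10:00:05Z", "status": 503, "path": "/cart"},
]
print(bulk_payload("app-logs", docs))
print(json.dumps(error_search(), indent=2))
```

In practice the first body would be sent with a POST to the domain's `_bulk` endpoint and the second to `app-logs/_search`, using a signed or otherwise authenticated HTTP client appropriate to the domain's access policy.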

Amazon Elasticsearch Service Infrastructure
[Architecture diagram labels: AWS Cloud region; Amazon ES domain; VPC; AWS Identity and Access Management (IAM); Application Load Balancing (ALB); data nodes; UltraWarm nodes; master nodes; Amazon CloudWatch; AWS CloudTrail; Amazon Kinesis Data Firehose; AWS Database Migration Service; Amazon CloudWatch Logs]

Machine Learning for Logs

• While logs help you when you need to investigate a past incident, machine learning
makes it possible to proactively monitor those logs to detect anomalies.

• Amazon Elasticsearch Service now offers anomaly detection, which uses machine
learning to detect anomalies on real-time streaming data and identifies issues as they
evolve so you can mitigate them immediately.

Use cases

Log sources:
• VPC Flow Logs
• S3 Access Logs
• Application Logs

Example anomalies:
• Increase in error rates
• Sudden change in user geography
• Unusual amount of data download
• Increased access at odd time of day
• Increase in network level traffic rejections

Anomaly Detection in Amazon Elasticsearch Service

• Learns what is normal
• Doesn’t require prior knowledge
• Deviations from normal are anomalies
• Works in near real time on time series data
• Uses the Random Cut Forest (RCF) algorithm

Anomaly Detection in Amazon Elasticsearch Service

A detector is an individual anomaly detection task

Detector Interval (DI) = 4
Window Delay (WD) = 2

[Timeline figure: time axis from 0 to 23 with Now (T) marked; one WD span and five consecutive DI spans]
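
The relationship between the two settings can be sketched in a few lines. This is an illustration under stated assumptions, not the service's implementation: it assumes the window delay shifts the most recent detection window back from "now", that windows are contiguous and one detector interval long, and that Now (T) sits at 22 on the slide's time axis (the slide does not state where T falls). The function name is invented for this sketch.

```python
def detector_windows(now, interval, delay, count=5):
    """Return the `count` most recent (start, end) windows, oldest first.

    Assumption: the newest window ends at `now - delay`, and each earlier
    window ends where the next one starts.
    """
    latest_end = now - delay
    windows = []
    for k in range(count):
        end = latest_end - k * interval
        windows.append((end - interval, end))
    return list(reversed(windows))


# DI = 4 and WD = 2 as on the slide; now = 22 is an illustrative choice.
print(detector_windows(now=22, interval=4, delay=2))
# prints [(0, 4), (4, 8), (8, 12), (12, 16), (16, 20)]
```

With these values the data between 20 and 22 is not yet examined, which is the point of the delay: it gives late-arriving log records time to be indexed before their window is evaluated.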

Anomaly Detection in Amazon Elasticsearch Service

A feature is a field in your index that you check for anomalies.

Aggregation method: sum()

[Timeline figure: time axis from 0 to 23 with Now (T) marked; the feature is aggregated with sum() over each of the five DI spans, and the sums are passed to the RCF algorithm]
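
A detector with one sum() feature might be described by a request body like the one built below. This is a hedged sketch: the field layout follows the Open Distro for Elasticsearch anomaly detection "create detector" API as best recalled, so check it against the documentation for your domain version. The detector name, index pattern, and field names are hypothetical, and treating the slide's DI = 4 and WD = 2 as minutes is an assumption.

```python
import json


def detector_body(name, index_pattern, time_field, feature_field,
                  interval_minutes=4, delay_minutes=2):
    """Build a detector definition with a single sum() feature."""
    feature_name = f"sum_{feature_field}"
    return {
        "name": name,
        "description": f"Sum of {feature_field} per detector interval",
        "time_field": time_field,
        "indices": [index_pattern],
        "feature_attributes": [
            {
                "feature_name": feature_name,
                "feature_enabled": True,
                # The feature is an ordinary aggregation over a numeric field.
                "aggregation_query": {
                    feature_name: {"sum": {"field": feature_field}}
                },
            }
        ],
        "detection_interval": {
            "period": {"interval": interval_minutes, "unit": "Minutes"}
        },
        "window_delay": {
            "period": {"interval": delay_minutes, "unit": "Minutes"}
        },
    }


# Hypothetical names, for illustration only.
body = detector_body("s3-download-bytes", "s3-access-logs-*",
                     "@timestamp", "bytes_sent")
print(json.dumps(body, indent=2))
```

On an Open Distro based domain, a body of this shape would be posted to `_opendistro/_anomaly_detection/detectors`; the same configuration can also be entered through the Kibana anomaly detection plugin.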

Anomaly Detection in Amazon Elasticsearch Service

An anomaly is any unusual change in behavior

Confidence Score & Anomaly Grade

[Timeline figure: time axis from 0 to 23 with Now (T) marked; the RCF algorithm processes the per-interval sums and reports a confidence score and an anomaly grade]
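
One way to act on these two numbers is to keep only results whose grade is above zero and whose confidence clears a threshold. The sketch below assumes result records carry `anomaly_grade` and `confidence` values between 0 and 1, as in the Open Distro anomaly detection results; the sample records, the `window` key, and the 0.8 threshold are made up for illustration and are not measured output.

```python
def flag_anomalies(results, min_grade=0.0, min_confidence=0.8):
    """Keep results that are graded as anomalous with sufficient confidence."""
    return [
        r for r in results
        if r["anomaly_grade"] > min_grade and r["confidence"] >= min_confidence
    ]


# Illustrative records only; values are invented for this example.
results = [
    {"window": (0, 4), "anomaly_grade": 0.0, "confidence": 0.95},
    {"window": (4, 8), "anomaly_grade": 0.7, "confidence": 0.97},
    {"window": (8, 12), "anomaly_grade": 0.4, "confidence": 0.50},
]

for r in flag_anomalies(results):
    print(r["window"], r["anomaly_grade"], r["confidence"])
# prints (4, 8) 0.7 0.97
```

The third record is dropped even though its grade is nonzero, because a low confidence suggests the model has not yet seen enough data to trust that grade.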

Demo

Reference Architecture – S3 Access Logs

Reference Architecture – Application Logs

Reference Architecture – VPC Flow Logs

Getting Started

• Create an Amazon Elasticsearch Service domain on version 7.4 or later, or upgrade an existing domain

• Start ingesting log data into an index. As part of the ingestion pipeline, transform non-numeric
fields to numeric fields if needed. For example, one-hot encoding can be used for fields such as
HTTP response codes, error codes, and exceptions.

• Configure and start running a detector

• Configure a monitor using the Amazon Elasticsearch Service Alerting feature, so that you can
receive alerts via Slack or SNS.
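
The one-hot encoding step mentioned in the list above can be sketched as a small transform applied to each log record before indexing. The field name `status`, the list of codes, and the function name are assumptions chosen for illustration; a real pipeline would pick the codes that matter for its own logs.

```python
def one_hot_status(doc, codes=(200, 404, 500), field="status"):
    """Return a copy of `doc` with one 0/1 field per code of interest."""
    encoded = dict(doc)  # leave the original record untouched
    for code in codes:
        encoded[f"{field}_{code}"] = 1 if doc.get(field) == code else 0
    return encoded


# Hypothetical log record, for illustration only.
print(one_hot_status({"status": 404, "path": "/missing"}))
# prints {'status': 404, 'path': '/missing', 'status_200': 0, 'status_404': 1, 'status_500': 0}
```

Each new field is numeric, so a detector feature using sum() on, say, `status_500` yields the count of that response code per detector interval.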

Things to Remember
• Amazon Elasticsearch Service Anomaly Detection supports only continuous numerical features;
categorical features are not supported.

• The values of Detector Interval and Window Delay have to be carefully configured. Sometimes it
can take several hours for a detector to finish initializing. If your detector stays in “initializing”
state for longer than a day, you can use the “profile detector” API to check if there are any issues
that are affecting your detector.

• Anomaly Detection can be computationally intensive, so monitor the CPU utilization of your
cluster nodes and try to use larger instances if CPU utilization becomes a bottleneck.

Thank you!
kapilpen@amazon.com

@kapilpendse
