
Hitachi Infrastructure Analytics Advisor

Data Analytics and Performance Monitoring Overview

MK-96HIAA004-00
May 2016
© 2016 Hitachi, Ltd. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including copying and recording, or stored in a database or retrieval system for
commercial purposes without the express written permission of Hitachi, Ltd., or Hitachi Data Systems
Corporation (collectively “Hitachi”). Licensee may make copies of the Materials provided that any such
copy is: (i) created as an essential step in utilization of the Software as licensed and is used in no
other manner; or (ii) used for archival purposes. Licensee may not make any other copies of the
Materials. “Materials” mean text, data, photographs, graphics, audio, video and documents.
Hitachi reserves the right to make changes to this Material at any time without notice and assumes
no responsibility for its use. The Materials contain the most current information available at the time
of publication.
Some of the features described in the Materials might not be currently available. Refer to the most
recent product announcement for information about feature and product availability, or contact
Hitachi Data Systems Corporation at https://support.hds.com/en_us/contact-us.html.
Notice: Hitachi products and services can be ordered only under the terms and conditions of the
applicable Hitachi agreements. The use of Hitachi products is governed by the terms of your
agreements with Hitachi Data Systems Corporation.
By using this software, you agree that you are responsible for:
1. Acquiring the relevant consents as may be required under local privacy laws or otherwise from
authorized employees and other individuals to access relevant data; and
2. Verifying that data continues to be held, retrieved, deleted, or otherwise processed in
accordance with relevant laws.

Notice on Export Controls. The technical data and technology inherent in this Document may be
subject to U.S. export control laws, including the U.S. Export Administration Act and its associated
regulations, and may be subject to export or import regulations in other countries. Reader agrees to
comply strictly with all such regulations and acknowledges that Reader has the responsibility to obtain
licenses to export, re-export, or import the Document and any Compliant Products.
Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries.
AIX, AS/400e, DB2, Domino, DS6000, DS8000, Enterprise Storage Server, eServer, FICON,
FlashCopy, IBM, Lotus, MVS, OS/390, PowerPC, RS/6000, S/390, System z9, System z10, Tivoli,
z/OS, z9, z10, z13, z/VM, and z/VSE are registered trademarks or trademarks of International
Business Machines Corporation.
Active Directory, ActiveX, Bing, Excel, Hyper-V, Internet Explorer, the Internet Explorer logo,
Microsoft, the Microsoft Corporate Logo, MS-DOS, Outlook, PowerPoint, SharePoint, Silverlight,
SmartScreen, SQL Server, Visual Basic, Visual C++, Visual Studio, Windows, the Windows logo,
Windows Azure, Windows PowerShell, Windows Server, the Windows start button, and Windows Vista
are registered trademarks or trademarks of Microsoft Corporation. Microsoft product screen shots are
reprinted with permission from Microsoft Corporation.
All other trademarks, service marks, and company names in this document or website are properties
of their respective owners.

HIAA Data Analytics and Performance Monitoring Overview
Contents
Preface
    Intended audience
    Product version
    Related documents
    Document conventions
    Conventions for storage capacity values
    Accessing product documentation
    Getting help
    Comments

1 Introduction
    Product overview
    Key features
        Unified infrastructure monitoring dashboard
        Advanced reporting
        SLO management
        System and resource events
        End-to-end monitoring
        Problem identification and root cause analysis
    Logging on to Infrastructure Analytics Advisor
    Accessing Data Center Analytics

2 Performance monitoring using advanced threshold settings
    Threshold profiles
    Advanced threshold settings
    Determining the threshold type for your environment
    Dynamic thresholds
        Advantages of dynamic thresholds
        Determining if the computed value is correct
        Automatic calculation of baseline values
        Setting dynamic thresholds using monitoring profiles
    Static thresholds
        Setting static thresholds using monitoring profiles
            For system resources

3 End-to-end performance troubleshooting
    Identifying performance problems
    Infrastructure components and key performance metrics
    Troubleshooting high response times
    Troubleshooting workflow
    Detecting performance problems
    Analyzing performance bottleneck
        Analyzing in E2E view
        Analyzing in Verify Bottleneck window
        Analyzing in Sparkline view
        Analyzing in Detail view
    Analyzing the root cause of the bottleneck
        Identify affected resources
        Analyze shared resources
        Analyze related changes
    Solving performance problems

4 Flexible reporting and analysis using Data Center Analytics

5 Monitoring and quick troubleshooting with Data Center Analytics

6 Strategic planning using trend analysis in Data Center Analytics

Preface
This preface includes the following information:

□ Intended audience

□ Product version

□ Related documents

□ Document conventions

□ Conventions for storage capacity values

□ Accessing product documentation

□ Getting help

□ Comments

Intended audience
This document provides an overview of the Hitachi Infrastructure Analytics
Advisor software. This document is intended for storage administrators and
infrastructure administrators.

Product version
This document revision applies to Infrastructure Analytics Advisor 2.0 or later.

Related documents
The following documents are referenced or contain more information about
the features described in this manual.

• Hitachi Infrastructure Analytics Advisor User Guide, MK-96HIAA001


• Hitachi Infrastructure Analytics Advisor REST API Reference Guide,
MK-96HIAA003
• Hitachi Infrastructure Analytics Advisor Data Center Analytics User Guide,
MK-96HIAA005
• Hitachi Data Center Analytics REST API Reference Guide, MK-96HDCA006
• Hitachi Data Center Analytics Query Language User Guide, MK-96HDCA005

Document conventions
This document uses the following typographic conventions:

Convention Description

Bold • Indicates text in a window, including window titles, menus, menu options,
buttons, fields, and labels. Example:
Click OK.
• Indicates emphasized words in list items.
Italic • Indicates a document title or emphasized words in text.
• Indicates a variable, which is a placeholder for actual text provided by the
user or for output by the system. Example:
pairdisplay -g group

(For exceptions to this convention for variables, see the entry for angle
brackets.)
Monospace Indicates text that is displayed on screen or entered by the user. Example:
pairdisplay -g oradb

< > angle brackets Indicates variables in the following scenarios:


• Variables are not clearly separated from the surrounding text or from
other variables. Example:
Status-<report-name><file-version>.csv


• Variables in headings.
[ ] square brackets Indicates optional values. Example: [ a | b ] indicates that you can choose a,
b, or nothing.
{ } braces Indicates required or expected values. Example: { a | b } indicates that you
must choose either a or b.
| vertical bar Indicates that you have a choice between two or more options or arguments.
Examples:

[ a | b ] indicates that you can choose a, b, or nothing.

{ a | b } indicates that you must choose either a or b.

This document uses the following icons to draw attention to information:

Icon label    Description

Note          Calls attention to important or additional information.

Tip           Provides helpful information, guidelines, or suggestions for performing
              tasks more effectively.

Caution       Warns the user of adverse conditions and/or consequences (for
              example, disruptive operations, data loss, or a system crash).

WARNING       Warns the user of a hazardous situation which, if not avoided, could
              result in death or serious injury.

Conventions for storage capacity values


Physical storage capacity values (for example, disk drive capacity) are
calculated based on the following values:

Physical capacity unit    Value

1 kilobyte (KB)           1,000 (10^3) bytes
1 megabyte (MB)           1,000 KB or 1,000^2 bytes
1 gigabyte (GB)           1,000 MB or 1,000^3 bytes
1 terabyte (TB)           1,000 GB or 1,000^4 bytes
1 petabyte (PB)           1,000 TB or 1,000^5 bytes
1 exabyte (EB)            1,000 PB or 1,000^6 bytes

Logical storage capacity values (for example, logical device capacity) are
calculated based on the following values:

Logical capacity unit     Value

1 block                   512 bytes
1 cylinder                Mainframe: 870 KB
                          Open-systems:
                          • OPEN-V: 960 KB
                          • Others: 720 KB
1 KB                      1,024 (2^10) bytes
1 MB                      1,024 KB or 1,024^2 bytes
1 GB                      1,024 MB or 1,024^3 bytes
1 TB                      1,024 GB or 1,024^4 bytes
1 PB                      1,024 TB or 1,024^5 bytes
1 EB                      1,024 PB or 1,024^6 bytes
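
The gap between the physical and logical conventions grows with each unit, which is easy to
check with a short calculation. The following Python sketch is illustrative only: the function
names are hypothetical and are not part of any Hitachi tool. It assumes nothing beyond the
unit values listed in the two tables above.

```python
# Unit prefixes in increasing order, as listed in the capacity tables.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]


def physical_bytes(value, unit):
    """Bytes for a physical capacity value (powers of 1,000)."""
    return value * 1000 ** (UNITS.index(unit) + 1)


def logical_bytes(value, unit):
    """Bytes for a logical capacity value (powers of 1,024)."""
    return value * 1024 ** (UNITS.index(unit) + 1)


def blocks(byte_count, block_size=512):
    """Number of 512-byte blocks needed to hold byte_count bytes."""
    return -(-byte_count // block_size)  # ceiling division


print(physical_bytes(1, "TB"))         # 1000000000000
print(logical_bytes(1, "TB"))          # 1099511627776
print(blocks(logical_bytes(1, "MB")))  # 2048
```

In this sketch, 1 TB of logical capacity is about 10 percent larger than 1 TB of physical
capacity, so the two kinds of values should not be compared without converting to bytes first.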

Accessing product documentation


Product user documentation is available on Hitachi Data Systems Support
Connect: https://support.hds.com/en_us/documents.html. Check this site for
the most current documentation, including important updates that may have
been made after the release of the product.

Getting help
Hitachi Data Systems Support Connect is the destination for technical support
of products and solutions sold by Hitachi Data Systems. To contact technical
support, log on to Hitachi Data Systems Support Connect for contact
information: https://support.hds.com/en_us/contact-us.html.

Hitachi Data Systems Community is a global online community for HDS customers, partners,
independent software vendors, employees, and prospects. It is the destination to get
answers, discover insights, and make connections. Join the conversation today! Go to
community.hds.com, register, and complete your profile.

Comments
Please send us your comments on this document to doc.comments@hds.com.
Include the document title and number, including the revision level (for
example, -07), and refer to specific sections and paragraphs whenever
possible. All comments become the property of Hitachi Data Systems
Corporation.

Thank you!

1
Introduction
This module introduces Infrastructure Analytics Advisor.

□ Product overview

□ Key features

□ Logging on to Infrastructure Analytics Advisor

□ Accessing Data Center Analytics

Product overview
With Infrastructure Analytics Advisor, you can define and monitor storage
service level objectives (SLOs) for resource performance. You can identify
and analyze historical performance trends to optimize storage system
performance and plan for capacity growth.

Using Infrastructure Analytics Advisor, you register resources (storage systems, hosts,
servers, and volumes) and set service-level thresholds. You are alerted to threshold
violations and possible performance problems (bottlenecks). Using analytics tools, you find
which resource has a problem and analyze its cause to help solve the problem.

The following figure describes how the Infrastructure Analytics Advisor ensures the
performance of your storage environment based on real-time service level objectives (SLOs).

The system administrator uses Hitachi Infrastructure Analytics Advisor (HIAA) to manage and
monitor the IT infrastructure based on SLOs, which match the service-implementation
guidelines that are negotiated under a service level agreement (SLA) with consumers.

Infrastructure Analytics Advisor monitors the health of the IT infrastructure using
performance indicators and generates alerts when SLOs are at risk.

Having data center expertise, the service administrator uses Infrastructure Analytics
Advisor to assign resources, such as VMs and storage capacity from registered storage
systems, to consumer applications. The purpose of doing this is to manage critical SLO
violations and to ensure that service performance meets the service level agreements.

Key features
The key features of Infrastructure Analytics Advisor are described in this
section.

Unified infrastructure monitoring dashboard


Hitachi Infrastructure Analytics Advisor dashboards are visual representations
of the performance metrics of your infrastructure resources. The consolidated
view allows you to quickly interpret the performance metrics and identify
performance problems.

The consolidated dashboard view allows for the unified management of the
server, storage, and network infrastructure resources. You can ensure the
health of your data center by proactively monitoring the consumer groups,
storage components, volumes, VMs, servers, and network devices. The
advanced visual analytics aids in visualizing the performance data in easy-to-
use graphs and charts. The visual cues allow for intuitive performance
management.

The functions of the Infrastructure Analytics Advisor dashboard are as follows:
• Displays performance metrics summaries for the monitored resources.
• Displays warnings and critical alerts that need immediate action.
• Displays performance trends.
• Lets you drill down from summary reports to detailed reports.
• Lets you navigate to the E2E topology view for detailed analysis.

Advanced reporting
Infrastructure Analytics Advisor reporting capabilities enable you to monitor
the infrastructure resources and assess their current performance, capacity
and utilization. Reporting data provides you the information you need to
make informed business decisions and plan for future growth.

Infrastructure Analytics Advisor supports both standard and custom reporting capabilities.

Standard reports:
• Default reports. The first time you log on to Infrastructure Analytics
Advisor, the Dashboard shows the following reports by default: System
Status Summary, Event Trends, System Resource Status, and Resource
Events. You can customize which reports display by default.
• Critical reports. Critical reports show resources in your storage
infrastructure that exceeded their thresholds. Critical reports are available
for consumers, VMs, volumes, hosts, and system resources.
• Summary reports. Summary reports give you a high-level view of storage
infrastructure resources. These reports are available for consumers, VMs,
volumes, and system resources. Each summary report shows the number
of resources with critical and warning alerts.
• Other reports. Infrastructure Analytics Advisor provides additional reports
about hypervisors, switches, and system and resource events.

Custom reports:

By integrating with Data Center Analytics, you can create custom reports by
running queries on performance data that is collected from monitored
resources. You can also create real-time and historical reports that are
specific to your business needs.

SLO management
SLOs are measurable parameters which are defined for monitoring the
performance of user resources. With Infrastructure Analytics Advisor you can
evaluate, define, and customize the service level objectives defined for the
monitored resources such as volumes and VMs. By monitoring the SLOs you
can determine whether your objectives comply with your business
requirements.

Infrastructure Analytics Advisor offers the capability to establish and monitor storage
service level objectives for business-critical applications and logical storage devices.
When a service level threshold is exceeded, integrated diagnostic aids help you identify
the root cause.

System and resource events


You can view the latest events in one place and manage the events based on
the status.

The Events tab allows you to display details about significant events in your
monitored environment.

There are two categories of events:


• Resource events
The Resource Events tab displays Performance events, which are
generated when a device or component (server, storage system, network
device, and so on) does not perform optimally.
You can analyze the Resource events by using the end-to-end network
topology view to identify the resource that generated the event.
• System events
The System Events tab displays Management and Event Action events,
which are generated when system settings must be verified or configured.

The All Events tab displays both Resource events and System events, and
you are only able to display the end-to-end network topology view when you
select a Performance event.

Each event indicates the level of the alert, the date and time of the alert
message, category, device name, and component name. Click a message in
the Message column to display the Event Detail window.

Use the Event Detail window to display more event details, such as the
device type and component type. You can click Up and Down to scroll through
more events. If the event is a Resource event, you can click Show E2E View
to view the network topology.

Event level classifications are as follows:


• Critical - Critical event that requires your immediate attention
• Warning - Event that is not critical yet, but might become critical in the
future
• Informational - No immediate action is required

End-to-end monitoring
The E2E topology view shows the detailed configuration of the infrastructure resources and
lets you view the relationships between the infrastructure components. You can manually
analyze the dependencies between the components in your environment and identify the
resource causing performance problems. By using the topology maps, you can easily monitor
and manage your resources. You can use this view to monitor resources in your data center
from applications, virtual machines, and servers through the network to storage.

In the E2E view, each node represents a resource and the connecting links
represent the relationship between the infrastructure components. You can
analyze a resource which is the target of analysis and all the associated
resources. You can also view the alerts associated with all the related
resources and trace the problem at the root level. The node-based E2E view
helps you analyze the problem on the affected node and its impact on the
rest of the infrastructure resources.

Problem identification and root cause analysis


Performance problems might occur because of varying system loads, application updates,
capacity upgrades, configuration changes, and inefficient management of resources in a
shared infrastructure.

The Infrastructure Analytics Advisor advanced diagnostic engine helps you rapidly diagnose,
troubleshoot, and find the root cause of performance bottlenecks.

Logging on to Infrastructure Analytics Advisor


Access the Infrastructure Analytics Advisor web interface from a supported
browser.

Procedure

1. Open a web browser.


2. Enter the URL for Infrastructure Analytics Advisor in the address bar:
http://ip-address:port-number/Analytics/login.htm

where:
• ip-address is the IP address of the Infrastructure Analytics Advisor
management server.
• port-number is the port number of the Infrastructure Analytics Advisor
management server. The default port number is 22015.
To access Infrastructure Analytics Advisor in secure mode, enter:
https://ip-address:port-number/Analytics/login.htm
The default port number for secure mode is 22016.
3. Type a user ID and password to log on.
4. Click Log In.
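
The URL pattern in step 2 can be assembled programmatically, for example when bookmarking
several management servers. The following Python sketch is illustrative: the function name
`login_url` is hypothetical, and the address 192.0.2.10 is a reserved documentation address,
not a real server. Only the URL path and the two default port numbers come from the procedure
above.

```python
def login_url(ip_address, port_number=None, secure=False):
    """Build the logon URL using the pattern shown in the procedure."""
    scheme = "https" if secure else "http"
    if port_number is None:
        # Default ports stated in the procedure: 22015 (HTTP), 22016 (secure mode).
        port_number = 22016 if secure else 22015
    return f"{scheme}://{ip_address}:{port_number}/Analytics/login.htm"


print(login_url("192.0.2.10"))
# http://192.0.2.10:22015/Analytics/login.htm
print(login_url("192.0.2.10", secure=True))
# https://192.0.2.10:22016/Analytics/login.htm
```

If your management server uses non-default ports, pass `port_number` explicitly.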

Accessing Data Center Analytics
Use Data Center Analytics to conduct historical trend analysis across a wide
set of infrastructure statistics, create advanced monitoring custom reports,
and interactively do additional troubleshooting and diagnostics.

Access Data Center Analytics from the Tools menu. Type a user ID and
password to log on.

Use the Data Center Analytics online help to view details about reporting
tasks and features.

2
Performance monitoring using advanced threshold settings
Infrastructure Analytics Advisor ensures the health of your data center by
measuring, monitoring, and optimizing the performance of your infrastructure
resources.

□ Threshold profiles

□ Advanced threshold settings

□ Determining the threshold type for your environment

□ Dynamic thresholds

□ Static thresholds

Threshold profiles
You can define the monitoring parameters for target resources in the
threshold profiles. Monitoring parameters vary depending on the type of
resources being monitored.

The profile details page shows the profile name, the description, and whether the
profile uses the preset parameters defined in the monitoring template.

Two types of threshold profiles are available for monitoring purposes:


• User Resource Threshold Profiles
You can define monitoring parameters for user resources such as volumes
and VMs using User Resource Threshold profiles.
You can perform the following tasks for monitoring user resources:
• Monitoring plans: You can create plans for monitoring performance of
resources whose workloads vary at different times of the day or week.
For example, you can create separate monitoring plans for managing
varying workloads that occur at different time periods, such as
weekdays and weekends, peak workload periods and off workload
periods, and so on.
• Threshold settings: You can configure the threshold settings for user
resources. The threshold settings determine when an alert should be
triggered. You can monitor user resources using dynamic or static
thresholds.
• Automated resource assignment: You can create rules and conditions to
automate resource assignment to monitoring profiles. Using these rules,
the newly discovered user resources are automatically assigned to the
existing user resource threshold profiles. The resources associated with
a threshold profile are monitored based on the parameters defined in
the profile. You can also manually assign resources to the monitoring
profiles.
• System Resource Threshold Profiles
You can define monitoring parameters for system resources such as
Switches, Hypervisors, and Storage Systems using System Resource
Threshold profiles.
You can perform the following tasks for monitoring system resources:
• Threshold settings: You can configure the threshold settings for system
resources. The threshold settings determine when an alert should be
triggered. You can monitor system resources using static thresholds.
• Manual resource assignment: You can manually assign resources to the
system resource monitoring profiles. The resources associated with a
threshold profile are monitored based on the parameters defined in the
profile.

Advanced threshold settings
Infrastructure Analytics Advisor supports monitoring of the performance
metrics defined for your infrastructure resources using static and dynamic
thresholds.

Determining the threshold type for your environment


Determining the appropriate threshold is essential for monitoring
performance and ensuring compliance.

As a system administrator, you must configure your environment to meet the SLO
requirements. However, over time, system performance can change. To adapt to these changes
while continuing to maintain the SLOs, you must monitor the system closely, and you might
periodically need to change the performance thresholds.

Two types of performance metric thresholds are available for monitoring. The type you
choose depends on various factors, such as the monitored environment, the monitored
resources, and your business objectives.
• Dynamic thresholds: Dynamic thresholds are system-computed values that keep
evolving depending on the performance of your system. The system analyzes the
historical performance trends and computes an appropriate baseline value.
• Static thresholds: Static threshold values are user-defined static values
that are used for monitoring a system with a predictable performance
pattern.
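
Whichever type you choose, the monitoring check itself has the same shape: a metric value is
compared against limits, and only the origin of the limits differs (entered by the user for
static thresholds, computed by the system for dynamic thresholds). The following Python
sketch illustrates this under stated assumptions: the function name, the pairing of a warning
limit with a critical limit, and the sample values are all hypothetical and are not taken
from the product.

```python
def alert_level(value, warning, critical):
    """Return the event level for a metric value, or None if within limits."""
    if value > critical:
        return "Critical"
    if value > warning:
        return "Warning"
    return None


# Static profile: limits fixed by the administrator (made-up response times in ms).
STATIC_LIMITS = {"warning": 20.0, "critical": 30.0}

print(alert_level(25.0, **STATIC_LIMITS))  # Warning
```

With a dynamic threshold, the same comparison would run against limits that the system
recomputes from historical data instead of the fixed values shown here.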

Dynamic thresholds
Dynamic thresholds are calculated automatically by analyzing the load
pattern from the historical data. These values are adaptive and change
over time depending on the performance of your resources, workload changes,
and so on. You can use dynamic thresholds to monitor user resources only,
such as volumes, VMs, and hosts.

The scenarios when you would use dynamic threshold values for monitoring
your environment are as follows:
• When SLOs and other performance parameters are not established with
the customer

• When you want to monitor your environment for stable performance and
detect irregular behavior

Advantages of dynamic thresholds

With changing business requirements and performance goals, monitoring
performance of your environment using predefined static thresholds might
not be a feasible solution. The static values are calculated through trial and
error, which is often time-consuming. These values become out of context in
the long-term and the settings must be re-evaluated to ensure compliance.

Manually altering the thresholds each time there is a change in the system
dynamics is a futile effort. By automating the threshold setting you gain
better visibility into your environment and performance trend patterns.
Dynamic thresholds adapt to your environment and proactively send alerts
before a performance bottleneck occurs.

Determining if the computed value is correct


If the computed values match your requirements, you can continue to use
the dynamic thresholds for monitoring your environment. If you receive too
many false alerts, you can manually edit the dynamic threshold values. For
example, during a migration process, a resource might have a large number of
disk IOs temporarily and you might receive a number of false alerts. In this
situation, you can manually edit the baseline value to account for the
temporary increase in the load, and then allow the system to dynamically
adjust the baseline values when the stable operation is restored.

Automatic calculation of baseline values


Determining an appropriate threshold is essential while monitoring business-critical
applications. Infrastructure Analytics Advisor analyzes the peak, normal, and low
volume phases based on the historical data and adjusts the monitoring thresholds
accordingly. Automating the threshold calculation reduces false alerts and the
number of alerts to investigate, which might otherwise become a management overhead.

The application workloads might vary at different times of the day or week.
For example, the workload pattern of an OLTP application might be different
on weekdays and weekends. You can manage varying workloads that occur at
different time periods for an application by creating monitoring plans. The
system analyzes the performance data accumulated in the scheduled baseline
period for computing the dynamic threshold values.

The following example shows the response time metrics of a business-critical application
monitored over time and how the system derives the automatic threshold values based on the
past performance. The high-level steps the system uses to calculate the automatic baseline
values are as follows:

• Analyzes historical data for identifying the performance patterns in the
specified baseline period.

• Detects and removes the occasional outliers: In the following example, the
data points that deviate from the norm represent the outliers. The system
ignores outliers that appear at irregular intervals when it calculates an
appropriate threshold value.

• Calculates the maximum value: The upper limit of the values in the normal
range is used to calculate the maximum value. After determining the
maximum value, the system adds the margin of error to the computed
value.

• Determines the weighted average: The weighted average derives the
threshold values based on the past performance trends over a specified
time period.
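
The steps above can be approximated with a short Python sketch. The product's actual algorithm is not documented here, so everything specific is an assumption: the interquartile-range outlier rule, the margin percentages attached to the Severe, Normal, and Rough settings, and the exponentially decaying weights are all hypothetical choices made for illustration.

```python
from statistics import quantiles

# Hypothetical margins for the Severe, Normal, and Rough settings.
MARGIN_OF_ERROR = {"Severe": 0.05, "Normal": 0.10, "Rough": 0.20}


def remove_outliers(samples, k=1.5):
    """Drop points outside the interquartile fences (an assumed outlier rule)."""
    q1, _, q3 = quantiles(samples, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [s for s in samples if low <= s <= high]


def period_maximum(samples, margin="Normal"):
    """Upper limit of the normal range plus the margin of error."""
    normal_range = remove_outliers(samples)
    return max(normal_range) * (1 + MARGIN_OF_ERROR[margin])


def weighted_threshold(period_maxima, decay=0.5):
    """Weighted average of per-period maxima (oldest first).

    Recent periods weigh more; the decay factor is an assumption.
    """
    newest = len(period_maxima) - 1
    weights = [decay ** (newest - i) for i in range(len(period_maxima))]
    return sum(w * m for w, m in zip(weights, period_maxima)) / sum(weights)


# Response times in ms for one baseline period; 95 is an occasional spike.
response_times_ms = [10, 11, 12, 11, 10, 12, 11, 95]
```

In this sketch, the spike is discarded before the maximum is taken, so a single irregular data point does not inflate the threshold. The weighted average then smooths the per-period maxima so that the most recent baseline periods dominate.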

Setting dynamic thresholds using monitoring profiles


You can create monitoring profiles with dynamic thresholds for managing
user resources only. System resources cannot be monitored using dynamic
thresholds.

Using the user resource threshold profile, you can apply dynamic thresholds
across user resources within your environment. For example, using a user
resource threshold profile, you can apply a dynamic threshold setting for all
volumes in an application.

You can create monitoring plans for an OLTP application, whose workloads
vary during weekdays and weekends. You can also create a separate plan for
monitoring batch jobs that run at night. The procedure for enabling dynamic
thresholds is as follows:

Procedure

1. On the Administration tab, from the navigation pane, select Monitoring
Settings > User Resource Threshold Profiles > Create Threshold
Profile.
2. In the Create User Resource Threshold Profile window, enter the
profile name and description, and then select the resource type and the
acceptable margin of error for sending alerts (Severe, Normal, or Rough).

3. Under Monitoring Plans, click Create Plan to create new monitoring
plans. You can either edit the base plan, or create a new plan.
4. In the Create Plan window, enter the plan name, and set the target
period. Under Target metric, you will see a list of performance metrics
related to the selected resource. Click Dynamic to enable dynamic
monitoring mode and click OK.

5. To save the profile, click OK.
After you save the profile, the profile detail window opens, where you
can assign target resources or create resource assignment rules.
6. In the profile detail window you can do the following:
• On the Assignment Rules tab, you can create rules for assigning
resources to the monitoring profile automatically.
• On the Target Resources tab, you can assign the resources to the
profile manually. You can also view the existing target resources
associated with the monitoring profile.

Static thresholds
Static thresholds are user-defined thresholds which you can manually
configure for use at different times of the day or week depending on the
workload in your environment.

You can use predefined static threshold values in the following scenarios:
• When you have a well-defined service level objective (SLO) that clearly
establishes the performance goals.
For example, if you have a service level agreement (SLA) with the customer
to support online transactions at a response time of less than 1 second
for a business-critical application, you can create a user resource
threshold profile to establish the response time and other performance
requirements for the application, and then assign the target resources for
monitoring. If there is an SLO violation, the system sends a critical alert
or a warning to notify the user before the problem becomes serious. You can
also generate a report that compares the actual response time of the
business-critical application to the SLO, check whether your objectives
are being met, and take the necessary measures to fix the problem.
• When you can assess the workload patterns in your environment and know
what values to assign.
For example, define the threshold for a system resource based on the
architecture of the storage system. If the storage system is VSP G1000,
then the recommended MPB (MP Blade) usage is under 60%.
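
The two scenarios above amount to comparing a measured value against fixed limits. The following Python sketch shows that comparison under stated assumptions: the StaticThreshold class and evaluate function are hypothetical names, and the warning levels (0.8 seconds and 50%) are invented for illustration. Only the 1-second response time objective and the 60% MPB usage guideline come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticThreshold:
    """Hypothetical static threshold with two alert levels."""
    metric: str
    warning: float   # assumed early-warning level
    critical: float  # level at which the objective is violated


def evaluate(threshold: StaticThreshold, value: float) -> str:
    """Classify a measured value as 'ok', 'warning', or 'critical'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


# Response time must stay below 1 second; MPB usage should stay under 60%.
response_time = StaticThreshold("Response time (s)", warning=0.8, critical=1.0)
mpb_usage = StaticThreshold("MPB usage (%)", warning=50.0, critical=60.0)
```

Because the objective is a response time of less than 1 second, a measurement of exactly 1.0 second is treated as a violation in this sketch.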

Setting static thresholds using monitoring profiles
You can create monitoring profiles with static thresholds for managing user
and system resources. The performance parameters defined in the threshold
profile determine when an alert is triggered.

Create threshold profiles for user or system resources based on the resource
type, and then assign the resources you want to monitor.

For user resources

Procedure

1. On the Administration tab, from the navigation pane, select
Monitoring Settings > User Resource Threshold Profiles > Create
Threshold Profile.
2. In the Create User Resource Threshold Profile window, enter the
profile name, description, and select the resource type.
3. On the Monitoring Plans tab, click Create Plan to create new
monitoring plans. You can either edit the base plan or create a new plan.
4. In the Create Plan window, set the target period for monitoring. Under
Target metric, click Static to enable static monitoring mode. You must
manually enter the threshold values for the target metrics when you
enable static monitoring mode.

5. To save the profile, click OK.
After you save the profile, the profile detail window opens, where you
can assign target resources or create resource assignment rules.
6. In the profile detail window you can do the following:
• On the Assignment Rules tab, you can create rules for assigning
resources to the monitoring profile automatically.
• On the Target Resources tab, you can assign the resources to the
profile manually. You can also view the existing target resources
associated with the monitoring profile.

For system resources


The procedure for setting static thresholds for system resources is as
follows:

Procedure

1. On the Administration tab, from the navigation pane, select
Monitoring Settings > System Resource Threshold Profiles >
Create Threshold Profile.
2. In the Create System Resource Threshold Profile window, enter the
profile name, description, and select the resource type. If required, copy
the settings from the default profile or existing system resource profiles.
3. Under threshold values, manually enter the threshold values for the
performance metrics.

4. Under Target Resources, click Add Resources to manually assign
resources to the system resource threshold profile.

3
End-to-end performance troubleshooting
Infrastructure Analytics Advisor provides analytical diagnostics to quickly
identify, isolate, and determine the root cause of problems.

The traditional approach to troubleshooting performance problems in a
unified infrastructure poses several challenges. For example, it can be
difficult to identify a performance problem in a storage infrastructure
environment that includes various virtual machines, servers, networks, and
storage.

Infrastructure Analytics Advisor offers an out-of-the-box analytics solution
that lets you identify and troubleshoot performance problems at the node
level. The topology view shows a graphical representation of the
infrastructure components and their dependencies, which is crucial for
troubleshooting infrastructure performance problems. The troubleshooting
aids help you perform efficient root cause analysis.

□ Identifying performance problems

□ Infrastructure components and key performance metrics

□ Troubleshooting high response times

□ Troubleshooting workflow

□ Detecting performance problems

□ Analyzing performance bottleneck

□ Analyzing the root cause of the bottleneck

□ Solving performance problems

Identifying performance problems
IT infrastructure is becoming more complex each day with rapidly
emerging converged infrastructures. Performance problems occur due to
various factors in your environment. Identifying and troubleshooting
these problems quickly is crucial.

As part of your performance management strategy, you define performance
goals and criteria for monitoring your environment. Performance problems
occur when these predefined goals are not met. Use the advanced analytics
and troubleshooting features of Infrastructure Analytics Advisor to quickly
fix problems.

The following indications can make you aware of a performance problem in
your environment:
• When an SLO violation occurs
Typically, SLAs define the SLOs used to evaluate the quality of service. SLO
profiles define the threshold values for the performance parameters that
you use to evaluate the quality of service. When the threshold values are
exceeded, an SLO violation occurs.
• When a sharp deviation from the baseline data occurs
When no SLOs are defined for your environment, you can use the baseline
values to evaluate your system performance. The current performance is
compared to the past performance trends, and when there is a significant
deviation from the baseline values, Infrastructure Analytics Advisor sends
an alert to notify you of a potential performance problem so you have
enough time to troubleshoot.
• When the customer notifies you of an application performance degradation
and a slowdown of the infrastructure.

The common causes for performance problems are as follows:


• Increased load in an otherwise stable operating environment
• Inefficient load balancing strategy, which might cause underutilization of
resources
• Changes in the system configuration
• Resource management in a shared infrastructure

Infrastructure components and key performance metrics


You must analyze the key performance metrics relevant to the problem and
the workload being analyzed.

The components and key performance metrics available in Infrastructure
Analytics Advisor for monitoring performance are listed in the following table:

Key performance metrics

For each infrastructure group and performance problem, the metrics are
listed for the following analysis steps:
(1) Identify the resources with SLO violations or performance problems
(2) Identify the related resources used by the affected resources
(3) Identify the resources that might be the root cause

Server: CPU contention
(1) VM: vCPU Ready, vCPU usage
(2) ESX: pCPU usage, Host CPU Ready*
(3) VM: vCPU Ready, vCPU usage
    ESX: pCPU usage, Host CPU Ready*

Server: Memory swap
(1) VM: Usage %, Active memory, Swap in/out rate
(2) ESX: Usage %, Active memory*, Swap in/out rate*
(3) VM: Usage %, Active memory, Swap in/out rate*
    ESX: Swap in/out rate*

Server: Memory contention
(1) VM: Balloon
(2) ESX: Usage %, Balloon*
    VM: Balloon*
(3) VM: Usage %, Active memory, Balloon

Server: Response time decrement
ESX: pCPU usage, Device Latency (R/W)*

Storage: Response time decrement
(1) VM: Latency (R/W)
    Hypervisor: Latency (T)*
    LU (Volume): Response time (R/W/T)
(2) Port: usage
    Processor: MPB utilization*
    Cache: Write Pending %, Side file %
    Pool: Utilization
    Parity Group: Utilization %, Read Hit %
(3) VM: Read (KBps), Write (KBps), Read Operations, Write Operations
    LU (Volume): IOPS (R/W/T)

Network: Error packet
VM: droppedRx, droppedTx, Transmitted/received (KBps)
VM: Transmitted/received (KBps), PacketsTx*, Packets Rx*

* The performance metric is available in Data Center Analytics.

Troubleshooting high response times
This section describes the use case flow for troubleshooting high response
times for an OLTP application using the advanced analytics and
troubleshooting features of Infrastructure Analytics Advisor.
The most significant metric to watch while monitoring online transactions
is the I/O rate. The application can process a larger number of
transactions when the I/O rates are higher. An OLTP environment mostly
generates random-access I/O, so to maintain good application response
times, the read I/O response times must stay low. For response-time-centric
applications, such as OLTP applications, you must maintain low utilization
values to ensure CPU availability and low queue depth (Q-depth) values to
ensure no wait time.

Troubleshooting workflow
The basic workflow for analyzing and troubleshooting the performance
problems using Infrastructure Analytics Advisor is as follows:

1. Detecting performance problems
2. Analyzing in E2E view
3. Analyzing in Sparkline view
4. Identify affected resources
5. Analyze shared resources
6. Analyze related changes
7. Solving performance problems

Detecting performance problems
You can view the threshold violations using the Dashboard tab and Events
tab. You can configure the system to send email notifications when the
threshold values are exceeded. You can also use the search feature in the
Analytics tab to find the target resources for performance analysis.

Dashboard
The dashboard is displayed when you log on to Infrastructure Analytics
Advisor. You can create a custom dashboard and choose which reports of
monitored resources to view.

The dashboard displays summary reports for the monitored resources,
system and resource events, event trends, and consumer groups. The report
widgets display the threshold violations and critical alerts detected on all
monitored resources when threshold values are exceeded.

In the following figure, warnings are displayed for the monitored VMs and
volumes. From the report widgets, you can click links to access the E2E view
to analyze the cause of the threshold violations.

Events tab
The Events tab displays a list of resource and system events. You can view
the severity of each event, date and time of the occurrence, category, device,
and the component name. You can navigate from the Events tab to the E2E
view for further analysis.

Email notifications
Infrastructure Analytics Advisor allows you to configure email notifications.
When the threshold values are exceeded, the system sends an email to notify
you of the potential performance problem.

Search
The search feature in the Analytics tab lets you search for a resource in the
Consumers, Servers, Storage Systems and Volumes categories. From the
returned search results, you can select the resources you would like to
analyze, and launch the E2E view or Sparkline view for further analysis.

Analyzing performance bottleneck


Performance degradation in user resources is caused by a performance
bottleneck in the server, network, or storage components.

A performance bottleneck can occur for various reasons, such as CPU
contention, inefficient load balancing, applications sharing storage pools,
port and parity group utilization in a shared infrastructure, cache
utilization, changes in dynamic tiering policies, and configuration changes.

You can identify and analyze the component causing the bottleneck in any of
the following views:
• E2E view
• Analyze bottleneck > Verify Bottleneck tab
• Sparkline view
• Detail view

Analyzing in E2E view


In the topology view, if a resource has an alert associated with it, error
indicators display on the resource icons. The color of the indicator
corresponds with the severity of the alert.

The following shows the E2E configuration related to the affected volumes,
00:00:03, 00:00:05, and 00:00:06:

You can change the base point of analysis to narrow down the topology
associated with the affected volumes. Select the affected volume, right-click,
and then select Change Base Point.

The Parity Group is identified as the component causing the performance
bottleneck.

Analyzing in Verify Bottleneck window


In the E2E view, right-click a resource icon and then select Verify
Bottleneck to launch the Verify Bottleneck window.

In the Verify Bottleneck window, you can compare the performance trends of
the potential bottleneck candidate with those of the base point resources. If
the performance charts display similar trend patterns in the same time
period, you can assume that the selected resource is the bottleneck
candidate. If not, you can repeat the analysis for other resources with
alerts in the Verify Bottleneck window.

In the following example, the Parity Group is identified as the bottleneck
candidate.
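
Judging whether two charts show similar trend patterns is done visually in the product. As a rough numeric counterpart, the following Python sketch computes the Pearson correlation between two series sampled at the same times. The function names, the sample values, and the 0.8 cutoff are assumptions for illustration and are not part of Infrastructure Analytics Advisor.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if either is flat)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def has_similar_trend(base_point, candidate, min_correlation=0.8):
    """Assumed rule: treat strongly correlated series as similar trends."""
    return pearson(base_point, candidate) >= min_correlation


# Invented samples taken at the same six points in time.
volume_response_ms = [2, 3, 8, 9, 3, 2]
parity_group_util = [30, 35, 80, 85, 40, 30]
idle_port_util = [50, 50, 50, 50, 50, 50]
```

Here the volume response time rises and falls together with the parity group utilization, so the parity group would remain a bottleneck candidate, while the flat port series would not.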

Analyzing in Sparkline view
Use Sparkline charts to analyze the performance trends of the monitored
resources. In the Sparkline view you can compare and correlate the
performance of the base point resources and the related infrastructure
resources for identifying the bottleneck.

The Sparkline view displays performance charts for multiple nodes in the
same pane to enable quick comparison between different nodes. You can
display detailed performance metrics for each node and find the correlation
with other nodes.

The following figure shows an example of analyzing the affected volumes
(00:00:03, 00:00:05, and 00:00:06) in the Sparkline view. The trend charts
confirm that the performance bottleneck is caused by the parity group.

The volumes (00:00:03, 00:00:05, and 00:00:06) belong to the same parity
group. If the volumes (logical resources) share the same parity group
(physical resource) and if one of the logical volumes utilizes the parity group
more than the others in the shared infrastructure, the total efficiency of the
physical resource is degraded and the parity group utilization rate increases.

A high parity group utilization rate delays reads from and writes to the
disks in the parity group, which increases the response time of the
application. You can consider allocating the affected volumes to a different
parity group for load balancing. To troubleshoot the bottleneck, you can also
check the I/O performance of the parity group and see if any other servers
access the same parity group.

Analyzing in Detail view
In the Sparkline view, you can select multiple graphs and then click Show
Performance to navigate to the Detail view. In the Detail view, you can
closely analyze the performance trends of the base point resources
(00:00:03, 00:00:05, and 00:00:06) and the bottleneck candidate (the Parity
Group). In this analysis, the affected volumes show trend patterns similar to
those of the parity group during the same time period, which confirms that
the parity group is the bottleneck candidate. You can continue to analyze
and find the root cause in the Analyze Bottleneck window.

Analyzing the root cause of the bottleneck
The integrated troubleshooting aids in Infrastructure Analytics Advisor
provide guidance about how to find the root cause of performance problems.

Identify affected resources


In the Analyze Bottleneck window, click the Identify affected resources tab.
In this window, you can identify the consumers, hosts, VMs and volumes that
use the bottleneck candidate. You can also verify the status of each resource.
Based on the severity level displayed, you can troubleshoot the performance
problems associated with the resources.

Analyze shared resources


A performance problem arises in a shared infrastructure when an
application or a resource uses the majority of the available resources and
causes performance issues for other resources in the shared infrastructure.
Infrastructure Analytics Advisor supports efficient optimization of the shared
infrastructure by quickly identifying resource contention issues.

In a shared infrastructure, the use of resources by one component can
negatively impact the performance of other components. The main goal of the
analysis is to find the resource in the shared infrastructure that might be
causing the performance bottleneck.

The high-level steps used to analyze the root cause in the Analyze Shared
Resources window are as follows:

1. In the Analyze Bottleneck window, click the Analyze Shared Resources tab.
2. In the Analyze Shared Resources window, compare the performance
trends of the bottleneck candidate with the related resources to find if
any of these resources are overutilizing the bottleneck candidate.
3. If the performance trends of the compared resources show similar trend
patterns in the same time period, then you can assume that the
performance bottleneck is caused by resource contention in the shared
infrastructure.

In the following example, the Parity Group is identified as the bottleneck
candidate. In the Analyze Shared Resources window, compare the
performance trends of the Parity Group with the Volumes and VMs that use
the Parity Group. The performance trends of the Parity Group closely match
the trend patterns of one of the VMs, which confirms that this VM is the
resource in the shared infrastructure that is overutilizing the Parity Group.
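
In addition to the visual trend comparison, a simple numeric check can show which consumer contributes most of the load on a shared resource. The following Python sketch is a complementary illustration, not the product's method: the function names, the per-VM IOPS samples, and the 50% share limit are all assumptions.

```python
def load_shares(io_by_consumer):
    """Return each consumer's fraction of the total I/O over the sampled period."""
    totals = {name: sum(samples) for name, samples in io_by_consumer.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: total / grand_total for name, total in totals.items()}


def dominant_consumers(io_by_consumer, share_limit=0.5):
    """List consumers whose share exceeds the (assumed) limit, largest first."""
    shares = load_shares(io_by_consumer)
    heavy = [name for name, share in shares.items() if share > share_limit]
    return sorted(heavy, key=lambda name: shares[name], reverse=True)


# Invented IOPS samples for three VMs that use the same parity group.
iops_by_vm = {
    "vm-a": [100, 120, 110],
    "vm-b": [900, 950, 1000],
    "vm-c": [50, 40, 60],
}
```

With these invented samples, one VM accounts for most of the I/O on the parity group, which is the kind of imbalance that load balancing is meant to correct.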

You can resolve the bottleneck caused by the shared resources by adopting
efficient load balancing methodologies, which enable optimal utilization of
the resources in the shared infrastructure.

Analyze related changes


Configuration changes can sometimes be the source of the performance
problem in your environment. Infrastructure Analytics Advisor supports the
tracking of infrastructure configuration changes. Analyzing these changes and
correlating them with the performance data lets you determine the effects of
configuration changes on the system's performance and behavior.

The main goal of the analysis is to examine the configuration changes made
in your environment that might be the root cause of the performance
bottleneck.

The high-level steps used to analyze the root cause in the Analyze Related
Changes window are as follows:
1. In the Analyze Bottleneck window, click the Analyze Related Changes tab.
2. In the Analyze Related Changes window, a combination chart that
combines the features of a line chart and a bar chart is displayed. In
the combination chart, you can compare the performance data of the
bottleneck candidate with the system configuration changes for a
specified time period.
The details of the configuration change events that occurred in the
specified time period are displayed in the lower pane. You can analyze the
change events to see if any of these changes caused performance
variations in the bottleneck candidate. You can also zoom in on the
performance trend chart to select a shorter time period, and view the
change events that occurred in the selected time range.
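
Narrowing the change events to a selected time range is, in essence, a filter on timestamps. The following Python sketch illustrates the idea under stated assumptions: the ChangeEvent class, the function names, the one-hour lookback window, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of a configuration change."""
    timestamp: datetime
    description: str


def events_in_range(events, start, end):
    """Return the change events with start <= timestamp < end, oldest first."""
    selected = [e for e in events if start <= e.timestamp < end]
    return sorted(selected, key=lambda e: e.timestamp)


def changes_before(events, degradation_start, lookback=timedelta(hours=1)):
    """Change events shortly before a degradation (assumed lookback window)."""
    return events_in_range(events, degradation_start - lookback, degradation_start)


# Invented change events for illustration.
events = [
    ChangeEvent(datetime(2016, 5, 2, 9, 30), "Volume added to pool"),
    ChangeEvent(datetime(2016, 5, 2, 11, 0), "Host group moved to another port"),
]
```

If the performance of the bottleneck candidate degrades at 10:00, only the 09:30 event falls inside the lookback window and needs closer inspection. An empty result suggests that no configuration change preceded the degradation.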

In the following example, the Parity Group is identified as the bottleneck
candidate. In the Analyze Related Changes window, a combination chart that
contains two data series is displayed: the bars represent the change events
and the line represents the performance of the bottleneck candidate. You can
correlate the performance data of the Parity Group and the change events
that occurred in the specified time period to determine the effects of the
configuration changes. Based on the analysis, you can confirm that there were
no configuration change events that caused the performance degradation in
the Parity Group.

Solving performance problems
The common performance problems and the recommended solutions are
described as follows. The possible causes and solutions are intended to serve
only as guidelines, and might not satisfy business process performance
requirements.

The following table lists the commonly observed storage related problems
and possible solutions:

Bottleneck area Root cause and possible solutions

Parity Group utilization
• Root cause
  The usage rate of the Parity Group increases due to the following
  possible causes:
  ○ Some volumes might be under heavy load.
  ○ Volumes (logical resources) might belong to the same Parity Group
    (physical resource), which might cause resource contention issues in
    the shared infrastructure.
• Possible solutions
  ○ Consider moving some volumes to another Parity Group with a lower
    usage rate or higher performance.
  ○ Consider increasing the number of drives (by concatenating Parity
    Groups).
  ○ To manage a Parity Group that is part of a pool, consider adding
    another Parity Group to the pool.

MPB utilization
• Root cause
  The usage rate of the MP Blade (average usage rate of the MP cores in the
  MP Blade) increases due to an increased load. Too many busy resources,
  such as internal volumes, external volumes, or journal groups, accessing
  the same MP Blade might cause performance degradation.
• Possible solutions
  Consider allocating the busy resources (internal volumes, external
  volumes, or journal groups) to another MP Blade (changing the ownership).

Port utilization
• Root cause
  The usage rate of the port (amount of data forwarded by the port divided
  by the amount of data that can be forwarded by the port) increases due to
  a number of volumes accessing the same port.
• Possible solutions
  Consider allocating some volumes (or host groups) to a different port.
  Note: When the connected port is changed, the host might need to be
  restarted.

Cache utilization
• Root cause
  Out of the total cache memory allocated to the CLPR, the percentage
  occupied by the data waiting to be written to the drive increases due to
  the following possible causes:
  ○ The usage rate of the Parity Group might be high, delaying write
    processing to the drive.
  ○ The usage rate of the MP Unit might be high, delaying write processing
    to the drive.
  ○ The capacity of the installed cache memory might be insufficient.

The following table lists the commonly observed server related problems and
possible solutions:

Bottleneck area Root cause and possible solutions

CPU utilization
• Root cause
  CPU bottlenecks occur when several VMs run on the same physical machine
  and end up sharing the same CPU. If the VMs (logical resources) share the
  same CPU (physical resource) and if one of the VMs utilizes the CPU more
  than the others in the shared infrastructure, the total efficiency of the
  CPU is degraded and the CPU utilization rate increases. The CPU could
  become saturated with requests due to resource contention issues.
• Possible solutions
  Consider moving the VMs to another server.

Memory utilization
• Root cause
  Memory bottlenecks occur when several VMs (logical resources) share the
  available memory (physical resources), which might result in the
  performance degradation of the physical memory.
• Possible solutions
  Consider allocating additional physical memory, or moving the VMs to
  another server.

4
Flexible reporting and analysis using Data Center Analytics
In the fast-paced world of online transactions, many companies with global
operations have invested in a sophisticated IT infrastructure that gives
them a competitive edge. Monitoring and reporting features enable
organizations to monitor applications closely and continuously, and to
proactively identify problems before they become severe and require
immediate attention. Whether you are an IT manager for a bank, a health care
provider, or a government agency, proactive monitoring and reporting are
useful in determining the performance trend of your system and addressing
ways to improve customer service interactions in advance of customer
feedback. Doing this thoroughly requires a tool that can help track the
health of your system at all hours and display the relevant metrics
instantly in a report that you can share with your organization for
assessment.

Hitachi Infrastructure Analytics Advisor integrates with Data Center Analytics
to provide advanced reporting capability to continuously measure and
analyze performance of your monitored resources. The up-to-date visual
representation of your system's health enables you to share reports with
others. You can create three types of reports:
• Predefined reports: provide high-level details at the application level and
also a granular report that shows component-level performance data.
• Ad-hoc reports: enable you to combine related and unrelated metrics of
any monitored resource in one report to review the overall performance
impact.
• Custom reports: reports that you create with a report builder.

All reports are included in the Reports dock, and are available when you
select any storage system object in the storage systems hierarchy. Predefined
reports differ based on your selection of the storage system object. An
interactive chart and filtering resources enable you to view every detail in any
report. You can also filter reports to display the most relevant data, and can
print, create a PDF, and export a report to a CSV file.

Overall and granular level reporting using pre-defined reports
Each node in the tree has predefined reports that cover important attributes
of a metric to help your analysis of the resource. If you expand and click a
node, for example, 609315f7 under Pools in the tree, the performance report
is displayed. In this case, the Pool IOPS Vs. Response Time report is
displayed, and it shows the metrics data only for 609315f7. No data for other
Pools appears on the report.

Compare node and metric with ad-hoc reports


In reports, nodes are resources, such as RAID Storage 302c7d0 and RAID
Storage 302c6d6, and metrics are measurements, such as cache usage and write
pending rate. You can compare any nodes, or compare the metrics of a single
node or of different nodes. In Add Report, type the report name in the field,
then add specific metrics by dragging and dropping a node from the tree to
either the axis section Y/Left or Y1/Right. The left and right axis boxes
display the list of available resources, for example, virtual machines and
hosts.

If, for example, you want to see a pattern for a storage node between two
time periods, you can display the Storage IOPS reports for both periods in
one view and compare them. Each graph line is color-coded, and you can zoom
in on a report to get a better view.

You can also compare how one metric affects another. For example, you can
create an ad-hoc report that compares IOPS with Response Time. This report,
which is the most commonly used, shows whether an increasing load on the
system (IOPS) affects the performance (response time).
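
The relationship that this report shows visually can also be checked
numerically on exported data. The following is a minimal sketch, not a
product feature: it assumes that you have exported IOPS and response time
samples (for example, to a CSV file) and computes their Pearson correlation.
The sample values and the function name are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same points in time.
iops = [1000, 2000, 3000, 4000, 5000]
response_ms = [1.0, 1.2, 1.9, 3.1, 5.0]

r = pearson(iops, response_ms)
print(f"correlation between IOPS and response time: {r:.2f}")
```

A value close to 1 suggests that response time rises with load. A value
close to 0 suggests that the slow response has another cause.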

To create ad-hoc reports, you can combine related and unrelated resource
metrics by dragging the metrics from specific instances in the tree and
dropping them into the report. For example, you can see the metrics for
ports and volumes in one chart at any time. Attributes that are directly
related, for example, IOPS and Response Time, usually have a built-in report
in the Reports dock. Sometimes the attributes are unrelated or only
indirectly related. For example, the file system transfer rate on a host can
affect the storage system cache usage and consume most of the storage from
the array. You can add such unrelated metrics to create a comparison chart.

Custom reports
If the predefined and ad-hoc reports are not sufficient, you can create
custom reports by building your own query. The Custom Reports feature is
based on the Data Center Analytics query language. This expressive,
regex-based query language retrieves and filters the data in the Data Center
Analytics database.

The Data Center Analytics query language supports complex analysis of the
data in real time with constant run time. The syntax makes it possible to
traverse relations, identify patterns in the data, and compare metrics of a
single component or of multiple nodes.

The Data Center Analytics UI helps you build your custom query in the
following three ways:

• Start with a predefined query and customize it as required.

• Build the query using the Build Query feature.

• Write the query directly using the Data Center Analytics query language.

5
Monitoring and quick troubleshooting with Data Center Analytics
Many companies with global operations have invested in sophisticated
storage infrastructure that gives them a competitive edge. Even the shortest
downtime in any of the critical applications that run on this infrastructure
has a cascading effect and results in logistical challenges. Therefore, as
the storage administrator of your company, you must monitor these
applications closely and continuously to proactively identify and stop
potential problems.

Hitachi Infrastructure Analytics Advisor analyzes configuration and
performance data from storage systems, hypervisors, and operating systems.
It defines resource SLO thresholds based on the service agreement, and
monitors the service level through customers' threshold alerts. To keep
storage performance at peak efficiency, Infrastructure Analytics Advisor
uses a scalable data repository and an advanced diagnostic engine to rapidly
diagnose and troubleshoot storage performance bottlenecks. The most common
problem is slow application response time.

The problem could be in any storage component such as the front-end ports,
controllers, or disk drives. Infrastructure Analytics Advisor automatically
sends a notification to you when a monitored metric of a storage component
exceeds the defined threshold. The notification contains details of the
component that exceeded the threshold to enable you to quickly identify the
problem and troubleshoot it.
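
The threshold check described above can be illustrated with a short sketch.
This is not the product's implementation or API. The class, function names,
metric names, and values are hypothetical, and the sketch assumes that each
sample identifies its component and metric.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    component: str   # for example "Controller 0"
    metric: str      # for example "Transfer Rate (MB/s)"
    value: float


def find_violations(samples, thresholds):
    """Return the samples whose value exceeds the threshold set for their metric."""
    return [
        s for s in samples
        if s.metric in thresholds and s.value > thresholds[s.metric]
    ]


def format_notification(sample, threshold):
    """Build a message that names the component, the metric, and both values."""
    return (
        f"{sample.component}: {sample.metric} is {sample.value}, "
        f"which exceeds the threshold of {threshold}"
    )


# Hypothetical thresholds and samples; names and values are illustrative only.
thresholds = {"Transfer Rate (MB/s)": 800.0}
samples = [
    Sample("Controller 0", "Transfer Rate (MB/s)", 950.0),
    Sample("Controller 1", "Transfer Rate (MB/s)", 420.0),
]

violations = find_violations(samples, thresholds)
for s in violations:
    print(format_notification(s, thresholds[s.metric]))
```

Including the component name in the message mirrors the point made above:
the notification itself should tell you where to start troubleshooting.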

In the example, you navigate from the tree view of Data Center Analytics,
which shows a hierarchical representation of the various storage system
objects, to the highlighted storage system, and then select an object to
analyze. In this example, controller 0 exceeds the defined threshold of a
monitored metric.

You quickly view the built-in component reports for historical configuration
and performance metrics, and notice some unusual or unexpected behavior
in the report for Transfer Rate. You see an unusual peak in activity close to
midnight.

To take a closer look, you zoom in on the report.

You confirm that there was a spike in activity for Write Transfers close to
midnight. You must now determine whether this is a regular pattern or just a
one-off situation. By selecting another period (day) to compare with the
current values, you confirm that a similar peak occurred the day before.
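
The same day-over-day comparison can be sketched on exported data. The
example below is an illustration under stated assumptions, not a product
feature: it assumes 24 hourly samples per day, and the function names, the
one-hour tolerance, and the sample values are hypothetical.

```python
def peak_hour(hourly_values):
    """Return the index (hour 0 to 23) of the largest value in the series."""
    return max(range(len(hourly_values)), key=lambda h: hourly_values[h])


def is_recurring_peak(today, yesterday, tolerance_hours=1):
    """Return True if both days peak within tolerance_hours of each other.

    Assumes 24 hourly samples per day. The difference wraps around midnight,
    so a peak at 23:00 and a peak at 00:00 count as one hour apart.
    """
    diff = abs(peak_hour(today) - peak_hour(yesterday))
    diff = min(diff, 24 - diff)
    return diff <= tolerance_hours


# Hypothetical write transfer rates (MB/s), one value per hour.
today = [50.0] * 24
today[23] = 400.0        # spike close to midnight
yesterday = [55.0] * 24
yesterday[0] = 380.0     # spike just after midnight

print(is_recurring_peak(today, yesterday))
```

A recurring peak at the same time of day often points to a scheduled job
rather than a one-off event, which is why the comparison is worth making
before you investigate further.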

You choose to review similar Configuration and Performance reports for other
components, such as DP pools, RAID groups, and other storage array
components, to analyze the effect on performance at the application level.

By focusing on the overall application instead of individual volumes or similar
components in detail, you can view the performance metrics at an application
level. This summary of individual resource metrics gives you a consolidated
view of the overall performance. This helps you to identify and solve the
problem faster than viewing individual volumes or similar component metrics.

Comparing related metrics enables you to quickly view the data transfer rate
and throughput generated by the application, ports, and storage system, and
to see the effect that the application has on the storage system and ports.
If the application is using a lot of bandwidth, you can decide to provide
additional bandwidth for the application or promote applications to a higher
storage tier.

6
Strategic planning using trend analysis in Data Center Analytics
Strategic planning with trend analysis provides a repository and analytic
reporting engine that enables you to identify and analyze the historical
performance trends that you need to optimize storage system performance and
plan future capacity growth. As the IT manager of your company, one of your
primary responsibilities is to plan and set aside budget for the capital
expenditures (CAPEX) required for future growth of the IT infrastructure,
specifically hardware and management software, hypervisors, switches, and
other network equipment. You require an easy way to predict future needs and
scale up to meet the growth of the organization.

The Data Center Analytics management server collects and reports
performance and configuration data over time. Using historical data, you can
evaluate the current data usage and predict future requirements. The
following report displays the storage capacity usage trend for a selected
storage system over a specific time period.

The example report shows an increase in the subscription for 53086_Capacity
from April 9, indicated by a blue line. This increase suggests that the pool
requires additional capacity to meet the subscription commitment. As in the
example report, if the consumption increases suddenly, the pool is at a
greater risk of running out of disk space. Therefore, you must add more
storage capacity.

Because the report covers a short time window, the change in capacity is
minimal. Over a longer period of time, the change is more visible. These
reports help you do additional capacity planning closer to the time when
the capacity is actually required.
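
One simple way to turn such a usage trend into a planning estimate is a
straight-line fit. The sketch below is an assumption-laden illustration, not
the method that the product uses: it assumes one usage sample per day,
assumes that growth stays linear, and uses hypothetical function names and
values.

```python
def linear_fit(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    return slope, mean_y - slope * mean_x


def days_until_full(used_tb, capacity_tb):
    """Estimate the days from the last sample until usage reaches capacity.

    Returns None when usage is flat or shrinking, because a linear trend
    then never reaches the capacity limit.
    """
    slope, intercept = linear_fit(used_tb)
    if slope <= 0:
        return None
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - (len(used_tb) - 1))


# Hypothetical daily used-capacity samples (TB) for a pool of 50 TB.
used = [40.0, 41.0, 42.0, 43.0, 44.0]
print(days_until_full(used, 50.0))
```

As noted above, a short time window shows little change, so an estimate
like this becomes more reliable as the window of historical data grows.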

Trend analysis is an analytical tool to validate the effectiveness of your
storage provisioning strategy over a time period. If the capacity required
to fulfill the subscription commitment is consistently high compared with
the total available free capacity of a storage pool, your actual capacity is
inadequate to meet your subscription commitment. If this measure is
consistently low, the pool is unlikely to use the provisioned capacity
beyond the current levels, and you can safely move the unused capacity to
another pool.
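
The comparison described above can be sketched as a ratio. The document does
not define the exact formula, so everything in this example is an
assumption: the ratio is taken as the subscribed capacity that is not yet
consumed divided by the free capacity of the pool, and the "high" and "low"
cut-off values are hypothetical.

```python
def commitment_ratio(subscribed_tb, used_tb, total_tb):
    """Outstanding subscription divided by free capacity (assumed definition)."""
    free = total_tb - used_tb
    if free <= 0:
        return float("inf")  # no free capacity left at all
    return (subscribed_tb - used_tb) / free


def assess(samples, high=1.0, low=0.5):
    """Classify a pool from (subscribed, used, total) samples over time.

    The high and low cut-offs are illustrative, not product defaults.
    """
    ratios = [commitment_ratio(*s) for s in samples]
    if all(r > high for r in ratios):
        return "add capacity"
    if all(r < low for r in ratios):
        return "unused capacity can be moved"
    return "no action"


# Hypothetical (subscribed TB, used TB, total TB) samples for two pools.
oversubscribed = [(150.0, 60.0, 100.0), (160.0, 65.0, 100.0)]
underused = [(50.0, 20.0, 100.0), (52.0, 22.0, 100.0)]

print(assess(oversubscribed))
print(assess(underused))
```

Requiring every sample to be above or below the cut-off reflects the word
"consistently" in the text: a single high or low reading does not trigger a
recommendation.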

Hitachi Data Systems
Corporate Headquarters
2845 Lafayette Street
Santa Clara, California
95050-2639
U.S.A.
www.hds.com

Regional Contact Information
Americas
+1 408 970 1000
info@hds.com

Europe, Middle East, and Africa
+44 (0) 1753 618000
info.emea@hds.com

Asia Pacific
+852 3189 7900
hds.marketing.apac@hds.com

MK-96HIAA004-00
May 2016
