
Failure Points in Storage Configurations: Common Problems and a Solution

Charles Macdonald EMC Proven Professional Knowledge Sharing 2010

Charles Macdonald Senior Technology Specialist TELUS charles.macdonald@telus.com

Table of Contents
Introduction
Components of the Storage Environment
    Roles and Responsibilities
    Table 1 – Typical Roles in a Storage Environment
I/O System Components
    Physical Components of the I/O System: Hosts, Connectivity, and Storage
        Figure 1 – Hosts, Connectivity, and Storage
        Hosts: Physical Components
        Figure 2 – Host Physical Components in the I/O Path
        Connectivity: Physical Components
        Figure 3 – FC Network Physical Components in the I/O Path
        Storage: Physical Components
        Figure 4 – Storage Physical Components in the I/O Path
        Figure 5 – Physical Components in the I/O Path and Roles
    Logical Components of the I/O System
        Hosts: Logical Components
        Figure 6 - Host Logical Components in the I/O Path
        Connectivity: Logical Components
        Storage: Logical Components
        Figure 7 – Storage Logical Components in the I/O Path
        Figure 8 – Logical Components in the I/O Path
    Configurable Items in the Storage Environment
        Figure 9 – I/O Path for Solaris 10 with Leadville and DMX
Defining the Problem
Solving the Problem
    Figure 11 – Configuration Management Goal
Conclusion

Disclaimer: The views, processes, or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.



Introduction

In large organizations, responsibility for configuring the items between the application and the physical disks that may affect storage availability and performance often crosses several functional groups, e.g., storage operations, systems administration, database administration, application support, and design/architecture. Configuration errors or omissions at any level can contribute to performance degradation or decreased availability. However, the requirements are often not well understood by all of the groups involved. This article offers a generalized overview of the path an I/O operation takes between the application and the storage to illustrate how the interaction of various configurable items provides optimal performance and availability.

This article will provide Storage Administrators with a common language for discussing potential configuration issues with:
• other Systems Administrators
• Database Administrators
• Application Support
• Design/Architecture
• Managers responsible for technical groups

An understanding between management and the technical groups ensures that they all work collaboratively to deploy and maintain systems with appropriate configurations that meet performance and availability requirements.



Components of the Storage Environment
A storage environment includes three categories of components:
1. Physical components of the I/O system
2. Logical components of the I/O system
3. Human actors

Physical components include servers, storage arrays, and connectivity devices. Logical components include databases, server operating systems, and storage system microcode. Human Actors, i.e., the people associated with the storage environment, include business users who create, view, and manipulate data within the storage environment, as well as anyone involved in the architecture, design, deployment, maintenance, and ongoing operation of the storage environment’s physical and logical components.

The purpose of a storage environment is to support business functions and objectives at the least possible cost. Business objectives that drive costs include not just the amount of primary data that must be stored, but also multipliers to the amount of data, such as the number of backups required, backup retention periods, archive retention, and disaster recovery requirements. Associated cost drivers include performance requirements for speed of access to the data, availability requirements that require redundancy, and sufficiency of systems to meet recovery time and recovery point objectives. The three components of a storage environment must be working in concert to fulfill this purpose.

Roles and Responsibilities
The Business User is the most important person within the storage environment: he or she is the reason for the existence of the storage environment. Others in the storage environment perform the tasks required to design, deploy, and operate it. These tasks are generally consistent regardless of the business setting, but the way that tasks are allocated in different organizations may be quite different. For example, the configuration of disk and file systems on a server may be a Systems Administrator function in one organization, but the responsibility of the Storage Administrator in another. Similarly, higher level functions, such as planning the particulars of storage allocations, may be determined by a Design group in one organization, but reside within a Storage Operations group in another. Different structures may co-exist within silos in the same organization for a variety of reasons, such as legacy structures inherited with the acquisition of other corporations, or independent management structures in different functional business groups or branches of business.

Organizational roles and responsibilities should be well defined and well understood by all of the groups and individuals involved, as should methods of engagement and communication between the functional groups. Insufficient collaboration, coordination, and communication are likely to increase the frequency of preventable business function disruptions by introducing failure points in the storage environment.

In this article, the roles of the typical actors in a storage environment are summarized in Table 1. These definitions are for illustration only; they are not intended to define a standard that would be suitable for all organizations. The role descriptions use the term Service Level Agreement (SLA), which implies that some level of service has been well defined. That may not be true in many organizations.

configure. e. the desire “we want 100% uptime and never delete anything” usually does not become a requirement when the business is presented with a price tag to achieve it. including performance and availability requirements. Application Support Analysts deploy. and disaster recovery strategies. Architects set general and specific technology directions at a high level. availability. 2010 EMC Proven Professional Knowledge Sharing 4 . *Note: Business Users might not be very good at defining requirements. Designers produce designs within the constraints set by Architects on technology and standards. recovery point objectives. Architects create standards and blueprints for the technology that will be available within an organization. recovery time objectives. and manage applications. data retention periods. Designers create specific technology solutions for defined business requirements.Table 1 – Typical Roles in a Storage Environment Actor Role Business Users store. retrieve.g. Application Support Analysts maintain applications to meet defined SLAs.. including vendor selection. and manipulate data within the storage environment to carry out business functions. and drive the creation of SLAs for items such as performance. having broad impact on strategic technology decisions. Business User requirements* provide the justification for the cost of deploying and maintaining a storage environment.

and dedicated hardware that supports the application. in accordance with designs provided by Designers. configure. and manage storage arrays and dedicated storage networks. Storage Administrators deploy. and manage databases and database related software. including performance and availability requirements. configure. Systems Administrator responsibilities include configuring server components relating to storage.Database Administrators deploy. Systems Administrators deploy. path management software. such as HBAs. Systems Administrators maintain the server environment to meet SLAs. Backup Administrators maintain the backup environment to meet SLAs. such as tape libraries. in accordance with designs provided by Designers. including performance and availability requirements. including performance and availability requirements. and file systems. and manage backup applications. such as Oracle ASM. Backup Administrators deploy. and manage server hardware and operating systems. configure. in accordance with designs provided by Designers. including performance and availability requirements. 2010 EMC Proven Professional Knowledge Sharing 5 . volume managers. Database Administrators maintain the database environment to meet SLAs. configure. Storage Administrators maintain the storage arrays and networks to meet SLAs.

I/O System Components

The following sections provide a generic description of the I/O system in terms of physical components, logical components, and their relationships.

Physical Components of the I/O System: Hosts, Connectivity, and Storage

The physical components of a storage environment are segregated into host, connectivity, and storage components. Hosts are the computers that run the applications that users interact with to store, retrieve, and manipulate data. Hosts can be anything from a notebook PC to large mid-range systems, and may also refer to virtual hardware, such as a VMware server. Connectivity refers to the physical components that provide the medium for communication between hosts and storage, such as optical cables, twisted pair cables, or copper cables; in the case of networks, it also includes hubs, routers, and switches. Storage refers to any storage medium that is external to the host, such as magnetic tape, optical disk drives, solid state disk drives, and storage arrays. This article is only concerned with intelligent storage arrays that do not merely provide access to disk, but also provide a feature rich environment that mediates the access to disk. Most of the examples are for hosts attached to block storage over a fibre channel network, but the general discussion applies to other storage networks as well, for both block based storage on an FC SAN and network attached storage. Figure 1 illustrates the high level relationship between hosts, connectivity, and storage. The physical components of the storage environment are discussed in more detail in the following sections.

Figure 1 – Hosts, Connectivity, and Storage

Hosts: Physical Components

To illustrate data flow within a computer, the physical components can be generalized to the Central Processing Unit (CPU), Internal Storage, I/O Devices, and Internal Connectivity. The CPU consists of the physical components that perform the instructions contained in the programs. The CPU also contains some amount of high speed storage (registers and cache) to facilitate CPU operations. Internal Storage consists of memory, e.g., RAM and ROM, and larger storage devices, such as disk drives, tape drives, and CD/DVD drives. I/O Devices include devices that handle user to host communications, such as the keyboard, monitor, and mouse, and devices such as Network Interface Cards (NICs) and Host Bus Adapters (HBAs) that enable host to host or host to external storage communications. The CPU, internal storage, and I/O devices communicate within a computer over buses and bridges that form the internal connectivity. Figure 2 is a high level diagram of the physical components within a host, and their I/O connections, for a host with connectivity to block based storage via an FC SAN. Note that the host is depicted with two HBAs to provide physical redundancy for the I/O path to the storage.

Figure 2 – Host Physical Components in the I/O Path

The physical components of the host must be sized appropriately for the expected workload, and may require redundancy to meet availability requirements. You must also consider exception scenarios when determining the necessary level of redundancy. For example, a host with two separate HBAs attached to a pair of redundant SAN fabrics will still have access to disk in the case of a failure that interrupts the I/O path for one HBA (e.g., HBA optic failure, switch port failure), but will also have a 50% reduction in theoretical bandwidth to the storage. If business functions are particularly sensitive to performance degradation, and not just downtime, additional HBAs may be required in the host to provide redundancy for performance, should the potential performance degradation in this routine failure scenario not be acceptable to the business user.

Connectivity: Physical Components

There are a variety of connectivity options to support various storage network protocols, such as Infiniband, Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), iSCSI, NFS, and CIFS. This article will only consider two types of storage networks: IP networks and FC networks.

IP networks provide connectivity between hosts and storage for either block based or file based I/O. Block based I/O over IP networks uses protocols such as iSCSI or FCoE. The file I/O protocols Network File System (NFS) and Common Internet File System (CIFS), which are used to access Network Attached Storage (NAS) devices, are more widely used. IP networks are generally well understood, so are not described in this article. However, troubleshooting problems with NAS is often complicated by having an additional organizational unit involved (the IP Network Administrators) and a more complex topology, particularly where the IP connectivity is not a dedicated storage network.

FC networks, also called fabrics, provide connectivity between hosts and storage for block based I/O. The connectivity devices in FC networks can consist of routers, hubs, and switches. FC switches are the prevalent interconnect device, so the switch is the only device considered here; a generalized description of FC switches is given below.
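The 50% figure above generalizes: with N equal paths, losing one leaves (N-1)/N of the theoretical bandwidth. A minimal sketch of that arithmetic (the path counts and per-path speeds are illustrative, not tied to any specific HBA):

```python
def remaining_bandwidth(paths_total, paths_failed, gbit_per_path):
    """Theoretical bandwidth left after some paths in a redundant set fail."""
    surviving = paths_total - paths_failed
    if surviving <= 0:
        return 0.0  # all paths lost: no access to storage at all
    return surviving * gbit_per_path

# Two 4 Gb/s HBAs, one fails: 8 Gb/s drops to 4 Gb/s (the 50% reduction),
# but the host still has access to disk.
assert remaining_bandwidth(2, 1, 4) == 4
# Four HBAs sized for a performance-sensitive workload: one failure costs only 25%.
assert remaining_bandwidth(4, 1, 4) == 12
```

This is why hosts that cannot tolerate the routine single-path failure scenario are given more than two HBAs: the degradation per failure shrinks as paths are added.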

FC switches consist of three key components: ports, internal connectivity, and control processor (CP) units. Ports contain a transceiver, either a gigabit interface converter (GBIC) or a small form-factor pluggable (SFP), which transmits and receives signals to and from the node (host, storage, or switch) that is attached to the port. Cables used to connect nodes to the FC switch are generally fibre optic cables, but copper may be used in some circumstances. Port to port communication in the switch takes place via a bus or backplane, which also provides the communication path for the controller units. FC switches generally include a number of hot swappable components, such as GBICs, fans, and power supplies. Director class switches may also have hot swappable controller cards. FC networks are typically deployed in pairs of redundant fabrics, so that a failure in one fabric degrades connectivity rather than interrupting it. Figure 3 illustrates the components of an FC network.

Figure 3 – FC Network Physical Components in the I/O Path

Storage: Physical Components

The physical makeup of a storage array is described by grouping components into four broad categories: front end, back end, cache, and physical disks. The front end consists of the storage front adapters and their controllers. Front adapters provide the interface between the hosts and the storage array, either through direct connections or a storage network.

Cache consists of a number of cards of volatile memory that mediates the transfer of data between hosts and physical disks. I/O operations to cache are much faster than I/O operations to disk. Cache increases the speed of write operations from a host perspective, as the storage array acknowledges the write I/O as committed when it is written to cache, then writes it to disk or destages it later. Writes are duplicated on two separate cache boards; cache mirroring protects uncommitted writes from cache board failures. During a power failure, battery power dumps cache to vault disks. Read ahead algorithms attempt to prefetch data into cache before hosts request it, speeding up read I/O operations by allowing some portion of the read I/O, from the host perspective, to be performed at cache rather than disk speeds.

The back end consists of the storage back adapters and their controllers. The back adapters connect to physical disks using SCSI or FC. Back adapters are configured to provide multiple paths to the disks, so the back adapters are not a single point of failure in the storage array. The back end controllers also contain some small amount of memory to help facilitate and optimize data transfer between the cache and the physical disks.

The physical disks in the storage array are persistent data stores, i.e., data that is written to disk is not dependent on power to maintain its state. One storage array may contain different types of disks, such as FC, Serial ATA (SATA), or Solid State Drives (SSD). Different disk sizes may also reside within the same storage array. Figure 4 illustrates the physical components of a storage array.
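The effect of cache on read performance can be expressed as a simple weighted average of cache-speed and disk-speed service times. A toy model (the latency figures are invented for illustration, not measured from any array):

```python
def avg_read_response_us(hit_ratio, cache_us=200, disk_us=6000):
    """Average host-observed read response time for a given cache read-hit ratio.

    Hits are serviced at cache speed, misses at disk speed; the latency
    numbers here are purely illustrative.
    """
    return hit_ratio * cache_us + (1 - hit_ratio) * disk_us

# Raising the hit ratio from 50% to 90% (e.g. via effective prefetch)
# cuts the average read latency by well over half in this model.
assert avg_read_response_us(0.9) < avg_read_response_us(0.5) / 2
```

The same weighting explains why read ahead algorithms matter: every read converted from a miss to a hit moves its service time from the disk term to the much smaller cache term.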

Figure 4 – Storage Physical Components in the I/O Path

Figure 5 illustrates an end-to-end look at the physical components in the I/O path, and identifies the actors that are responsible for tasks associated with the physical components. The roles and responsibilities for architecture, design, and operation of the physical infrastructure are generally easy to define, and obvious failure points in the infrastructure are relatively easy to avoid if you take care to introduce architectural standards and solution designs that incorporate hardware redundancy, such as dual HBAs and FC fabrics. The greater danger lies in the configuration of the storage environment’s logical components.

Figure 5 – Physical Components in the I/O Path and Roles

Logical Components of the I/O System

The logical components of the I/O system control user interactions with the physical components and interactions between the physical components, and include:
• host applications
• databases
• host operating systems
• storage operating systems
• FC fabric operating systems
• logical volume managers
• file systems
• device drivers

The logical components contain a large number of configurable items that multiply to create a vast number of configuration combinations. The availability and performance of the storage environment is highly dependent on correctly configuring the logical components to work together. Logical components may reach beyond the physical divides between hosts, connectivity, and storage, but the physical divides are still a useful model for organizing the discussion of the logical components.

Hosts: Logical Components

Operating Systems (OS) provide the logical container that the rest of the logical components on a host work within. The OS manages the interactions and interfaces between the users, the logical components, and the physical components on the host. Examples of operating systems include Microsoft Windows, Linux, and the various flavours of UNIX, such as Solaris, AIX, and HPUX. Device drivers enable the OS to communicate with specific hardware devices. Device drivers may be embedded with the OS, or may be installed separately, depending on the OS and the specific hardware. Device drivers interact with the programmed instructions, called firmware or microcode, that reside on the hardware devices. Some hardware devices, such as HBAs, support user microcode upgrades. The OS and drivers provide all of the basic services used for I/O on the host.

Multipath Managers provide a layer of virtualization above what the OS sees as physical disks. Devices are presented over multiple physical and logical paths to provide redundant access to storage. Path managers recognize that the multiple paths point to a single storage device, and mediate access to the disk. Multipath managers may also provide performance enhancement functionality by incorporating algorithms to spread I/O over multiple paths and to increase throughput. Multipath managers may be integrated in the OS, like Solaris MPxIO, or may be provided by a third party, e.g., EMC PowerPath™, HDS HDLM, or Veritas DMP.

Logical Volume Managers (LVM) provide a virtualization layer above what the OS sees as physical disks, which in the case of external storage are also a virtual entity. LVMs group a disk or disks into a volume group, and then create logical volumes that are again presented to the OS as physical disks. Logical volumes can be a partition, i.e., a portion of a physical disk, or may span multiple physical disks. Other LVM functionality may include snapshots, software RAID implementations, and the ability to move data around on physical disks without any disruption to data access. LVMs may be integrated with the OS, such as AIX LVM and Windows Logical Disk Manager, or may be a third party product, such as Veritas Volume Manager.

File Systems provide a method to store and organize data in collections of files. The file system maps user data to logical units of storage called file system blocks, and then to the OS disk physical extents. These are then mapped to LVM extents if an LVM is present. From an OS perspective, the OS disk physical extents are mapped to actual physical disks by the storage subsystem.

Applications reside in a logical layer above the file system. Applications directly provide services to users. They contain the functionality and logic to allow users to perform groups of related tasks. Examples of application classes include word processors, web browsers, and video games. Database applications are specialized for organizing logically related data to facilitate data analysis, storage, and retrieval. Oracle and Microsoft SQL are examples of database applications. Applications and databases both rely on the file system to manage data storage and retrieval. Oracle Automatic Storage Management (ASM) provides a file system and LVM-like functionality specifically for Oracle database applications. ASM uses raw disk that does not pass through any other LVM or file system present on the host.
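The chain of mappings described above (file system block, to LVM extent, to OS disk extent) can be made concrete with a toy striped logical volume. This is a sketch only; the extent size, disk names, and round-robin placement are invented for illustration, and a real LVM records the mapping in on-disk metadata:

```python
def map_logical_extent(extent_index, disks, extent_size_mb=64):
    """Map a logical volume extent to a (disk, offset) pair for a simple
    striped layout that round-robins extents across the volume group."""
    disk = disks[extent_index % len(disks)]          # which member disk
    offset_mb = (extent_index // len(disks)) * extent_size_mb  # where on it
    return disk, offset_mb

vg = ["disk0", "disk1", "disk2"]  # hypothetical volume group members
assert map_logical_extent(0, vg) == ("disk0", 0)
assert map_logical_extent(4, vg) == ("disk1", 64)  # second stripe row
```

The point of the sketch is that a single "disk" the file system writes to may fan out across several physical devices, which is exactly why a misconfiguration at the LVM layer can silently defeat redundancy or performance assumptions made at other layers.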

Since the logical components in the I/O path are layered, the logical path of an I/O passes through several layers:
• I/O begins in the application/database layer
• is passed to the file system
• then the LVM
• then to the SCSI target drivers
• followed by the multipath driver
• then the SCSI command is encapsulated by the driver or drivers that handle the fibre channel protocol (FCP)
• then passes on to the HBA driver
• and out to the SAN
This layering is depicted in Figure 6 below.
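The steps above can be sketched as a chain of wrappers, each layer encapsulating the I/O before handing it down. The layer names follow the list above; the string-wrapping itself is purely schematic:

```python
# Host-side layers, outermost-last, mirroring the list above.
LAYERS = [
    "application/database",
    "file system",
    "LVM",
    "SCSI target driver",
    "multipath driver",
    "FCP driver",
    "HBA driver",
]

def trace_io(payload):
    """Return the nested form of an I/O after passing down every host layer."""
    for layer in LAYERS:
        payload = f"{layer}({payload})"  # each layer wraps the one above it
    return payload

# The innermost name is where the I/O originated; the outermost ("HBA driver")
# is the last host layer before the SAN.
print(trace_io("write"))
```

Reading the nesting from the inside out reproduces the ordering of the bullet list: the application originates the I/O, and the HBA driver is the last host-side layer before the frame leaves for the fabric.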

Figure 6 - Host Logical Components in the I/O Path

Connectivity: Logical Components

Logical components in the connectivity environment direct traffic between the source and destination devices on the network. FC switches run their own operating systems, which are also referred to as microcode. The microcode provides all of the FC services and management components, such as command line interfaces (CLI), web interfaces, application interfaces (API), and alerting via Simple Mail Transfer Protocol (SMTP). For FC fabrics, the principal logical component of interest from a configuration perspective is FC zoning, which restricts communications between the nodes that are logged into the FC fabric. A zone set is a collection of zones that identifies which nodes are visible to each other on the fabric. Zoning decreases interference between nodes that do not need to communicate with each other, provides security by restricting communications between nodes, and simplifies management. This article does not provide a detailed discussion of the I/O path within the FC fabric, as the logical configuration of FC switches is generally not the source of many systemic problems in the storage environment.

Storage: Logical Components

The storage array operating system, often referred to as microcode, may contain a wide range of features including support for multiple RAID levels, logical device snapshots, logical device cloning, remote replication capabilities, and external storage virtualization. The storage operating systems also manage performance optimization algorithms. These include command queuing, which optimizes I/O by re-ordering commands to reduce seek time and rotational latency on the physical disks, and read ahead algorithms that recognize sequential read I/O, then pre-fetch data from disk to cache in anticipation of a host request to increase read hits. Read hits decrease I/O response time by serving reads at cache rather than disk speeds.
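Zoning as described above amounts to a visibility rule: two nodes can communicate only if the active zone set places them together in at least one zone. A minimal sketch (the zone names and WWN-like port names are invented for illustration):

```python
def can_communicate(node_a, node_b, zone_set):
    """True if the active zone set puts both nodes in at least one common zone."""
    return any(node_a in zone and node_b in zone for zone in zone_set.values())

# Hypothetical single-initiator zones within one active zone set:
# each host HBA is zoned only to the array front-end port it needs.
zone_set = {
    "host1_array1": {"host1_hba0", "array1_fa3"},
    "host2_array1": {"host2_hba0", "array1_fa3"},
}

assert can_communicate("host1_hba0", "array1_fa3", zone_set)
# The two hosts share no zone, so they cannot interfere with each other.
assert not can_communicate("host1_hba0", "host2_hba0", zone_set)
```

This is the sense in which zoning both reduces interference and provides security: visibility is denied by default and granted only pairwise through zone membership.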

Storage arrays partition physical disks into logical devices (ldevs) that are then presented to hosts as SCSI target devices, commonly referred to as LUNs. LUN security, also known as LUN masking, allows multiple hosts to access LUNs through shared front end ports on the storage array without seeing LUNs that belong to other hosts. The storage array must adapt its SCSI emulation to accommodate variations in host operating system requirements. Active/passive arrays also need to accommodate different behaviours for the various host multipath manager solutions.

The precise path an I/O takes through the storage array varies somewhat depending on whether the I/O is a write or a read, and on its interaction with the cache. As discussed above, storage arrays increase write performance by acknowledging write I/O to the host once data has been written to mirrored cache, and then committing the data to disk later. This is called a write hit, and describes the majority of write I/O. If a write I/O is forced to wait for a cache slot to become free, because the cache is globally stressed or a cache limit has been reached for the logical device, a delayed fast write has occurred. This slows down the write response time for the host. The term write miss may also be applied to delayed fast writes, or to scenarios where data is being written directly to disk for reasons other than write aside, such as cache failure. Some storage arrays, such as the EMC CLARiiON®, allow the storage administrator to set write aside limits that direct large I/Os directly to disk. This slows the write response time for those I/Os, but prevents large I/Os from consuming too much cache.

Read hits occur when a read I/O is serviced from data already resident in cache. Read hits service I/O at cache speeds, so have a significantly lower response time to the host than read misses, which service I/O at disk speeds. A read miss occurs when a read request has to wait for data to be read from the physical disks before it can be returned to the host. As mentioned earlier, read-ahead algorithms increase the number of read hits by prefetching data to cache in anticipation of host requests.
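The write-path outcomes above (write hit, delayed fast write, write aside) form a small decision tree. A sketch of that logic, with the caveat that the threshold and slot-count values are invented, not CLARiiON defaults, and real arrays expose these as tunables:

```python
def classify_write(io_size_kb, cache_free_slots, write_aside_limit_kb=1024):
    """Classify a write I/O the way the text describes the array handling it.

    Thresholds here are illustrative only.
    """
    if io_size_kb > write_aside_limit_kb:
        return "write aside"         # large I/O sent directly to disk
    if cache_free_slots == 0:
        return "delayed fast write"  # must wait for a cache slot to free
    return "write hit"               # acknowledged once in mirrored cache

assert classify_write(64, cache_free_slots=100) == "write hit"
assert classify_write(64, cache_free_slots=0) == "delayed fast write"
assert classify_write(4096, cache_free_slots=100) == "write aside"
```

The ordering of the checks matters: a large I/O is diverted to disk regardless of cache state, which is precisely how the write aside limit stops bulk transfers from flushing everyone else's working set out of cache.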

and sends it to the host via the front end port. which then commit the data to disk.The sequence of events for a write hit on the storage array begins with FC frames being received on the front end ports. encapsulates it in an FC frame. then service the I/O to/from the physical devices. Note that not all writes are necessarily committed to disk. as illustrated in Figure 7. 2010 EMC Proven Professional Knowledge Sharing 23 . then the original write is never committed to disk. the I/O passes through layers that deal with FCP. then SCSI. The I/O is written to mirrored cache. after which the SCSI payload is extracted by the front end controllers. From a logical perspective. if a host overwrites the data while it still resides in cache (a write cache rehit). Later. the data is destaged from cache to the back end controllers. then map logical devices to physical devices. and then the front end controller creates a SCSI write acknowledgement.

Figure 7 – Storage Logical Components in the I/O Path

Figure 8 – Logical Components in the I/O Path

Configurable Items in the Storage Environment

The logical layers in the storage environment have configurable components that affect performance and availability. Incorrect configurations at any level can defeat physical and logical designs for high availability. Correct configurations require end-to-end compatibility for the configuration of the logical components. However, that end-to-end view can be difficult to accomplish, as the various actors often do not understand the end-to-end configuration requirements. Most default configurations will work when deployed, concealing issues that will only become apparent when an exception scenario occurs within the storage environment.

Storage environments should be able to withstand many types of physical failures and brief connectivity interruptions without causing significant disruption to applications and databases. Common failures and events that should not cause disruption to service include:
• Single disk failure within a RAID group
• HBA failure in a multipath environment
• Switch or switch port failure in a multipath environment
• Storage port failure in a multipath environment
• Redundant power supply failure on a storage array
• LUN trespassing in active/passive storage arrays

Some of the above may cause performance degradation in busy systems. For example, the loss of one of two paths as the result of a switch port failure decreases theoretical throughput by 50%, and the loss of a disk within a RAID group causes an increase in disk activity when the RAID group rebuilds to a new disk or a hot spare.

Common maintenance and deployment activities that should not cause disruption to service include:
• Activating zone sets on the SAN fabric
• Creating LUNs on storage arrays
• Allocating LUNs to hosts
• Microcode upgrades on switches
• Microcode upgrades on storage arrays

Some of these activities may cause some performance degradation in busy systems. Additional care needs to be taken with active/passive storage arrays such as the EMC CLARiiON if the activity requires a controller failover. For example, a CLARiiON microcode upgrade requires the service processors (SPs) to reboot, resulting in many multipath events on connected hosts as LUNs trespass back and forth between the two SPs.

From an availability perspective, storage administrators' tasks are fairly straightforward, with only minor variations in configuration to suit the OS of the attached host. For example, the following Storage Administrator activities are OS agnostic:
• LUN masking (unless LUN masking also includes SCSI emulation)
• LUN creation
• MetaLUN configuration (e.g., striped vs. concatenated)
• Fabric zoning

Storage Administrator activities that are not OS agnostic are due to OS variations in SCSI implementation and failover requirements. This affects activities like:
• SCSI LUN numbering (e.g., Volume Set Addressing for HPUX)
• SCSI emulation and failover configurations (e.g., port flag settings on DMX, HBA registration on CX)

Storage Administrators face greater challenges with configuration for performance considerations. These affect decisions about items such as:
• RAID type (e.g., RAID 5 vs. RAID 10)
• Disk tier (e.g., SSD vs. FC vs. SATA, 15000 RPM vs. 10000 RPM)
• MetaLUN configuration (striped vs. concatenated)
• Fan out ratios
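Of the performance items just listed, fan out is the easiest to quantify: it is simply the number of host initiator (HBA) ports sharing each storage front end port. A small sketch follows; the port names and the 12:1 threshold are invented examples, since acceptable fan out depends on the array model and the workload.

```python
# Compute fan-out (host HBA ports per storage front end port) from a
# zoning map. Names and the 12:1 threshold are hypothetical examples.

def fan_out_ratios(zoning):
    """zoning maps a front end port to the HBA WWPNs zoned to it."""
    return {port: len(hbas) for port, hbas in zoning.items()}

zoning = {
    "FA-7A:0": [f"hba{i}" for i in range(8)],    # 8 initiators
    "FA-8A:0": [f"hba{i}" for i in range(16)],   # 16 initiators
}

for port, ratio in sorted(fan_out_ratios(zoning).items()):
    flag = "OK" if ratio <= 12 else "review"     # hypothetical limit
    print(f"{port}: {ratio}:1 ({flag})")
```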

Most storage availability issues related to configuration reside in the logical components at the host level, due to a number of factors:
• the large number of configurable items
• the large number of product choices available at the various logical levels (e.g., ZFS vs. VxFS, native multipath vs. EMC PowerPath® vs. VxDMP)
• the responsibility for configurable items affecting availability and performance resides in different functional groups (e.g., Systems Administrators, Database Administrators, and Application Support Analysts)

Each logical layer of the I/O path has some mechanism for dealing with imperfect I/O operations: there are multiple mechanisms for retrying failed I/O operations, and timeout settings for reporting I/O operation failures up to the next logical layer. Incompatible configurations between the layers can lead to service disruptions if any layer is not able to appropriately handle I/O operation exceptions.

Storage vendors provide guidance on host configurations that have been tested for availability, for both logical and physical components, and that are suitable for general performance requirements. This guidance commonly comes in the form of an interoperability matrix that contains information on a wide range of host-connectivity-storage combinations, including required OS, microcode, and driver patches and upgrades. These documents are updated regularly to cover new products. Detailed connectivity guides for a particular OS and storage vendor combination are usually available from the storage vendor. These detailed guides provide an in-depth look at configuration options, and may include guidelines for parameter tuning beyond the guidelines published in the interoperability matrix.

EMC also provides the web-based Host Environment Analysis Tool (HEAT) to assist Systems Administrators in validating the configuration of hosts that are attached to EMC Symmetrix® or CLARiiON arrays via FC. The Systems Administrator runs a data collection tool to capture the configuration of a host (EMCGrab for UNIX, EMCReports for Windows), then uploads the compressed output file to the HEAT website. HEAT compares the host configuration against the current interoperability matrix, and provides a report highlighting any discovered deficiencies.
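The kind of comparison such a tool performs can be imagined as matching collected host facts against matrix entries. The sketch below only illustrates the idea; the matrix structure, field names, and version strings are entirely hypothetical, and HEAT's real checks come from EMC's published interoperability matrix, not from anything resembling this code.

```python
# Sketch of a support-matrix comparison: collected host facts vs. the
# qualified configuration. All entries and values here are hypothetical.

SUPPORT_MATRIX = {
    ("solaris", "10"): {
        "hba_driver": "emlxs 2.31",   # hypothetical qualified versions
        "multipath": "mpxio",
        "hba_firmware": "2.82a4",
    },
}

def audit_host(host_facts):
    key = (host_facts["os"], host_facts["os_version"])
    qualified = SUPPORT_MATRIX.get(key)
    if qualified is None:
        return [f"no matrix entry for {key}"]
    findings = []
    for item, expected in qualified.items():
        actual = host_facts.get(item)
        if actual != expected:
            findings.append(f"{item}: found {actual!r}, matrix says {expected!r}")
    return findings

host = {"os": "solaris", "os_version": "10",
        "hba_driver": "emlxs 2.21",   # down-rev driver, will be flagged
        "multipath": "mpxio", "hba_firmware": "2.82a4"}
for finding in audit_host(host):
    print(finding)
```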
Several examples of failures due to configuration or compatibility issues are discussed below.. Systems Administrators. EMC PowerPath® vs. EMCReports for Windows). and are suitable general performance requirements.g. HEAT then compares the host configuration against the current interoperability matrix. EMC also provides the web-based Host Environment Analysis Tool (HEAT) to assist Systems Administrators to validate the configuration of hosts that are attached to EMC Symmetrix® or CLARiiON arrays via FC. microcode.

There must be some mechanism at the application level to deal with I/O interruptions that are considered to be normal operations in the storage environment, such as LUN trespasses, switch microcode upgrades, or storage array microcode upgrades. Whether or not there is a configurable item at the application layer depends on the application, so in some cases problems are introduced at the design stage, when applications are introduced that are not able to tolerate normal operations in the storage environment. For example, WebLogic 8.1 JMS queues have internal timers that may cause JMS to shut down if I/O hangs during the 'non-disruptive' failover of a NAS head during a microcode upgrade, even though NFS on the host recovers gracefully.

With Oracle ASM, Database Administrators have taken on some additional configuration responsibilities, as they now have more control over the configuration of logical devices on the host. DBAs can make many configuration changes that affect performance, but they can also impact availability. For example, adding new devices into an ASM disk group triggers a rebalancing operation that can have a significant performance impact. DBAs also choose how to allocate raw devices into ASM disk groups, so poor communication between the DBA, Systems Administrator, and Storage Administrator could lead to devices with dissimilar performance characteristics (e.g., FC vs. SATAII) being added to the same ASM disk group.

Host SCSI queue depth settings control the maximum number of outstanding SCSI commands that can be queued against a device. They are often thought to be related only to performance; for example, inappropriately high queue depth settings can lead to unusually high I/O response times on a busy storage array. However, queue depth can also be implicated in availability.

Path management software may require SCSI target driver timeout settings that are different than the OS default settings. For example, failing to set ssd:ssd_io_time on a Solaris 10 host running the Leadville stack can cause the host to panic or offline all disks during regular SAN events such as LUN trespasses. Such events can also trigger write timeouts within Oracle ASM, which causes ASM to take disk groups offline.
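A local audit of the /etc/system entries discussed above can be scripted. The tunable names (ssd:ssd_io_time, ssd:ssd_max_throttle) are the ones named in this article; the expected values below are placeholders only, and the correct values must come from the storage vendor's support matrix for the specific host and array combination.

```python
# Audit-style check of Solaris /etc/system tunables. The expected values
# are placeholders, not recommendations; take real values from the
# vendor's support matrix for your configuration.

import re

EXPECTED = {
    "ssd:ssd_io_time": "0x3c",       # placeholder value
    "ssd:ssd_max_throttle": "20",    # placeholder value
}

def check_etc_system(text):
    found = dict(re.findall(r"^\s*set\s+(\S+)\s*=\s*(\S+)", text, re.M))
    problems = []
    for name, want in EXPECTED.items():
        have = found.get(name)
        if have is None:
            problems.append(f"{name} not set (OS default in effect)")
        elif have.lower() != want.lower():
            problems.append(f"{name}={have}, expected {want}")
    return problems

sample = """\
set ssd:ssd_max_throttle=20
* ssd:ssd_io_time intentionally left unset
"""
for p in check_etc_system(sample):
    print(p)   # reports the missing ssd_io_time entry
```

A check like this only catches drift from the documented standard; it is the standard itself, taken from the vendor matrix, that carries the availability guarantee.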

FC protocol parameter tuning may also be recommended on some systems, affecting items such as the number of I/O retries at the FC frame level, or the length of time the HBA waits before it takes a port offline due to loss of light. Storage vendor qualified HBA drivers and firmware may contain default settings that are different from the OEM installation, again leading to potential availability issues for common SAN events.

Figure 9 relates the generic logical layers presented in this article to a specific configuration example: a Solaris 10 host using the Leadville stack, attached via dual FC fabrics to an EMC DMX class array. Also included are some comments about configurable items at each logical layer, mapped to the actor in the organization most likely to perform the configuration.

Figure 9 – I/O Path for Solaris 10 with Leadville and DMX

HOST (Solaris 10, Leadville stack), configurable items by layer:
• Application (Oracle 10g): I/O timeout, I/O size, block size
• File system (ZFS): zpools, RAID, striping, cache, quotas, snapshots, clones, block size, compression
• Logical Volume Manager (Oracle Automatic Storage Management (ASM), if present): diskgroups, external redundancy, allocation unit size, striping, stripe size
• SCSI target drivers (ssd): /etc/system entries for I/O timeout (ssd_io_time), queue depth (ssd_max_throttle), and SCSI retry (ssd_ua_retry_count); /kernel/drv/mpt.conf
• Multipath drivers: scsi_vhci (MPxIO)
• FC protocol drivers (fcp, fp): /etc/system entry for automatic LUN discovery (fcp:ssfcp_enable_auto_configuration); /kernel/drv/fp.conf entries for port fail timing (fp_offline_ticker) and FC frame retries (fp_retry_count)
• HBA drivers (emlxs): /kernel/drv/emlxs.conf
• HBA

CONNECTIVITY (dual FC fabrics):
• FC zoning: single initiator zoning
• Switch ports: port speed

STORAGE (DMX):
• FC protocol (Fibre and SCSI port flag settings): Common Serial Number, Disable Queue Reset on Unit Attention; fan out ratios
• SCSI emulation: SCSI LUN numbers, LUN mapping
• LUN security: LUN masking
• Logical devices (MetaDevices): MetaDevice striping/concatenation
• Cache
• Disks: disk tier (disk type, speed, size), RAID level (e.g., RAID 5 (7+1)), share everything vs. share nothing, hot spares, disk controllers

Defining the Problem

On occasion, storage goes awry. It is easy to imagine nightmarish storage scenarios; all you have to do is flip through the "Fixed Problems" section of the microcode patches you haven't applied yet to get some ideas. As a general statement, it is safe to state:

Most storage related failures are directly attributable to improper configurations, either as the result of initial deployment errors, or failure to maintain the environment by pro-actively applying patches and upgrades.

Often, storage related problems occur because people associated with the storage environment deploy flawed configurations that appear to be healthy, but may not withstand storage environment exception scenarios, such as microcode upgrades and redundant hardware failures. In other cases, incorrect configurations may manifest as problems on a future reboot or disk discovery. As a corollary to the above, it follows that:

Most storage related failures are preventable.

Solving the Problem

Configuration of a storage environment is an iterative process. Configuration changes result from new deployments into the environment, and from sustainment activities arising from problem, performance, and capacity management processes. Each of these processes needs to be tied to a rigorous configuration management discipline to maintain a storage environment that approaches optimal performance and availability within the constraints of the environment's physical capabilities.

Configuration management begins with identifying the components in the storage environment. Its purpose is to identify and track the characteristics of the physical and logical components, with the amount of detail necessary to support decision making regarding the potential impact configuration changes may have within the storage environment. Often, this information will be recorded in a database referred to as the configuration management database (CMDB). The configuration management discipline should also contain processes that control the
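The impact-tracking that a CMDB provides, answering questions such as which applications depend on a failed switch, can be sketched with a toy dependency model. All component names and relationships below are invented for illustration.

```python
# Toy CMDB dependency walk: find every configuration item that relies,
# directly or indirectly, on a given component. Names are invented.

DEPENDS_ON = {                 # child -> parents it relies on
    "app-billing": ["db-billing"],
    "db-billing":  ["host-ora01"],
    "host-ora01":  ["switch-A", "switch-B"],   # dual fabric host
    "app-web":     ["host-web01"],
    "host-web01":  ["switch-A"],               # single-attached host
}

def impacted_by(component):
    """Walk DEPENDS_ON upward to a fixed point of affected items."""
    impacted = set()
    changed = True
    while changed:
        changed = False
        for child, parents in DEPENDS_ON.items():
            if child not in impacted and (
                    component in parents or impacted & set(parents)):
                impacted.add(child)
                changed = True
    return impacted

print(sorted(impacted_by("switch-A")))
```

Note that a real impact query would distinguish "at risk" from "down": host-ora01 is dual-attached, so a switch-A failure degrades it rather than disconnecting it, but it still belongs on the impact report.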

configuration (i.e., place controls on changes to the configuration), provide the ability to report on the configuration, and require configuration audits. This will be an ongoing process, as it must capture new technologies as they are introduced, as well as fixes and releases for existing components (e.g., new driver versions, OS and microcode patches). It will assist in providing answers to questions such as:
• What business services, applications, databases, servers, etc. may be impacted by a planned configuration change?
• What business services, applications, databases, servers, etc. may be impacted by a problem that has been identified? (e.g., what servers are connected to a failed switch, or what servers require proactive maintenance to apply an OS patch?)
• What configuration changes were introduced in the storage environment that may be linked to the introduction of a current problem?

Configuration management in the storage environment is complicated by the number of organizational teams that may be involved in the design, deployment, and maintenance of the configurable items in the I/O path. Some organizations may choose to create a separate organizational unit for configuration management, while others may choose to create a committee containing representation from several organizational groups. Each of the affected organizational groups should have a highly skilled resource involved in the configuration management process, since the establishment of deployment standards requires specialized knowledge within each logical layer of the storage environment.

A successful configuration management implementation must define, record, and standardize the configuration requirements for all of the logical components in the I/O path. Documentation of the standards should include a description of the functionality of each configuration item, the reason for the particular value it should be set to, and, if appropriate, a reference to the vendor documentation that specifies the setting. In any case, configuration standards must be effectively communicated to the operational groups that apply the configurations, and must be effectively integrated into their deployment and maintenance procedures. This higher level information will increase compliance by providing

operational staff with a context for configuration decisions, and generally increases their level of awareness and comfort with the storage environment.

Audit configurations to ensure that they comply with the defined configuration standards. Audits should be conducted for each deployment, and also periodically, either to measure the degree of compliance with configuration standards and uncover any latent threats, or as part of a continuous improvement process. In large environments, periodic audits can be limited to some representative subset of the environment to allow the overall compliance of the storage environment to be extrapolated. Audits also allow compliance with configuration standards to be measured and rated, which can be useful as an organizational performance metric.

Figure 11 illustrates the desired outcome of implementing a configuration management process for the storage environment.

Figure 11 – Configuration Management Goal
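The sampling approach suggested for large environments can be sketched as follows. The hosts, standards, and the drifted value are fabricated, and a real audit would compare many more items per host than this.

```python
# Sketch of extrapolating overall compliance from a sampled audit.
# Hosts, standards, and the drifted value are fabricated examples.

import random

def audit_sample(hosts, standards, sample_size, seed=0):
    """Audit a random subset of hosts and extrapolate a compliance rate."""
    rng = random.Random(seed)  # fixed seed so the audit is repeatable
    sample = rng.sample(sorted(hosts), min(sample_size, len(hosts)))
    compliant = sum(1 for h in sample if hosts[h] == standards)
    return compliant / len(sample)

standards = {"ssd_io_time": "0x3c", "queue_depth": "20"}
hosts = {f"host{i:02d}": dict(standards) for i in range(50)}
hosts["host07"]["queue_depth"] = "256"   # one drifted configuration

rate = audit_sample(hosts, standards, sample_size=20)
print(f"estimated compliance: {rate:.0%}")
```

As with any sampling, the estimate is only as good as the sample size and its representativeness, which is why full audits are still warranted for each new deployment.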

Conclusion

The purpose of a storage environment is to support business functions and objectives at the least possible cost. To fulfill this purpose, the logical, physical, and human components of a storage environment must be working in concert. However, storage configurations often fail, with causes that are directly attributable to improper configurations or poor maintenance practices. Thus, most storage related failures are preventable.

Most storage configuration issues occur in the logical components at the host level, due to a number of factors, such as:
• The large number of configurable items at the host level
• The large number of product choices available
• The responsibility for configurable items resides in several different functional groups

When an organization first deploys a storage environment, the Storage Administrator and the Systems Administrator roles are likely to reside within the same functional group. As the storage environment grows larger, these roles are often separated into different functional groups. As a result, storage related skills within the Systems Administrator groups are likely to decline over time, as new Systems Administrators have generally had very limited exposure to external storage. Since Systems Administrators are responsible for a large number of configurable items related to storage, this declining skill set presents challenges in maintaining the health of the storage environment. Configuration and maintenance issues at the Systems Administrator level can compound very quickly, such as when configuration errors are embedded in deployment tools, or if server configurations and patch levels delay or prevent microcode upgrades being deployed on storage arrays and switches.

We can reduce storage related failures by implementing a rigorous configuration management discipline. To be successful, the configuration management processes must include representation from all of the functional groups involved in configuring the I/O path, and be integrated into their documentation and procedures. As well as tracking the status of all of the configurations within the storage environment, the configuration

management process must include regular audit procedures to provide a measurable verification of the level of compliance to standards.

Configuration management faces fewer challenges if the number of unique configurations can be reduced by:
• Limiting the number of manual tasks in server deployments by using standard images and tools such as Jumpstart for Solaris, Kickstart for Red Hat Linux, or NIM for AIX
• Decreasing SAN complexity by limiting the number of storage vendors within the same class of storage; having different vendors for monolithic arrays, modular arrays, and NAS is manageable, but having two vendors for modular storage increases complexity unnecessarily

In general, storage vendors are strongly motivated to help keep your storage environment healthy. Storage vendors provide guidance on host configurations that have been tested for availability and are suitable for general performance requirements. EMC also provides tools that allow Systems Administrators to quickly check a host configuration against the current EMC support matrix. Their expertise and experience at other customer sites should be leveraged when establishing a configuration management process and in its ongoing maintenance.

