The 5 Pillars of Tape Management
GazillaByte LLC
August 2012

ABSTRACT: Computer backup tapes have played a critical role in the information processing landscape since the early 1970s. Over this time, tape technology has continued to develop in parallel with disk technology and has maintained its inherent advantage as the media best suited for long-term backup and archiving. The management of tape is a highly specialized sub-discipline of Information Management, and if done inadequately it can have significant negative consequences. This document outlines the 5 critical considerations of tape management.

Tape plays a critical part in the day-to-day operations of most medium to large enterprises around the world. While High Availability (HA) technologies such as redundant disk (RAID) and real-time replication have significantly reduced the chances of a catastrophic data loss, they are not well suited to protecting against data corruption or deletion. As we approach the era of Big Data, tape remains perfectly placed as the optimal first-tier backup mechanism and second-tier archival mechanism.

Although tape plays a critical role within the enterprise, it is often overlooked when it comes to the implementation of management systems. These management systems are essential in ensuring that the inherent challenges associated with managing offline data are continually met. The challenges of managing tapes can be viewed as 5 discrete yet related functions. These functions form the 5 Pillars of Tape Management. While the challenges of tape management may change in complexity and magnitude as the number of tapes increases, the 5 Pillars of Tape Management remain constant. Conceptually, each of the 5 Pillars answers a fundamental question about the overall management of a tape library.

Pillar              Question
Asset Management    What do we have?
Chain of Custody    Who has been in touch with it?
Library Management  Where does it go?
Disaster Recovery   Are we in a position to recover?
Quality Control     How are we doing?

Asset Management

Computer backup tapes were once considered a consumable that could be ordered, used interchangeably and thrown away with little to no special consideration. Today, however, as tapes have increased in storage capacity and reliability, they can no longer be considered a consumable; each tape volume is very much an individual asset of the enterprise. This can be demonstrated by the fact that, while many enterprises lease almost all other critical technologies such as mainframes, software, servers and laptops, they continue to own their tapes. With a modern tape holding terabytes of information, it is essential, irrespective of the number of tapes an enterprise owns, that every single tape be uniquely identified and that these tapes be considered a fixed asset of the organization.

When implementing a tape management framework, it is essential that a comprehensive and robust set of standards be developed and that these standards be related to a corresponding set of Key Performance Indicators (KPIs).

Asset Management Standards

- Every tape must be added to the asset management system at the point that it is ordered.
- Every tape must have a unique volume serial number and, where possible, that volume serial number should be tied to the manufacturer's serial number or CM chip.
- Where multiple tape ownership exists within the asset management system, the ownership of each tape must be recorded.
- Every tape should have a single fixed visual label attached. This label should remain constant for the life of the tape and should be attached to the tape itself and never the tape case.
- All stakeholders in the tape management chain should know each tape by the same single volume serial number.
- Where possible, the asset management system should be connected to all tape management systems within the enterprise.

Asset Management KPIs

- Number of tape volumes not in the asset management system.
- Number of tapes with duplicate primary identifiers.
- Number of tapes with more than one identifier.
- Number of active tapes not electronically synchronized between the asset management system and distributed tape management systems for more than one day.
- Number of active tapes existing in distributed tape management systems but not in the asset management system, or the other way around.
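Several of these KPIs reduce to simple set comparisons between the asset management system and a distributed tape management system. The following is a minimal sketch only, assuming each system can export a plain list of volume serial numbers; the function name, the dictionary keys and the sample serial numbers are hypothetical and not part of any particular product.

```python
from collections import Counter


def asset_kpis(asset_volsers, tms_volsers):
    """Compute three asset management KPIs from two exported volser lists.

    asset_volsers: serial numbers held in the asset management system
    tms_volsers:   serial numbers reported by a distributed tape management system
    """
    # A serial number that appears more than once is a duplicate identifier.
    counts = Counter(asset_volsers)
    duplicates = sorted(v for v, n in counts.items() if n > 1)

    asset_set, tms_set = set(asset_volsers), set(tms_volsers)
    return {
        "duplicate_identifiers": duplicates,
        "missing_from_asset_system": sorted(tms_set - asset_set),
        "missing_from_tms": sorted(asset_set - tms_set),
    }


# Hypothetical sample data for illustration.
report = asset_kpis(
    asset_volsers=["A00001", "A00002", "A00002", "A00003"],
    tms_volsers=["A00002", "A00003", "A00004"],
)
print(report)
```

Reporting the offending serial numbers rather than only their count gives the librarian something to act on; the KPI itself is the length of each list.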

Chain of Custody

Knowing who has handled a tape and when it was handled has always been considered good practice, but in recent years corporate governance, privacy, information handling and critical infrastructure laws have been introduced [1] in many jurisdictions. It is expected that further laws will be passed in the future. In addition to this, over the past decade Common Law precedents have been established that specifically acknowledge the importance of retaining a constant chain of custody within tape management [2]. In developing a standard for chain of custody, it is critical that the standard incorporates all stakeholders in the tape management lifecycle and that no change in custody goes unrecorded.

[1] HIPAA (USA), Sarbanes-Oxley (USA), Data Protection Act (UK)

All enterprises should also consider locking the chain of custody database so that it cannot be manipulated from outside the asset management software. It is highly recommended that the following enterprises require that the chain of custody record cannot be retrospectively modified:

- Enterprises which operate under Freedom of Information laws, such as policing.
- Enterprises with a high security requirement, such as defense and intelligence.
- Enterprises with a high chance of litigation, such as healthcare and pharmaceutical.
- Enterprises with a high degree of regulation, such as transport and financial services.
- Enterprises under strict Corporate Governance laws, such as publicly listed corporations.

Chain of Custody Standards

- Every change in the physical location of a tape must be recorded.
- Any data which influences the location of a tape, such as an expiry or move date, must be recorded, and any change to this data must also be recorded.
- Changes to a chain of custody record may only occur through the asset management software; where third-party modifications are required, they may only occur through an Application Programming Interface (API) call.
- For a change in the chain of custody to occur, a user must be authenticated and securely logged on.
- The User-ID, date, time, location and interface of the updating user must be recorded for each update that changes the location, or may influence the future location, of a tape.
- Chain of custody events must exist for the acquisition, usage, movement, decommissioning and destruction of each tape volume.
- Regular automated audits must be run, comparing known information and looking for discrepancies within the chain of custody.
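One way to make a custody record resistant to retrospective modification is to store it as an append-only log in which every entry carries a hash of the entry before it. Hash chaining is an assumption of this sketch, not a technique named in the standards above, and the class, method and field names are hypothetical. The sketch records the user, time, location and interface for each event and includes an automated audit that flags entries altered after the fact.

```python
import hashlib
import json
from datetime import datetime, timezone


class CustodyLog:
    """Append-only custody log; each entry stores the hash of the previous one."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def record(self, volser, event, user_id, location, interface, when=None):
        when = when or datetime.now(timezone.utc)
        entry = {
            "volser": volser,
            "event": event,          # e.g. acquisition, movement, destruction
            "user_id": user_id,
            "location": location,
            "interface": interface,  # e.g. web, scanner, API
            "timestamp": when.isoformat(),
            "prev_hash": self._entries[-1]["hash"] if self._entries else "",
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return entry

    def audit(self):
        """Return the indexes of entries whose hash or back-link no longer matches."""
        failures, prev = [], ""
        for i, entry in enumerate(self._entries):
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                failures.append(i)
            prev = entry["hash"]
        return failures


log = CustodyLog()
log.record("A00001", "acquisition", "jsmith", "Data center", "web")
log.record("A00001", "movement", "jsmith", "Offsite vault", "scanner")
clean = log.audit()

log._entries[0]["location"] = "Unknown"  # simulate an edit made outside the software
tampered = log.audit()
print(clean, tampered)
```

In a real deployment the log would live in a database that only the asset management software can write to; the chain only makes tampering detectable, it does not prevent it.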

Chain of Custody KPIs

- Number of chain of custody events being captured per stakeholder.
- Number of atypical chain of custody events recognized and recorded.
- Number of individual audit failures.
- Number of chain of custody changes that occur outside the asset management system (when allowed).

Library Management

As tape libraries grow, it becomes increasingly costly to store all tape volumes in robotically controlled devices. It may also be undesirable to store tapes which are critical or subject to litigation within an environment where they could potentially be overwritten. Having a Library Management standard in place will ensure that tapes can be easily and reliably located should they be required. Conceptually, two Library Management methodologies exist today:

1. Arbitrary Library Management: this is where the tape librarian puts a tape in a specific location and records the location within the tape management database.
2. Allocation Library Management: this is where the system assigns a location for a tape to be stored and the tape librarian confirms with the system that the tape has been placed in that location.

Each of these two methodologies has advantages and disadvantages. The former requires a less complex tape management system but depends more heavily on process; the latter requires a more capable tape management system that provides instructions and confirms compliance.
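The allocation methodology can be illustrated with a short sketch: the system hands out the next free slot, and the move is only treated as complete once a scan confirms the tape is in the slot it was assigned. This is an illustration under stated assumptions rather than a description of any specific product; the class name, the slot labels and the first-free allocation policy are all hypothetical.

```python
class SlotAllocator:
    """Allocation-style slotting: the system picks the slot, a scan confirms it."""

    def __init__(self, slots):
        self.free = list(slots)  # slots not yet assigned
        self.pending = {}        # volser -> slot assigned, awaiting confirmation
        self.stored = {}         # volser -> slot confirmed by a scan

    def assign(self, volser):
        if not self.free:
            raise RuntimeError("no free slots available")
        slot = self.free.pop(0)
        self.pending[volser] = slot
        return slot

    def confirm(self, volser, scanned_slot):
        """Accept the placement only if the scanned slot matches the assignment."""
        expected = self.pending.get(volser)
        if expected is None or expected != scanned_slot:
            return False
        self.stored[volser] = self.pending.pop(volser)
        return True


alloc = SlotAllocator(["R1-S1", "R1-S2"])
slot = alloc.assign("A00001")
wrong = alloc.confirm("A00001", "R1-S2")  # librarian scanned the wrong slot
right = alloc.confirm("A00001", slot)     # scan matches the assigned slot
print(slot, wrong, right)
```

Keeping assigned and confirmed locations in separate tables lets the system report tapes that were given a slot but never confirmed as stored.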

Library Management Standards

- The system records the current location of a tape and, if the tape is moving, its target location.
- The system records the location of a tape down to an individual tape slot.
- A barcode or RFID reader confirms the storage of each tape.
- The slotting design avoids large numbers that can be forgotten or confused during tape handling.

Library Management KPIs

- Number of tapes found to be in the wrong location.
- Number of tapes which can be accurately picked and pulled per minute.
- Time spent in double handling tapes.
- Regularity of physical audits.
- Time spent in correcting errors found during physical audits.
- Number of tapes considered permanently lost.
- Number of tapes that cannot be located but are not considered permanently lost.

Disaster Recovery

As computer hardware and associated infrastructure have evolved, High Availability subsystems have considerably reduced the probability of a disastrous situation occurring. It remains prudent, however, that all enterprises retain a high degree of Disaster Recovery readiness. A comprehensive Disaster Recovery Plan should at the very minimum address the risk of data corruption or loss caused by:

1. Accidental or malicious actions of staff.
2. Programmatic data corruption.
3. Failure of storage and replication sub-systems.
4. Virus, Denial of Service (DoS) and other system shutdown caused by security compromise.
5. Hardware and storage asset confiscation by law enforcement agencies under court order.

Disaster Recovery Standards

- Identify all significant perceivable risks to the continuation of information systems.
- Develop a disaster recovery plan that takes into account the worst case scenario while avoiding optimistic projections of favorable outcomes.
- Develop a disaster recovery plan which includes an option to restore from scratch (a Bare Metal Restore option).
- Pre-identify all tapes that would be required to restore each and every system, including key catalog backups.
- Set a realistic recovery time objective and recovery point objective.
- Test the plan regularly using staff who are unfamiliar with the process (that is, excluding staff who are familiar with, or were involved in creating, the Disaster Recovery Plan) to ensure that they can follow the documented recovery plan.
- Perform both planned and snap disaster recovery tests.
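Pre-identifying restore tapes is only useful if the list can be checked against what is actually available. The sketch below, offered as one possible approach, assumes each system has a list of required volume serial numbers and that the set of tapes currently confirmed as available can be exported; the function name, system names and serial numbers are hypothetical.

```python
def restore_gaps(restore_sets, available_volsers):
    """Map each system to the required tapes that are not currently available."""
    available = set(available_volsers)
    gaps = {}
    for system, required in restore_sets.items():
        missing = sorted(set(required) - available)
        if missing:
            gaps[system] = missing
    return gaps


# Hypothetical restore sets, including a catalog backup for the backup server.
restore_sets = {
    "payroll": ["A00001", "A00002"],
    "backup-catalog": ["C00001"],
    "email": ["A00003", "A00004"],
}
gaps = restore_gaps(restore_sets, available_volsers=["A00001", "A00002", "A00003"])
print(gaps)
```

An empty result means every pre-identified tape is accounted for; any other result names the systems that could not be restored in full today.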

Disaster Recovery KPIs

- Recovery Point Time.
- Disaster Recovery Test regularity.
- Diversity of staff able to follow the Disaster Recovery Plan.
- Number of times the most recent recovery point is not available for restore.
- Number of recovery points that would be available should they be required.

Quality Control

The most critical component of any management framework is its ability to constantly measure its own effectiveness, with the goal of continually improving each process and ensuring that there is no degradation in the overall quality of the solution. When managing a tape library, there are many indicators that can be used to calculate the overall library health.

Quality Control Standards

- The Quality Control mechanisms should provide an overall library score based on real-time information.
- Quality Control scores should be recorded over time to produce a historical record of process performance and to visually demonstrate process improvement.
- The Quality Control mechanism should provide a diagnosis along with clear instructions on how to cure all identified problems.

Quality Control KPIs

- Days since last library synchronization.
- Existence of offsite tapes.
- Days since last offsite.
- Existence of pre-identified lists of critical tapes which need to be offsite.
- Days since the last list of pre-identified tapes was produced.
- Number of predefined lists which have one or more tapes not offsite.
- Number of reconciliation errors.
- Number of tapes with low quality scores.
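An overall library score can be derived from numeric KPIs such as those listed above. The following is a minimal sketch of one possible scoring scheme, not a prescribed formula: each KPI is compared with a limit chosen by the enterprise, and the score is the average remaining headroom expressed out of 100. The function name, KPI keys and limit values are assumptions made for illustration.

```python
def library_score(kpis, limits):
    """Return a 0 to 100 score; a KPI at or beyond its limit contributes zero."""
    parts = []
    for name, limit in limits.items():
        value = kpis.get(name, limit)  # a missing KPI is treated as worst case
        parts.append(max(0.0, 1.0 - value / limit))
    return round(100 * sum(parts) / len(parts), 1)


# Hypothetical limits: the value at which each KPI is considered unacceptable.
limits = {
    "days_since_sync": 10,
    "days_since_offsite": 4,
    "reconciliation_errors": 20,
}
kpis = {
    "days_since_sync": 5,
    "days_since_offsite": 1,
    "reconciliation_errors": 0,
}
score = library_score(kpis, limits)
print(score)  # 75.0
```

Storing each computed score with its date gives the historical record called for in the standards, and the per-KPI headroom values point to which problem to cure first.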

