
A Guide to Managing Company Data

Contingency planning: A guide to managing your company's data

Managing company data may not seem like a critical part of your day-to-day operations, but the day you lose it will certainly alter your perspective, and perhaps the very trajectory and success of your business. It doesn't matter if your company is goods- or service-oriented: information is the cornerstone of all that you do, no matter what you do. From your daily and yearly sales figures to vendor invoices and other expenses, important tax information and details on your employees, it's probably all in one place: your computer or your company's server. Have you ever wondered what would happen if it all disappeared one day?

Consider the case of the disgruntled employee at an architectural firm who suspected she was about to be fired. To exact revenge, she settled on sabotage as her weapon of choice and deleted a network of files valued at over $2.5 million. The data was eventually retrieved, albeit through a very costly recovery service.1 Then there's the pet store chain that housed all of its operational data in a stand-alone database hosted on its website. An outside Web developer, attempting to clean up unnecessary code on the site, accidentally deleted all business records with one simple keystroke. Without a backup, the chain's entire inventory, point-of-sale transaction data and human resource information were lost. The company never recovered and filed for bankruptcy that same year.2

There are more anecdotal horror stories where those came from, but enough of the bad news. The point is to learn from these unfortunate occurrences, prepare in advance and prevent any kind of data loss from affecting your business. This Blue Paper identifies the importance of safe data storage and makes the case for a strong data backup strategy. We will start with a summary of the evolution of data.
Next, we will expound on the different kinds of data that exist, as well as a few basic ideas to consider before launching a storage strategy of your own. Then, we'll discuss the likelihood of failure and the proper questions

1 "Angry Employee Deletes All of Company's Data." Fox News. FOX News Network, 24 Jan. 2008. Web. 14 Aug. 2012.
2 Papadimoulis, Alex. "Death by Delete." Redmond Developer News. 1105 Media Inc., 1 Jan. 2009. Web. 14 Aug. 2012.
© 2012 4imprint, Inc. All rights reserved

to ask when planning data recovery. Finally, we'll close with a handful of device definitions covering the most common kinds of data backup software and systems. Let's begin!

The evolution of data

Data is increasing exponentially. In fact, it is estimated that 90 percent of existing digital data has been created within the last two years.3 While Facebook and YouTube are contributing factors, much of this growth can also be attributed to big data. Big data refers to colossal databases used to draw conclusions that are not otherwise obvious. That means not only typical computer activity, like how many times "deli in Manhattan" is searched on Bing or the number of Twitter posts about car insurance, but also daily activities like passing through a toll booth, the duration of a cell phone call and purchases tracked on a store credit card.

These are awe-inspiring feats when you consider the humble beginnings of modern computing. While the invention of the computer is a little difficult to pinpoint, modern processing is really a child of the 1970s, born out of decks of programming punch cards.4 Technology advanced rapidly thereafter and the cost of data storage dropped dramatically, which accounts for the profound growth since. Here's a chronological overview of the major data milestones throughout the years:

1980s: In January 1980, the cost of storing 1GB of data was $193,000.5 However, with the introduction of the floppy disk (1.4MB capacity) in 1981, followed by the CD (700MB capacity) just one year later, data storage costs fell considerably.6

1990s: In September 1990, the cost of storing 1GB of data dropped to $9,000,7 a 95.34 percent decrease within a decade. In the '90s, advances in technology brought about extended-capacity CDs and DVDs (able to store up to 4.7GB) and then flash drives, which were capable of housing between 4MB and 256GB.8
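As a quick sanity check, the percentage drops quoted in this timeline follow directly from the per-gigabyte prices. A two-line Python calculation confirms them:

```python
def percent_decrease(old, new):
    """Percentage drop from an old price to a new one."""
    return (old - new) / old * 100

# Per-GB storage prices quoted in the timeline
print(round(percent_decrease(193_000, 9_000), 2))  # 1980 -> 1990: 95.34
print(round(percent_decrease(9_000, 19.70), 2))    # 1990 -> 2000: 99.78
```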
3 "What Is Big Data? Bringing Big Data to the Enterprise." IBM, n.d. Web. 16 Aug. 2012.
4 Kopplin, John. "An Illustrated History of Computers, Part 2." Computer Science Lab, n.d. Web. 16 Aug. 2012.
5 Komorowski, Matt. "A History of Storage." N.p., n.d. Web. 16 Aug. 2012.
6 "A Brief History of Digital Data." Prod. Viet Huynh. YouTube. Sweat & Pixels Design Studio, 22 Sept. 2011. Web. 16 Aug. 2012.
7 Ibid.
8 Ibid.

2000s: In February 2000, the cost of storing 1GB of data dropped again to $19.70,9 another 99.78 percent decrease from the previous decade. In addition to USB drives storing between 8MB and 256GB, optical formats now included Blu-ray discs with 25GB storage capacity.10 By July 2009, it cost only $0.07 to house 1GB of data.11 That's when big data truly began to flourish, thanks to the relatively low cost of housing expansive databases.

Now that we've had a brief history lesson on the evolutionary cost of data, let's talk about your data and what you need to know to store it safely.

Data management plans

There are six factors that apply to almost every industry and should be adequately planned for: data growth and its corresponding cost, server space, data security, peak times and upgrades. Naturally, each organization will have specific data sets that apply only to its company or industry, but let's take a general look at what goes into a data management plan (DMP).

Types of data and the basics of data storage strategy

Understanding the types of files your server holds is the basis for formulating an effective data management plan. This analysis will also help you plan for growth, as well as store your data more efficiently. The first tactic to employ in developing a data storage strategy is data classification. There is software you can use, like the F5 Data Manager, to paint a concise picture of the contents on your server. Data analysis and mapping give you a more in-depth look at:

• What file formats are being created
• Who is creating them
• How old they are
• How much storage capacity each file consumes

Below are four components worth considering before determining the data storage strategy that's right for you. Each is explained and then followed by a series of questions worth asking and understanding before you make any hard-and-fast decisions. For further clarification, refer to an IT professional.
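Dedicated tools do this kind of mapping at scale, but the core idea of data classification can be sketched in a few lines. The following Python sketch (an illustration only, not any particular product) walks a directory tree and tallies file count, total size and oldest modification time per file extension:

```python
import os
from collections import defaultdict

def classify_directory(root):
    """Walk a directory tree and tally count, total bytes, and oldest
    modification time per file extension."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "oldest_mtime": None})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                stat = os.stat(path)
            except OSError:
                continue  # skip files we cannot read
            entry = summary[ext]
            entry["count"] += 1
            entry["bytes"] += stat.st_size
            if entry["oldest_mtime"] is None or stat.st_mtime < entry["oldest_mtime"]:
                entry["oldest_mtime"] = stat.st_mtime
    return dict(summary)
```

Even a rough summary like this answers the first and last questions above (what formats exist and how much space each consumes); ownership tracking would require filesystem metadata beyond this sketch.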
9 Ibid.
10 Ibid.
11 Ibid.

1. Metadata standards and data provenance
Metadata provides structured information explaining such details as the purpose, origin, geographic location, access conditions, and terms of use of a data collection. To put this into context, files without metadata are like a library without a card catalogue. Here are a few questions worth considering when setting your metadata plan in motion:12

• Which metadata standards will you use? Why have you chosen them?
• How will you record these details?
• What information is needed to make the data you collect meaningful to others?
• Likewise, what information do you need to make that data reusable?

2. Provisions for privacy, confidentiality and licensing
You should first explain how and when the data will become available. If there is an embargo period for sharing the data, make sure you provide details explaining the delay. If the data is sensitive in nature (if, for example, it raises health-related privacy issues or contains competitive analysis insight) and public access is inappropriate, address the means by which you plan to control access. For instance:

• Who will hold the intellectual property rights to the data?
• How long will the original data creator/principal investigator retain the right to use the data before making it available for wider distribution?
• Are there any embargo periods for political or commercial patent reasons? If so, what are the details?
• Describe any permission restrictions that will need to be placed on the data.
• Are there ethical or privacy issues? If so, how will these be resolved?

12 Higgins, Sarah. "What Are Metadata Standards?" Digital Curation Centre, n.d. Web. 22 Aug. 2012.

• If you have approval from the U.S. Department of Health and Human Services (HHS) Institutional Review Board (IRB), or are in the process of applying for it, how will you comply with those obligations?

3. Policies for data access during and after your project
Think about how you prepare and manage your data for sharing, and explain how you will actively share your data with non-group members after the project is complete. You should explain how and where the data will be accessible, as well as identify who will be allowed to use it, how they will be allowed to utilize it and whether or not they will be allowed to disseminate it. Think about some of these questions:

• Will your data be accessible? How will you make it available? Include resources like the equipment and systems needed to do that.
• What is its intended use? Who are its intended users?
• If permission restrictions exist, what is the process for gaining access to the data?
• Explain how you will store data during the project's lifetime.
• How will you archive that data?
• If applicable, how will you transfer or transmit that data?

4. Plans for archiving and preservation
To archive data is to move less important information from an active storage device to a less-used storage device for basic retention purposes. This eases the capacity and enhances the performance of the first, more active device. In terms of data archival, there are many subject-specific data repositories, all of which could serve as an archiving option for your data. But first, ask:

• How long should data be kept beyond the life of the project?
• What data will be preserved in the long term?
• Which database have you identified as a place to deposit the data?

• What is the long-term strategy for maintaining and curating your data?
• What procedures does your intended long-term data storage facility have in place for preservation and backup?
• Are there any conversions necessary to prepare data for preservation or data sharing?

What you save and how you save it are directly linked. So be sure to have a solid understanding of the kinds of files, documents and information formats saved on your computer or server. That way, you'll know just what it will take to properly save and store your data.
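The archiving policy in item 4, moving stale files off active storage, can be sketched as a simple script. This is a minimal illustration that assumes age alone decides what gets archived; a real policy would also weigh file type, regulatory retention periods and business value:

```python
import os
import shutil
import time

def archive_old_files(active_dir, archive_dir, max_age_days):
    """Move files not modified within `max_age_days` from active storage
    to an archive directory, preserving relative paths. Returns the list
    of relative paths that were moved."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for dirpath, _dirnames, filenames in os.walk(active_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) < cutoff:
                rel = os.path.relpath(src, active_dir)
                dst = os.path.join(archive_dir, rel)
                os.makedirs(os.path.dirname(dst) or archive_dir, exist_ok=True)
                shutil.move(src, dst)  # frees capacity on the active device
                moved.append(rel)
    return moved
```

Run periodically (for example, from a scheduled task), this keeps the active device lean while retaining everything in the archive, which is exactly the trade-off the section describes.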

Backup failure: It happens

Someone once remarked that there are only two types of hard drives: the ones that have failed and the ones that will fail.13 This adequately describes backup devices: Even though hard drives are not living organisms, they have a definitive life span, and each one will eventually die. According to a study conducted by Pepperdine University, here are the most prevalent reasons for failure:14

1. Hardware failure - 40%
2. Human error - 29%
3. Software corruption - 13%
4. Theft - 9%
5. Computer viruses - 6%
6. Hardware destruction - 3%

Whether it's hardware failure or human error, failure happens. Unfortunately, lost data cannot be saved by implementing a backup system after it's gone. Plan appropriately, because data backup failure is not uncommon.

13 "Backing up Data - Why You Need to Do It." PC911, 28 Feb. 2011. Web. 17 Sept. 2012.
14 Smith, David M. "The Cost of Lost Data." Graziadio Business Review. Graziadio School of Business and Management, Pepperdine University, 2003. Web. 17 Sept. 2012.


So what happens to your business when your data backup fails? According to the same Pepperdine University study, a company that experiences a computer outage lasting more than 10 days will never fully recover financially.15 Worse still, half of the companies that endure such a dilemma will likely be out of business within five years. Hard to believe? Computer-stored data, though intangible, is worth a great deal. The value of lost data is determined by its primary utility and frequency of use, both of which are specific to the business that lost it. Take a moment to think about the price of your data. To do that, first think of which capabilities you would lose along with it: likely many, maybe even all of them. Could you function without them? Probably not.

Next step: Recovery

Data can be lost in a natural disaster like a flood or fire or it can be physically stolen if someone takes the computer or primary storage device. Data can also be lost in a power failure or power surge. To be smart, implement a few preventative measures in case a backup failure occurs. But first there are four main items to remember in your quest to recover lost data:

1. Restore time objective (RTO) refers to the amount of time your organization needs to recover from a data loss. Many organizations have multiple RTOs. For example, one RTO may specify how long before the major functions of the enterprise are back online, while a second, longer RTO determines how long until everything is fully recovered.16

2. Restore point objective (RPO) is the maximum amount of data, measured in time, that you can afford to lose. Or rather, how recent must your most recent recoverable copy be? Like the RTO, the RPO is often assigned to critical functions such as transaction processing. A longer RPO means recovering to a point further back in time and losing more recent work. It can be anywhere from a few seconds, in the case of a sophisticated (and expensive) remote mirroring system, to several hours, or even several days for less critical data.
15 Ibid.
16 Cook, Rick. "Set Disaster-recovery Objectives." SearchStorage, n.d. Web. 22 Aug. 2012.


3. Network recovery objective (NRO) is the time needed to recover network operations; specifically, how long before you appear recovered to your customers? It includes such jobs as establishing alternate communications links, reconfiguring Internet servers, setting alternate TCP/IP addresses and everything else needed to make the recovery transparent to customers, remote users and others.

4. Restore granularity objective (RGO) refers to the level of objects that can be easily recovered (e.g., a file, email, directory, hard drive, full system image, etc.).
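To make the RPO in item 2 concrete, the sketch below (a hypothetical helper, not a standard tool) checks whether the most recent backup taken before a failure falls inside a given RPO, i.e., whether no more than the acceptable window of data would be lost:

```python
from datetime import timedelta

def meets_rpo(backup_times, failure_time, rpo):
    """Return True if the most recent backup taken at or before
    `failure_time` is within the recovery point objective `rpo`
    (a timedelta), meaning at most `rpo` worth of data is lost."""
    prior = [t for t in backup_times if t <= failure_time]
    if not prior:
        return False  # no usable backup exists at all
    return failure_time - max(prior) <= rpo
```

For example, hourly backups satisfy a one-hour RPO but not a 15-minute one; that gap is what drives the choice between scheduled backups and continuous mirroring.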

However you lose it, the majority of cases, 83 percent, can be recovered. You've been warned, though: Recovery can be an expensive operation.17

Device definitions
Most sources available on data storage fail to recognize that in many organizations, not everyone responsible for IT is necessarily an IT professional. This is especially true for small businesses, where most employees wear multiple hats. So when it comes to data storage, there are a handful of terms and device definitions to be familiar with in case data is lost and needs to be restored. Here are some basic storage hardware configurations to know:

Remote mirroring systems18
One of the most basic tools for data storage and backup is known as a remote mirroring system (see also: cloud storage). As its name implies, it generates a mirror image of the data on one or more disks located locally or remotely. It functions in real time so as to keep the most current critical business data accessible via duplicate disks. The information stored on them can be used as a substitute in case of an emergency, or to facilitate data migration.

Disk array
A disk array is a kind of storage system that links multiple hard drives into one big drive. Disk arrays organize data into something called logical units (LU).19 To the client, these look like blocks. Small arrays with only a few disks can store eight LU, while larger arrays with hundreds of disks can store thousands of LU.20

17 Ibid.
18 Larsen, Brian. "Disk Mirroring - Local or Remote." InfoManagement Direct, 1 Dec. 2003. Web. 18 Sept. 2012.
19 "What Is Disk Array?" Webopedia, n.d. Web. 17 Sept. 2012.
20 Ibid.

The most common kind of disk array is a Redundant Array of Independent Disks (RAID). The advantage of RAID backup lies in its name: redundancy refers to its ability to write and store data in multiple locations in case a file is damaged or stored in a bad cluster. If that happens, the file is instantaneously rewritten on another disk in the array, and spreading data across disks also increases overall storage performance.21 This kind of configuration is particularly useful for organizations with servers laden with multimedia-heavy data.22 In case you're unfamiliar with the term disk array, perhaps you know it as a drive array or storage array, which generally means magnetic or solid-state disks: two or more disk drives built into a stand-alone unit, typically using some RAID configuration. Optical drives (CD, DVD, etc.) also come in multi-drive units, known as optical disc libraries.23

Direct attached storage (DAS)
Direct attached storage involves a direct connection to the server, either through an internal server disk controller or an external storage subsystem.24 DAS systems are recognized for their ease of management, generally low operating costs and overall simplicity. However, one drawback of DAS is that it creates information isolation, meaning the information is inaccessible from other servers. Small businesses may see this as only slightly problematic, whereas for larger businesses, not being able to access data across servers may become a serious problem.

Network attached storage (NAS)
As the name implies, NAS is storage attached to the common network via Ethernet. It is essentially a file server that often integrates an optimized operating system dedicated to file sharing. This means that all processing is done locally at the client's request. Besides its reputation for easy installation, another major benefit of NAS is that it solves the compatibility issue between Microsoft's Windows platform and UNIX, allowing file access without additional software.
To give this acronym more context, Western Digital's WD Sentinel DX4000 is a prime example of a NAS device designed for small businesses. As with most such devices, installation is as simple as plug and play, which initializes the automatic
21 "RAID - Redundant Array of Independent Disks." Webopedia, n.d. Web. 17 Sept. 2012.
22 Kayne, R., and Niki Foster. "What Are Disk Arrays?" WiseGeek. Conjecture, 11 July 2012. Web. 17 Sept. 2012.
23 "Disk Array Definition." PC Magazine Encyclopedia. PC Magazine, n.d. Web. 22 Aug. 2012.
24 Parwar, Ashwin. "Understanding Storage Basics - DAS-NAS-SAN." WizIQ, n.d. Web. 22 Aug. 2012.

system configuration. On the user's end, setting user preferences is the final task. The major drawback of a NAS, however, is its performance. It provides file-level input/output (I/O) via traditional file shares, while DAS and SAN provide block-level I/O. If your eyes are already glazing over, you're not alone. To look at file vs. block access from another perspective: file sharing is like reading a classic novel. You have an in-depth view of the characters, the landscape and the plot, and you can revisit each section and draw deeper conclusions. Conversely, block sharing is similar to the CliffsNotes version: you still get usable information, albeit not as complete. Block data is suitable for images or other large files that are not altered often, while file access is most appropriate for documents that change more regularly.

Storage area network (SAN)
Storage area networks are designed to be accessible by multiple servers, just as local area networks (LAN) connect a server to multiple computers.25 Unlike a DAS or NAS device, each of which is a single piece of hardware, SANs are built from multiple hardware components. These components (hubs, switches, bridges and Small Computer System Interface, or SCSI, devices) are typically connected by Fibre Channel. If an Ethernet cable is like a straw pulling information off the network, a Fibre Channel is like an oil pipeline for information. These hardware components play a role in three areas: redundancy, speed and volume.

Switches and hubs generally do the same thing. Like the post office, both process incoming information, or mail. Switches take that information and quickly deliver it to a specific location, or mailbox. Hubs, however, aren't as discerning. Imagine a small apartment building where the mail is left in the lobby in bulk: each tenant must sort the mail and determine what is addressed to them, creating a time-consuming redundancy in the analysis.
Both have their advantages, but hubs work best in small enterprises, whereas switches suit more data-intense operations. Referring back to what type of data your organization produces will help determine which components will be most beneficial.26 From availability, reliability, scalability and performance to manageability and return on information management, SANs have many advantages.27
25 "SAN (Storage Area Network) Definition." N.p., n.d. Web. 22 Aug. 2012.
26 "SAN Tutorial." Manhattan Skyline GmbH, n.d. Web. 11 Sept. 2012.
27 "All about Storage Area Network." N.p., n.d. Web. 22 Aug. 2012.

As we already stated, NAS operates with file-level access, whereas DAS and SAN are block level, but there are several different types of high-speed interfaces used to determine SAN function. In fact, many SANs today use a combination of different interfaces. Currently, Fibre Channel serves as the de facto standard in most SANs. Fibre Channel is an industry-standard interconnect and high-performance serial I/O protocol that is media independent and supports simultaneous transfer of many different protocols. Additionally, SCSI interfaces are frequently used as sub-interfaces between internal components of SAN members, such as between raw storage disks and a RAID (redundant array of independent disks) controller.

To illustrate a few ways to utilize a SAN and the benefits to be had, let's take, for example, an insurance agency with two locations, each with two SANs: Location A has SAN 1 programmed to back up its internal operations each hour, while SAN 2 runs backups for Location B. Location B mirrors this setup. If the first SAN at Location B fails, a simple DNS reroute will restore operations within moments, rather than risking several days of downtime while IT tries to remedy the situation.

In another simplified example, a big-box retail chain stores its inventory on Server A and its transactions on Server B. With a SAN, a sales agent can call upon both servers to analyze the supply on Server A and the demand on Server B, all in real time and directly from a personal computer.

While all of the aforementioned systems provide backup, various backup software packages work better with a SAN. Imagine this scenario: At a drive-thru, you order a cheeseburger and pull around to the window, where they hand you your order. If your order is correct and timely, do you think twice about the process that occurred inside? Probably not. The same theory applies to basic data storage systems.

Data storage and backup are complex issues, but they are also critically important. As you explore storage options for your company's valuable data, keep these helpful guidelines in mind:

• Reevaluate your backup software annually. Ask yourself if it is still able to meet your needs. Organizations that do not monitor data storage are more likely to let crisis drive them toward an inefficient change.


• Stay on top of your backup infrastructure. Use three simple rules: Match the class of software to the environment; keep your backup software up to date; and continue to enhance the architecture as your performance and capacity needs increase.
• Look closely at different vendors. When evaluating vendor offerings, look at how they employ agentless backup, storage-level snapshots and APIs in the virtual infrastructure (such as VMware) for fast, low-overhead virtual infrastructure backup.
• Leverage capacity-based licensing. To this end, look to cost justification, better data management and storage tiers. Some argue that up to 70 percent of data subject to backup is unchanged and should not be in primary storage, but rather in an archive. Capacity-based licensing exposes the cost of backup by data volume, reducing the volume and thus the cost of backup. Capacity licensing should also incorporate some overhead for expected data growth.

Even if backup doesn't seem like a pressing priority right now, you'll want to prepare sooner rather than later, because backup isn't important until it fails. As they say, "There's no time like the present."

4imprint serves more than 100,000 businesses with innovative promotional items throughout the United States, Canada, United Kingdom and Ireland. Its product offerings include giveaways, business gifts, personalized gifts, embroidered apparel, promotional pens, travel mugs, tote bags, water bottles, Post-it Notes, custom calendars, and many other promotional items. For additional information, log on to