Archiving for Indies

Creating Long-Term Archives for Independent Digital Cinema
Torrey Loomis, Silverado Systems, Inc., June 2010

Silverado

In association with

SILVERADO STUDIOS

Silverado Systems, Inc. Archiving for Indies


Intro
I read a quote a while back from a document called “The Digital Dilemma: Strategic Issues in Archiving and Accessing Digital Motion Picture Materials”:

“...the annual cost of preserving a 4K digital master is $12,514...”

What? I thought that statement was an error. How was it possible that archiving a 4K digital cinema project could cost $12,514 just one time--let alone on an annual basis? Slow down--I figured I had better read the paper first and then opine later. I am glad I did.

Produced by the Science and Technology Council of the Academy of Motion Picture Arts and Sciences, “The Digital Dilemma” is one of the most thoroughly researched and well-written documents on the topic of archiving digital cinema content. It is de rigueur reading for anyone involved in archiving assets in any digital cinema production. Subjects include:

• What is the difference between a digital library and a digital archive?
• How much content should be archived?
• What are the costs involved in a proper archive operation?
• Who are the other players in media archiving, and what can we learn from them?
• Are there differences between archiving film and digital content?
• Are standards developed to provide interoperability between studios?
• Is there a storage medium preferred over others?

These topics and more are covered in this astounding reference document. It's not the purpose of our paper to reiterate the complete findings of “The Digital Dilemma” but to update a few of the assumptions used in the document since it was first published in 2007 (research began in 2005) and to distill the content into some salient workflows that independent digital cinema productions can use today at significantly lower cost than first projected.

Before you continue with this paper, I recommend you register and download a free PDF copy of “The Digital Dilemma” at the AMPAS website: http://bit.ly/cTDsJ1

Finally, the name of this paper was inspired by Mike Curtis' great independent cinema blog, HD for Indies: http://www.hdforindies.com

This article is targeted towards independent digital content creators and producers, but there is no reason why this knowledge cannot also be scaled to larger facilities and productions.


First...a teaser

The photo above depicts the complete contents of the feature film “Rogue River.” http://www.roguerivermovie.com

“Rogue River” is a feature produced by KeJo Productions of Roseville, CA: http://www.kejopro.com

Shot on the RED ONE digital camera platform, the total contents of principal photography fit neatly onto seven 800 GB-capacity LTO-4 tape cartridges. These tapes act as “digital negatives” which are set aside for safekeeping while a duplicate of the film's content is actively being used in editorial. When editorial is finished, additional LTO-4 tapes will be produced that contain other content generated during post-production such as visual effects shots, audio and music files, QuickTime trailers, and other production assets such as PDF files of scripts, notes, and other documents.


Background and Assumptions
Let's go back to the quote that inspired me to research and write this document:

“...the annual cost of preserving a 4K digital master is $12,514...”

Why did this surprise me so much? First--I'm not a film guy. My day job includes designing workflows for all-digital projects, so I am used to SD, HD, and 4K oriented productions. When I saw the words “4K digital master” I initially assumed it referred to the most widely used 4K acquisition format today: the RED ONE. However, it did not. When “The Digital Dilemma” was written, the term 4K generally referred to a frame size (4096 x 2160) achieved by scanning 35mm film rather than to an acquisition format. Remember that “The Digital Dilemma” was first released in 2007, the same year that RED Digital Cinema delivered its first batch of RED ONE cameras to owners. Until that point, 4K acquisition was not possible for the average independent shooter--especially at the ultra-low data rates that RED ONE allowed (about 28 MB/s).

The authors of “The Digital Dilemma” assumed 4K frames of approximately 40 MB each. One second of uncompressed film scanned at 4K could easily soak up 1 GB of storage. The authors projected the cost of archiving an entire 4K feature this way:

“Based on an annual cost of $500 per terabyte of fully managed storage of 3 copies of an 8.3 terabyte 4K digital master.”

Using a calculation like this was perfectly reasonable in 2007, but the picture would change radically over the next few years as RED and other manufacturers began to ramp up competition in 4K acquisition.
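The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above, plus one assumption of mine that the paper does not state: a frame rate of 24 frames per second. The product of the quoted inputs comes to $12,450, close to the quoted $12,514; the small gap is presumably because the 8.3 TB master size is a rounded figure.

```python
# Figures quoted in this paper from "The Digital Dilemma".
COST_PER_TB_PER_YEAR = 500   # USD, fully managed storage
MASTER_SIZE_TB = 8.3         # one 4K digital master (rounded figure)
COPIES = 3

annual_cost = COST_PER_TB_PER_YEAR * MASTER_SIZE_TB * COPIES

# Uncompressed scan: about 40 MB per 4K frame.
FRAME_MB = 40
FPS = 24                     # ASSUMPTION: standard cinema frame rate, not stated in the paper
uncompressed_mb_per_s = FRAME_MB * FPS

# Approximate RED ONE data rate cited above.
RED_ONE_MB_PER_S = 28
ratio = uncompressed_mb_per_s / RED_ONE_MB_PER_S

print(f"Annual cost for 3 copies: ${annual_cost:,.0f}")
print(f"Uncompressed 4K scan:     {uncompressed_mb_per_s} MB/s")
print(f"Scan vs. RED ONE rate:    about {ratio:.0f}x")
```

Under these assumptions, an uncompressed 4K scan consumes roughly 34 times more storage per second than RED ONE footage, which is why the 2007 cost projection does not carry over to compressed 4K acquisition.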
For the purposes of “Archiving for Indies,” the following assumptions apply to independent cinema producers:

• All assets are “born digital”
• Any asset not digital is scanned or converted to a digital asset
• Keep everything--discard nothing
• Videotape and film are not part of production--using fully digital tapeless workflows
• Assume that the 4K acquired will have some level of compression (e.g. REDCode)
• Insurance requirements mandate that productions make multiple “digital negatives” of their tapeless files
• Everything is standardized on LTO (Linear Tape Open) cartridges
• Long-term archive is NOT to hard drive
• You WILL have to migrate your data to new formats in the future and that cost should be factored into your overall TCO (total cost of ownership). Your TCO is affected not just by your media costs, but also by other costs such as labor, utilities, system hardware and software, upgrades, personnel training, and storage costs.


Defining the Archive
What exactly is an archive? And what is a library? The authors of “The Digital Dilemma” define them as follows:

“Working Library” is a broad term for elements that are generally kept on hand for distribution purposes.

“Archival” is defined as storage of the master elements from which all downstream distribution materials can be created over a 100-year timeframe.

If you have successfully moved your tapeless media files onto your computer or RAID-protected SAN, you have a working library. However, a RAID-protected volume is completely inadequate as a bona fide “archive” in the strictest sense of the definition.

At this point, we should highlight some very important things. NO technology that exists today can fulfill the definition of an “archive” that can be trusted to last and be read 100 years from now. Technology is moving so fast that new formats are being developed that will take the place of the standards in use today. Given this, the practice of data migration is not an option but a requirement. You will need to recognize when it's time to migrate your data from one format to another.

To be fair, this process will become easier over time. Using the example of “Rogue River” above, let's assume the production winds up with a total of ten LTO-4 tapes that need migration. The natural migration path for this data is LTO-5. Since LTO-5 accommodates 1.5 TB of data per tape, it's safe to assume that migrating ten 800 GB LTO-4 tapes to the new format will require approximately five or six LTO-5 tapes. When LTO-6 is released in a few years, the storage density is assumed to increase to 3.0 TB per tape. Hence, “Rogue River” on LTO-6 will only need two or three cartridges. IBM and other partners are working on new tape technology that can extend capacities up to 35 TB per tape, so tape is very likely to remain a durable, long-lived format.

One distinction here--we are not talking about videotape. LTO (Linear Tape Open) is a data cartridge that can hold any type of digital file. You can think of LTO tape's capabilities the way you would think about your hard drive's capacity to store files. If it can be stored on a hard drive, it can be archived to LTO.
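The migration estimate for “Rogue River” can be sketched as a simple ceiling division. This is a rough illustration, not a planning tool: it assumes all ten LTO-4 tapes are completely full, so it gives the upper end of the ranges quoted above. The helper name `tapes_needed` is hypothetical, and the LTO-6 capacity is the projected figure from this paper.

```python
import math

def tapes_needed(total_gb: int, tape_capacity_gb: int) -> int:
    """Minimum cartridge count, assuming each tape is filled to native capacity."""
    return math.ceil(total_gb / tape_capacity_gb)

# Ten LTO-4 cartridges at 800 GB each (worst case: every tape is full).
total_gb = 10 * 800

# Native capacities in GB; the LTO-6 value is the projection used in this paper.
capacities = {"LTO-5": 1500, "LTO-6 (projected)": 3000}

for generation, capacity_gb in capacities.items():
    print(f"{generation}: {tapes_needed(total_gb, capacity_gb)} tapes")
```

With full source tapes this yields six LTO-5 or three LTO-6 cartridges; the lower numbers in the text apply when the original tapes are only partly filled.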


Why not optical disc? Simple--the largest-capacity optical discs today are dual-layer Blu-ray discs at approximately 50 GB per disc. It would take 16 Blu-ray discs to hold the same amount of data as a single 800 GB LTO-4 cartridge.

What about hard drives? Magnetic hard drives have serious shelf stability issues that make them unsuitable for long-term storage. The authors of “The Digital Dilemma” explain further:

“It should be noted that magnetic hard drives are designed to be “powered on and spinning,” and cannot just be stored on a shelf for long periods of time. The drives' internal lubrication must be occasionally redistributed across the data recording surface through normal operation of the drive, otherwise they can develop “stiction” problems where internal components mechanically lock up.”

The pre-eminent Final Cut Pro expert Larry Jordan has done a significant amount of research on the stability of hard drives that sit unused on a shelf:

“Magnetic signals recorded on a hard disk are designed to be refreshed periodically. If your hard disks stay on, this happens automatically. However, if you store your projects to a removable hard drive, then store that hard drive on a shelf, unattached to a computer, those magnetic signals will fade over time... essentially, evaporating.”

“...the life-span of a magnetic signal on a hard disk is between a year and a year and a half. The issue is complex, as you'll see, but this is a MUCH shorter shelf-life than I was expecting.”

“The way to keep the files on your hard disks safe is to connect the hard drive to your computer every six months or so and, ideally, copy all the files from one drive to another.”

You can read more from Larry's article here: http://bit.ly/cac4kj

Essentially, if you want the data on your hard drives to remain intact--you're required to do a bit-level data migration or surface scan every six months. Rotating dozens or hundreds of hard drives every six months is hardly a recipe for a stable archive.

What about tapeless media such as CF (CompactFlash) cards and SSDs (solid-state drives)? Memory-based storage technologies such as these lack moving parts, which increases their overall durability, but their high cost and low storage densities currently make them unsuitable for long-term archive.


The RAID question
Why shouldn't you just leave your media on a RAID volume? Lots of people assume that keeping data on a RAID 5 volume is a perfectly legitimate way to keep things around for a long time. Let's back up and explain what RAID is, and why it isn't a viable option for long-term archive.

RAID stands for redundant array of inexpensive (or independent) disks. RAID systems combine multiple hard drives with a specialized controller that syncs them together for speed, redundancy, or both. There are different RAID schemas. Some can be managed on computer systems with software only. Others are hardware-based because the compute resources needed to drive certain RAID arrays are quite high.

Software RAID

RAID 0 is called a stripe. It's a software-based array that takes each drive in the array and “stripes” them together as one large volume. The volume gains speed as each additional drive is added. However, there is no redundancy. If you lose one drive, your data is totally compromised.

RAID 1 is called a mirror. You can take two drives and arrange them in a RAID 1 volume, and you are essentially writing data to both drives at the same time. The data on each is identical, so if you lose one drive you can swap it for a brand new one and the volume is rebuilt from the data on the remaining good drive. However, there is no speed boost with this arrangement.

Hardware RAID

What if you could have the best of both worlds? Combine the security of RAID 1 with the speed of RAID 0? That is the benefit of a hardware-based RAID format. They aren't inexpensive, but hardware RAID controllers do some very heavy computational lifting. The most common hardware-based RAID formats are 3, 5, and 6.

RAID 3 is an array that takes data and spreads it across multiple drives like a RAID 0. The difference is that a single drive is set aside as parity storage. If you lose a drive, then your data can be rebuilt from the remaining drives plus material from the parity drive. RAID 3 can stand the loss of any single drive.

RAID 5 is similar to RAID 3; however, parity data is evenly distributed across all drives rather than allocated to a single drive as in RAID 3. The array is not destroyed by a single drive failure--and because the parity is distributed, the RAID 5 volume is faster at random reads and writes. RAID 5 isn't bottlenecked by writing parity data to a single drive.

RAID 6 is similar to RAID 5. However, RAID 6 provides fault tolerance against failures of two drives, not just one. As drive capacities get larger and RAID rebuilds take longer, fault tolerance becomes a critical factor.
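To make the parity idea concrete, here is a minimal sketch of how a lost drive can be rebuilt from the surviving drives plus parity. It illustrates the single-parity principle behind RAID 3 and RAID 5 using byte-wise XOR; it is not how any particular controller is implemented. The block contents and the helper name `xor_blocks` are made up for the example, and RAID 6 needs a second, independent parity calculation that is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (single-parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical data blocks, one per data drive in the array.
data_drives = [b"REEL", b"0001", b"SHOT"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_drives)

# Simulate losing the second drive: XOR the survivors with the parity block.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_drives[1])
```

Note that the rebuild has to read every surviving drive in full, which is why rebuild time grows with drive capacity.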


RAID volumes can also be assembled into larger RAID 50 and RAID 60 arrays. Adding a “0” to the end of RAID 5 or RAID 6 simply means you take two distinct hardware RAID volumes and combine them into a RAID 0 using software like Disk Utility. A RAID 50 or 60 has the benefit of RAID 0 speed with the fault tolerance of RAID 5 or 6. If a drive dies inside a RAID 50 or 60, you can pull it out and replace it, and the hardware RAID controller inside the affected RAID unit tackles the task of rebuilding the data on that drive. Under a RAID 50, you can lose a drive on each volume and remain intact. A RAID 60 provides fault tolerance against the loss of four drives (two on each side).

Now that you have an understanding of what RAID is, here is why it's a very poor choice for a permanent archive:

1. RAID systems require power--otherwise, a RAID on the shelf is subject to the same time stresses that afflict standalone hard drives sitting on a shelf: magnetic deterioration and stiction issues when not spun up for long periods of time.

2. RAID systems are very costly in terms of $/GB. A 32.0 TB RAID from major manufacturers can cost $15,000. When you add in a Mac Pro as the controller, plus fibre card and all accessories needed, your system can easily exceed $25,000. Using 32 TB of online storage for archive would cost $0.78 per GB, and that doesn't include utility costs for keeping the system powered on. A set of LTO-5 backups for 32 TB would cost $0.32 per GB--and that includes the entire LTO-5 backup system. If you were just accounting for media costs, the LTO-5 price would be about $0.08 to $0.12 per GB. And LTO media does not require any electrical power for storage, so your utility charges are lower for this type of storage.

3. If you filled up an entire RAID system with archived media, you need to account for the loss of that storage for other projects. If your RAID is filled up, you can't load new media and can't work on new projects.

4. One of the biggest concerns with newer-generation RAID systems is the sheer size of the drives. With drive capacities reaching 2.0 TB each (and 3.0 TB coming later this year) it takes an extraordinary amount of time to build RAID volumes. When a drive dies, you need to rebuild the RAID volume after the dead drive is removed and a new drive is inserted. With smaller volumes, this doesn't take too long. However, with larger capacity drives this can take an eternity. The dangerous thing about a RAID rebuild is that it is very disk-intensive. Every drive is taxed to move data around to rebuild the volume. If your dead drive was part of a bad batch from the manufacturer, then the other drives in your system likely came from that batch, too, and are at risk of failing under the same load. Some users have reported 100+ hours to rebuild their RAID volumes after a drive has died. Under RAID 5, if you lose another drive while rebuilding your RAID volume--all your data is gone. That is one of the primary reasons for RAID 6: it can sustain a dual-drive failure and your data is still intact.

RAID systems have an outstanding place in the post-production chain: immediate access to large pools of online material with extremely high data rates. Because these systems require periodic review and service, they are not suited for low-maintenance operations like long-term archive. Further, their cost structure makes long-term archive on RAID systems cost prohibitive. Finally, the potential for data loss during a RAID rebuild with larger capacity drives means that RAID is not a “sure thing” in terms of bulletproof archive solutions.

That leaves LTO tape as the best option for creating durable long-term archives with a defined forward migration path and reasonable cost structure. In the pages that follow, we'll define a workflow for long-term LTO archives and nail down hard costs for initial build-out as well as costs spread over the number of tapes generated.

For more information on LTO-5 systems, you can refer to the following page at Silverado's website: http://silverado.cc/shop/product.php?productid=1474
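The per-gigabyte figures in the RAID cost comparison above can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 1,000 GB), the $7,075 system total from Appendix A, and the $150 averaged LTO-5 cartridge price used in Appendix C; the function name `dollars_per_gb` is mine.

```python
def dollars_per_gb(total_cost_usd: float, capacity_gb: float) -> float:
    """Simple cost-per-gigabyte ratio; excludes labor, power, and training."""
    return total_cost_usd / capacity_gb

CAPACITY_GB = 32_000          # 32 TB, decimal units
RAID_SYSTEM_COST = 25_000     # RAID plus Mac Pro, fibre card, accessories
LTO_SYSTEM_COST = 7_075       # Appendix A total
TAPE_PRICE = 150              # averaged LTO-5 cartridge price (Appendix C)
TAPE_CAPACITY_GB = 1_500      # LTO-5 native capacity

# Ceiling division: a partly filled tape still has to be bought.
tapes = -(-CAPACITY_GB // TAPE_CAPACITY_GB)

raid = dollars_per_gb(RAID_SYSTEM_COST, CAPACITY_GB)
lto = dollars_per_gb(LTO_SYSTEM_COST + tapes * TAPE_PRICE, CAPACITY_GB)
media_only = dollars_per_gb(TAPE_PRICE, TAPE_CAPACITY_GB)

print(f"RAID:           ${raid:.3f}/GB")
print(f"LTO-5 system:   ${lto:.3f}/GB ({tapes} tapes)")
print(f"LTO media only: ${media_only:.3f}/GB")
```

At 32 TB the LTO-5 route costs well under half as much per gigabyte as the RAID, even with the drive, host adapter, software, and Mac Pro included.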


Appendix A: LTO Workflow
Mac Pro-based LTO-5 Archive System using TOLIS Group BRU Producer's Edition

The following solution details the base cost of implementing a complete single-tape-drive archiving solution using an Apple Mac Pro, TOLIS Group BRU Producer's Edition software, an HP Ultrium LTO-5 drive (read compatible with LTO-3 and read/write compatible with LTO-4 and LTO-5), and an ATTO H680 card. Please note that manufacturer MSRP is listed for pricing on items. A monitor is not included here since those are generally readily available.

Item: Apple Mac Pro
Quantity: 1
MSRP: $2499
Notes:
• One 2.66GHz Quad-Core Intel Xeon
• 3GB (3x1GB)
• 640GB 7200-rpm Serial ATA 3Gb/s
• NVIDIA GeForce GT 120 512MB
• One 18x SuperDrive
• Apple Mouse
• Apple Keyboard with Numeric Keypad (English) and User's Guide

Item: ATTO H680 Host Bus Adapter
Quantity: 1
MSRP: $495
Notes:
• The ExpressSAS H680 provides high-speed 6Gb/s performance at 600MB/s per port by utilizing a serial, point-to-point architecture in addition to PCI Express 2.0 bus technology.
• ExpressSAS 6Gb/s HBAs are engineered for demanding IT and digital media applications which require more performance than 3Gb/s SAS/SATA can provide.
• The ExpressSAS H680 features eight external ports and allows connections to 256 end-point devices.

Item: HP Ultrium 3000 LTO-5 Drive
Quantity: 1
MSRP: $3383
Notes:
• HP StorageWorks LTO-5 Ultrium 3000 SAS External Tape Drive
• Sustained transfer rate of 1 TB/hr (compressed)
• 256MB buffer
• 6 Gb/sec SAS host interface
• 5.25 inch half-height
• AES 256-bit encryption

Item: TOLIS Group BRU Producer's Edition Software
Quantity: 1
MSRP: $499
Notes:
• The “iTunes” of backup--BRU Producer's Edition™ from TOLIS Group creates easy, drag-and-drop session archives.
• Engineers and users have a reliable, flexible, and easy-to-use solution to protect key creative digital assets regardless of computer technical knowledge.

Item: TOLIS Group BRU Producer's Edition Support Plan
Quantity: 1
MSRP: $199
Notes:
• Post-warranty technical support.
• Provides continuing technical support beyond the initial 30-day period from date of product purchase.
• Provides unlimited access to the support group via telephone, email, and fax.
• Free product updates, as may become available, are also included in the service.
• TOLIS support team is staffed by product engineers.

Total: $7075


Appendix B: Media Costs
Average LTO-4/LTO-5 Prices per manufacturer

The following chart lists the average MSRP of LTO-4 and LTO-5 media costs per manufacturer.

Manufacturer            LTO-4 Cost    LTO-5 Cost
Sony                    $45.43        $140.32
Maxell                  $45.15        $220.00
Imation                 $48.93        $157.59
HP                      $44.59        $161.70
Fujifilm                $46.20        $227.11
TDK                     $44.66        $180.00
Quantum                 $48.58        $149.00

Average Price           $46.22        $176.53
Average Price per GB    $0.06         $0.12
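The averages in this chart can be recomputed from the per-manufacturer prices. The sketch below assumes native capacities of 800 GB for LTO-4 and 1,500 GB for LTO-5, as used elsewhere in this paper; the variable names are illustrative only.

```python
from statistics import mean

# MSRP per cartridge, in USD, as listed in the chart above.
lto4_prices = {"Sony": 45.43, "Maxell": 45.15, "Imation": 48.93, "HP": 44.59,
               "Fujifilm": 46.20, "TDK": 44.66, "Quantum": 48.58}
lto5_prices = {"Sony": 140.32, "Maxell": 220.00, "Imation": 157.59, "HP": 161.70,
               "Fujifilm": 227.11, "TDK": 180.00, "Quantum": 149.00}

# Native (uncompressed) capacities in GB.
LTO4_GB = 800
LTO5_GB = 1500

avg4 = mean(lto4_prices.values())
avg5 = mean(lto5_prices.values())

print(f"LTO-4: ${avg4:.2f} per tape, ${avg4 / LTO4_GB:.2f} per GB")
print(f"LTO-5: ${avg5:.2f} per tape, ${avg5 / LTO5_GB:.2f} per GB")
```

LTO-5 media costs roughly twice as much per gigabyte as LTO-4 at these prices, which is typical for a newly released tape generation.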


Appendix C: Cost Deltas

Your backup system's overall cost per gigabyte goes down the more you back up. Here is an overview of how much the system costs per gigabyte based on the amount of data backed up.

[Chart: price per GB in USD (vertical axis, 0 to 0.6) versus total data archived (horizontal axis, 16.0 TB to 160.0 TB)]

Initial costs for the LTO-5 system are about $7075 (see Appendix A). Tape costs are averaged using $150 per LTO-5 cartridge. Over the life of the system, costs would be $0.54 per GB if you backed up 16.0 TB (approximately 11 tapes). On the other end of the spectrum--if you archive 160.0 TB (about 107 tapes) then prices fall to $0.14 per GB.

Not shown on the chart above:

320.0 TB = $0.12/GB (approximately 214 tapes)
640.0 TB = $0.11/GB (approximately 427 tapes)
1.0 PB = $0.10/GB (approximately 667 tapes)

The above costs do not include labor or training costs. They are equipment and media only.
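The cost curve described above can be regenerated with a short sketch. It assumes decimal units (1 TB = 1,000 GB), 1,500 GB per LTO-5 tape, and whole-tape purchases. The per-GB figures in this appendix appear to be truncated to whole cents rather than rounded, so the sketch does the same using integer arithmetic; that reading is my inference from the numbers, not something the paper states. The function name `archive_cost` is hypothetical.

```python
SYSTEM_COST = 7075   # USD, Appendix A total
TAPE_PRICE = 150     # USD, averaged LTO-5 cartridge price
TAPE_GB = 1500       # LTO-5 native capacity

def archive_cost(total_tb: int):
    """Return (tapes, total cost in USD, whole cents per GB) for an archive size."""
    gb = total_tb * 1000
    tapes = -(-gb // TAPE_GB)                 # ceiling division
    total = SYSTEM_COST + tapes * TAPE_PRICE
    cents_per_gb = total * 100 // gb          # truncated to whole cents
    return tapes, total, cents_per_gb

for tb in (16, 32, 64, 96, 128, 160, 320, 640, 1000):
    tapes, total, cents = archive_cost(tb)
    print(f"{tb:>5} TB: {tapes:>3} tapes, ${total:>7,} total, ${cents / 100:.2f}/GB")
```

As the archive grows, the fixed $7,075 system cost is spread over more data and the price per gigabyte approaches the media-only floor of $0.10 per GB.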
