(12) United States Patent: Spertus et al.
(10) Patent No.: US 7,529,785 B1
(45) Date of Patent: May 5, 2009

(54) EFFICIENT BACKUPS USING DYNAMICALLY SHARED STORAGE POOLS IN PEER-TO-PEER NETWORKS

(75) Inventors: Michael P. Spertus, Chicago, IL (US); Slava Kritov, Palo Alto, CA (US); Darrell M. Kienzle, Vienna, VA (US); Hans F. van Rietschote, Sunnyvale, CA (US); Anthony T. Orling, San Luis Obispo, CA (US); William E. Sobel, Stevenson Ranch, CA (US)

(73) Assignee: Symantec Corporation, Cupertino, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 269 days.

(21) Appl. No.: 11/363,780
(22) Filed: Feb. 28, 2006
(51) Int. Cl.: G06F 17/30 (2006.01)
(52) U.S. Cl.: 707/204; 707/203; 707/200
(58) Field of Classification Search: 707/10, 707/203-205; see application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
[patent number illegible] B2, 8/2005, [name illegible], 709/228
[patent number illegible] B2, 1/2006, Leung et al., 707/204

OTHER PUBLICATIONS
"Glacier: Highly Durable, Decentralized Storage Despite Massive Correlated Failures"; Haeberlen et al.; 2004 IRIS Student Workshop, Cambridge, MA; Nov. 7, 2004.
"Maintaining Object Ordering in a Shared P2P Storage Environment"; Caronni et al.; Whitepaper; Sun Microsystems, Inc.; Sep. [year illegible].
"Mirra Personal Server M-250"; Reviews; PCMag.com; Aug. 17, [year illegible].
"[title partially illegible] over P2P Storage Utilities"; Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS'04); vol. 00; 2004.
Weatherspoon, Hakim and John Kubiatowicz; "Erasure Coding vs. Replication: A Quantitative Comparison"; Paper; 2002; Internet; University of California, Berkeley; presented at IPTPS '02.
Maymounkov, Petar and David Mazières; "Kademlia: A Peer-to-peer Information System Based on the XOR Metric"; Paper; 2002; Internet; New York University; http://www.cs.rice.edu/Conferences/IPTPS02/.

* cited by examiner

Primary Examiner: Yicun Wu
(74) Attorney, Agent, or Firm: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.; Jason L. Burgess

(57) ABSTRACT

A system for efficient backups using dynamically shared storage pools in peer-to-peer networks comprises one or more processors and memory coupled to the processors. The memory stores instructions executable by the processors to implement a backup manager configured to dynamically subdivide a storage pool into one or more portions of storage currently designated for local backup data and one or more portions of storage currently designated for peer-to-peer (P2P) backup data. In response to local backup data received from a backup client, the backup manager may store the local backup data in a portion of the storage pool that is currently designated for local backup data. The backup manager may then generate a P2P version of the local backup data, e.g., by encrypting and/or redundancy encoding the local backup data, and transmit parts of the P2P version to each of one or more peer devices in the P2P network.

26 Claims, 7 Drawing Sheets

[FIG. 1, Sheet 1 of 7: figure text not legible in source]

[FIG. 2, Sheet 2 of 7: storage pool subdivided into portions 210A and 210B (local backup data) and portions 220A and 220B (P2P backup data); remaining figure text not legible]

[FIG. 3, Sheet 3 of 7, flowchart: subdivide storage pool into portions currently designated for local backup data and P2P backup data (305); receive request (315); local backup? (320); if yes: select portion of storage pool to store local backup data, reallocating storage used for P2P backup data if needed (325); store local backup data in selected portion (330); transmit P2P version of local backup data to peer devices when local backup phase completes (335; step text partially illegible);
if no (P2P backup): select portion of storage pool to store P2P backup data, reallocating storage used for local backup data if needed (340); store P2P backup data in selected portion (345)]

[FIG. 4, Sheet 4 of 7, flowchart: receive request specifying source data set to be backed up in initial phase of backup (405); for each data object: data object of source data set already accessible? (410); if yes, exclude data object from local backup data (415); if no, include data object in local backup data (420)]

[FIG. 5, Sheet 5 of 7, flowchart: receive local backup data (505); optionally, rank data objects based on relative urgency of P2P backup (510); start processing of next data object (515); encrypt local backup data object for P2P backup (520); redundancy-encode data object for P2P backup, e.g., using replication or erasure code (525); transmit P2P version of data to peer devices (530); more objects? (535); if no, end backup (540)]

[FIG. 6, Sheet 6 of 7, flowchart (partial): search for backup manager (605); appropriate backup manager found? (610); search for peer devices to participate in P2P backup (615)]

[...] campus local area networks (LANs), home-based LANs, etc. Furthermore, most or all of these devices often store data, at least temporarily, that, if lost or corrupted, may lead to considerable rework and/or to lost business opportunities. While perhaps not as important from a business perspective, the loss or corruption of personal data such as photographs, financial documents, etc., from home computers and other devices outside corporate boundaries may also have unpleasant consequences. Backing up the data locally, e.g., to devices stored in the same building or site as the source data, is typically not sufficient, especially in the event of catastrophic events such as hurricanes, tornados, floods, fires and the like. Furthermore, while local backups may be relatively fast, in aggregate they often result in multiple copies of the same files being backed up: for example, even though many of the operating system files in one backup client system may be identical to operating system files in another backup client system, local backups initiated from each of the clients may typically store independent backup versions of the data from each client separately, including duplicate backed-up copies of the identical files.

In order to enable recovery from localized catastrophic events, various techniques for backup to remote sites have been developed over the years. Many traditional disaster recovery techniques are often centrally controlled and expensive, however, and are therefore typically limited to protecting the most important, mission-critical subsets of business data. In recent years, in order to take advantage of the widening availability of Internet access and the mass availability of cheap storage, peer-to-peer (P2P) backup management techniques have been proposed. In such P2P backup management environments, for example, each participating device may be allowed to back up data objects such as files into a P2P network or "cloud" (a large distributed network, such as hundreds or thousands of hosts connected to the Internet). In the event of a failure at the source device (the device from which the data objects were uploaded), the backed-up data may be retrieved from the P2P cloud.
P2P backup management software may be installed at the participating devices to enable discovery of target devices to store backup data, to schedule and perform the P2P backups, to search for previously backed-up data within the P2P cloud, and to retrieve backup data from other devices of the P2P cloud as needed. Often, few restrictions are placed on devices for membership in P2P networks: e.g., even a home personal computer that is only powered on for a few hours a day may be allowed to participate in a P2P network.

Unfortunately, the amount of source data to be backed up can be quite large: for example, if conventional P2P techniques are used, several gigabytes of data may have to be backed up from a single laptop computer in order to be able to support full recovery from a disk crash or other failures at the laptop. Furthermore, the total amount of data uploaded into the P2P network for a backup of a given source data set is often substantially greater than the size of the source data itself. This data expansion may be required because few guarantees can usually be provided regarding the availability of any given device in the P2P network. If, in one implementation of P2P backup management, an important file was backed up to only one or two target devices of the P2P network from a source device, it is quite possible that none of the target devices that store the file may be online or available when the file has to be recovered. Source data to be backed up is therefore typically encoded for error correction (e.g., using an erasure code) and/or replicated at the source device prior to uploading to several targets in the P2P cloud, so that the probability of being able to recover the source data is increased. (In general, an erasure code transforms a data object containing n blocks into a data object with m blocks, where m is larger than n, such that the original data object can be recovered from a subset of those m blocks.) The expansion of the source data set to increase availability of the backed-up version further adds to the upload bandwidth requirements from the source devices. Since many of the devices whose data is to be backed up into the P2P network often have intermittent connectivity to the P2P network, and may be provided relatively low upload bandwidth when they do have access to the P2P network, it may be difficult for such devices to successfully perform complete backups into the P2P network. Furthermore, some existing P2P backup techniques may require participating devices to reserve substantial amounts of storage (often several times larger than the expected amount of data to be backed up from the device) for incoming P2P backup data, which may also place an undue storage burden on the devices.

SUMMARY

Various embodiments of systems and methods for efficient backups using dynamically shared storage pools in peer-to-peer networks are disclosed. According to one embodiment, a system comprises one or more processors and memory coupled to the processors. The memory stores program instructions executable by the processors to implement a backup manager configured to dynamically subdivide a storage pool into one or more portions of storage currently designated for local backup data and one or more portions of storage currently designated for peer-to-peer (P2P) backup data.
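This dynamic subdivision can be pictured with a minimal Python sketch (illustrative only; the class and method names are hypothetical and are not taken from the patent):

```python
from collections import Counter

class StoragePool:
    """Toy model: a pool of fixed-size blocks, each currently designated
    'free', 'local' (local backup data), or 'p2p' (incoming P2P backup data)."""

    def __init__(self, num_blocks):
        self.designation = ['free'] * num_blocks

    def allocate(self, kind, count):
        """Designate `count` blocks for `kind` ('local' or 'p2p'), reclaiming
        blocks from the other designation if free space runs out."""
        other = 'p2p' if kind == 'local' else 'local'
        allocated = []
        for idx, d in enumerate(self.designation):
            if len(allocated) == count:
                break
            if d == 'free':
                self.designation[idx] = kind
                allocated.append(idx)
        # Not enough free blocks: dynamically re-designate blocks currently
        # holding the other category of backup data (the patent's "reallocating
        # storage used for P2P backup data if needed", and vice versa). A real
        # system would pick victim blocks per policy, e.g., only blocks whose
        # contents are sufficiently replicated elsewhere.
        for idx, d in enumerate(self.designation):
            if len(allocated) == count:
                break
            if d == other:
                self.designation[idx] = kind
                allocated.append(idx)
        if len(allocated) < count:
            raise RuntimeError('pool exhausted')
        return allocated

pool = StoragePool(8)
pool.allocate('p2p', 6)    # incoming P2P data fills most of the pool
pool.allocate('local', 4)  # a local backup arrives; two P2P blocks are reclaimed
print(Counter(pool.designation))  # Counter({'local': 4, 'p2p': 4})
```

The point of the single shared pool, as opposed to two fixed reservations, is that neither category of backup data needs storage pre-provisioned for its worst case.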
The backup manager may be incorporated within a variety of different types of devices of a P2P network in various embodiments, such as computer servers selected for high levels of availability and connectivity, gateways, routers, firewalls, network-attached storage (NAS) appliances, etc. Each backup manager may be configured to coordinate a distributed backup technique for one or more backup client devices (such as laptops, personal computers, etc.). In response to local backup data received, for example, over a LAN from a backup client device, the backup manager may store the local backup data in a first portion of the storage pool that is currently designated for local backup data. The backup manager may then generate a P2P version of the local backup data, e.g., by encrypting and/or producing error-correcting encodings or replicas of the local backup data. At least a portion of the P2P version of the local backup data may then be transmitted from the backup manager to each of one or more peer devices in the P2P network, such as selected remote backup managers. By generating and transmitting the P2P version into the P2P network on behalf of the client, the backup manager may enable disaster recovery for the client's data while eliminating some of the processing, storage and/or networking burden that the client may otherwise have had to bear. By intelligently sharing the storage pool among backup data objects for a variety of local and remote clients and eliminating redundant backup objects as described below, the backup manager may also reduce the overall storage required for backups in some embodiments.

In addition to storing local backup data for one or more backup clients, a given backup manager may be configured to receive P2P backup data generated by other backup managers in the P2P network, and to store the incoming P2P data in portions of the storage pool currently designated to store P2P data. The data blocks comprising the storage pool may be dynamically retargeted to store incoming local and/or P2P backup data: e.g., blocks that were storing local backup data may be reallocated to store incoming P2P backup data, and blocks that were storing P2P backup data may be reallocated to store incoming local backup data as needed. Blocks may also be dynamically reclaimed and reallocated as needed between portions of the storage pool that contain local backup data in some implementations: e.g., blocks of data storing client A's local backup data may be reused for storing client B's local backup data. In some embodiments, the transmission of the P2P version of local backup data for a given client may be performed asynchronously with respect to the local backup: e.g., the client may be informed that the backup is complete as soon as its local backup data reaches the storage pool, and the generation and dissemination of the P2P version may be delayed until later and/or performed as a low-priority or background activity. In one implementation, one or more blocks storing local backup data for a given client's backup may be reallocated to other purposes even before the P2P backup phase for that client has been completed.
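A minimal sketch of this asynchronous, two-phase pattern might look as follows (all helper names are hypothetical placeholders standing in for the backup manager's real operations, not code from the patent):

```python
import queue
import threading

# Hypothetical stand-ins for the backup manager's real operations.
store_in_local_portion = lambda client, data: None
encrypt = lambda data: data[::-1]                       # placeholder "encryption"
redundancy_encode = lambda data: [data] * 3             # placeholder: 3 replicas
select_peer_devices = lambda: ['peerA', 'peerB', 'peerC']
send_fragment = lambda peer, frag: None

p2p_work = queue.Queue()  # local backups awaiting the deferred P2P phase

def handle_local_backup(client_id, data):
    """Phase 1: store the client's data in the local portion of the pool and
    acknowledge immediately; the P2P phase is only queued as background work."""
    store_in_local_portion(client_id, data)
    p2p_work.put((client_id, data))
    return 'backup complete'          # client sees a fast, purely local backup

def p2p_background_worker():
    """Phase 2 (deferred, low priority): generate the P2P version and
    disseminate its parts to selected peer devices."""
    while True:
        client_id, data = p2p_work.get()
        fragments = redundancy_encode(encrypt(data))
        for peer, fragment in zip(select_peer_devices(), fragments):
            send_fragment(peer, fragment)
        p2p_work.task_done()

threading.Thread(target=p2p_background_worker, daemon=True).start()
print(handle_local_backup('client-42', b'important file contents'))
p2p_work.join()   # demo only: wait for the background phase to drain
```

The decoupling is what lets the client disconnect as soon as phase 1 completes, while the well-connected manager absorbs the encoding and upload cost.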
In one embodiment, a backup manager may be configured to limit the amount of data transmitted from a client for backup. For example, the client may specify a source data set comprising a plurality of data objects such as files to be backed up. The backup manager may be configured to determine whether a restorable version of a particular file is already accessible, e.g., from the P2P network, where it may, for example, have been stored earlier on behalf of some other client. For example, restorable versions of operating system files that are shared by many clients may already be available from one or more peer backup managers. If such a restorable version is found, the backup manager may exclude the particular file from the data that is stored in the storage pool or injected into the P2P network, thus further reducing the time and resources needed to complete the backup from the client's perspective.

According to another embodiment, a system may comprise a backup manager having access to a P2P network, and one or more backup clients. The backup manager may be configured to dynamically subdivide a storage pool into one or more portions of storage currently designated for local backup data from the one or more backup clients and one or more portions of storage currently designated for P2P backup data received from the P2P network. In response to receiving local backup data from a particular backup client of the one or more backup clients [...]

[...] be used (as long as it remains operational) for all the backups originating from a given backup client 130. It is noted that a backup manager 110 may itself function as a backup client 130 from time to time in some embodiments. For example, in FIG. 1, backup manager 110D may serve as a backup manager in response to receiving a request to back up data originating at backup manager 110B (i.e., data that was generated or stored directly at backup manager 110B and did not arrive at backup manager 110B as part of another backup initiated from any other device), as well as in response to requests for backing up data originating at backup client 130B-1.

In some implementations, backup managers 110 may be selected from among highly available and highly connected pre-existing devices of the P2P network, while in other implementations highly available dedicated devices (such as NAS appliances or computer servers) may be added to the P2P network specifically to serve as backup managers. The backup managers and their storage pools may have a high-enough availability in some implementations that the level of redundancy required for P2P backups of the client data may be reduced: e.g., fewer copies of a given data block may have to be replicated in the P2P network, since the backup managers and storage pools at which the data block is replicated may have a substantially higher availability than the average availability of devices in the P2P cloud as a whole. By ensuring that only highly available devices are selected as backup managers, the storage overhead typically associated with P2P backups may thereby be further reduced in such implementations. In some embodiments, a plurality of different types of hardware and software components of backup managers 110 and the storage pools may each be configured or selected for high availability: e.g., processors, memory, disk devices, etc., of the backup managers 110 may each provide higher-than-average availability and reliability with respect to similar components at the other devices of the P2P network.
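The effect of replica-holder availability on the required redundancy can be seen with a simple independent-failure model (an illustrative calculation, not one given in the patent): if each replica holder is reachable with probability p, then at least one of r replicas is reachable with probability 1 - (1 - p)^r.

```python
import math

def replicas_needed(availability, target=0.999):
    """Smallest replica count r with 1 - (1 - p)**r >= target, assuming
    independent failures (illustrative model only)."""
    return math.ceil(math.log(1 - target) / math.log(1 - availability))

# Ordinary P2P peers (e.g., a home PC online about a third of the day)
# versus highly available backup managers:
print(replicas_needed(0.30))   # 20 replicas for 99.9% retrievability
print(replicas_needed(0.99))   # 2 replicas suffice
```

Under this toy model, replicating only onto highly available managers cuts the storage expansion by an order of magnitude, which is the overhead reduction the passage above describes.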
It is noted that in addition to implementing the distributed backup technique, backup managers 110, as well as some or all of the backup clients 130, may also be configured to implement various more general P2P storage management functions in different embodiments. Such functions may include, for example, searching for requested data objects (such as files) in the P2P cloud, checking that enough replicas or redundancy-encoded versions of a data object's blocks remain in the P2P cloud so that the data object can survive a specified number of failures of peer devices, adding additional replicas or redundancy-encoded versions of data objects to the P2P cloud if needed, and deleting previously uploaded P2P versions of data objects from the cloud, based on various P2P storage management algorithms and policies. (The check to determine that enough replicas or redundancy-encoded versions of the data object remain may be termed a check for a desired "redundancy level" of the data object.) Thus, the backup managers 110 and/or backup clients 130 in such embodiments may comprise a full-featured P2P storage management software stack. Some or all of the specific techniques and algorithms used for P2P storage management may be configurable in various embodiments, e.g., based on specific policies and parameters agreed to by the participating devices and/or specified by users.

FIG. 2 is a block diagram illustrating an exemplary storage pool 120 accessible to a backup manager 110 for storing backup data, according to one embodiment. The backup manager 110 may be configured to dynamically distribute storage of the pool into respective portions for P2P backup data (e.g., portions 220A and 220B in FIG. 2, collectively referred to herein as portions 220) and for local backup data (e.g., portions 210A and 210B in FIG. 2, collectively referred to herein as portions 210). The term "local backup data" may be used herein to describe data that is backed up from a backup client 130 to its designated backup manager 110 in a first phase of the backup protocol, without using P2P algorithms. In some embodiments, for example, the first phase may comprise the client backup device 130 (or a user at the client backup device 130) specifying a source data set, and backup software at the client and/or at a designated backup manager 110 copying some or all of the source data set to a portion 210 of a storage pool 120 accessible from the designated backup manager. Typical operations performed in P2P algorithms, such as identification of a plurality of peer devices to which data has to be uploaded, encryption and redundancy encoding, etc., may not be performed in the first, local phase of the protocol.

In one embodiment, the designated backup manager 110 may be configured to determine, prior to copying the data to a portion 210 of the storage pool 120, whether a backup version of one or more objects of the source data set is already accessible in the P2P cloud and/or in another portion of the storage pool 120. If an object is already present in the P2P network or the storage pool, it may be excluded from the set of data copied to the storage pool 120 on behalf of the requesting client, thus reducing both the upload bandwidth required at the client and the storage required for the local backup. For example, a user of a client device 130 (e.g., a laptop) running an operating system such as a version of Microsoft Windows™ may request that the entire "C:" drive be backed up. In response, the designated backup manager 110 may identify a list of files that are to be backed up, and check whether any of the files on the list are already available (for potential restoration to the requesting backup client device 130) from the P2P cloud and/or other portions of the storage pool 120, e.g., using a lookup along the lines sketched below.
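One plausible way to implement this exclusion check is a content-digest lookup; the patent does not specify the matching mechanism, so the hashing scheme and the index below are assumptions made for illustration:

```python
import hashlib
from pathlib import Path

# Hypothetical index mapping content digests to locations of restorable
# backup versions (peer backup managers and/or local pool portions).
restorable_index = {}   # digest -> location string

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def plan_local_backup(file_list):
    """Split a source data set into files that must be copied to the pool
    and files already restorable elsewhere (which are excluded)."""
    to_copy, excluded = [], {}
    for path in file_list:
        d = digest(path)
        if d in restorable_index:
            # Remember where the restorable version lives, so a later
            # restoration can avoid a full P2P search (metadata caching,
            # as described in the following paragraphs).
            excluded[str(path)] = restorable_index[d]
        else:
            to_copy.append(path)
    return to_copy, excluded
```

For a typical Windows client, the widely shared operating-system and application files would mostly land in `excluded`, leaving only user-specific data in `to_copy`.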
The list of files to be backed up may be provided by the client device, or may be obtained directly by the designated backup manager. If one or more files (e.g., operating system files, application binaries, etc., such as some files typically found in "Windows" or "Program Files" folders in most Windows™-based computers) are already available for restoration, the backup manager 110 may only copy the remaining files (i.e., the files not already available from the P2P cloud or the storage pool) to a portion 210 of its storage pool 120. Blocks of data within storage pool 120 that were being used for P2P backup data (or for local backup data for another client) may be reclaimed and retargeted for storing the files copied from the current client.

In some embodiments, a data object may be excluded from the local backup data set only if it is already restorable from one or more remote devices of the P2P network; that is, the presence of a copy of a data object of the source data set in another portion of the storage pool at the designated backup manager 110 may not be sufficient to exclude the object in such embodiments. In one implementation, the backup manager 110 may cache or store metadata indicating where backup versions of the files of the source data set that were not copied may be obtained for restoration, e.g., in order to avoid having to search the P2P network when and if restoration of the source data set is needed. Such metadata may be cached at the backup manager 110 itself, at one or more other devices of the P2P cloud, and/or at the requesting backup client 130 in various embodiments.

It is noted that even though encryption and/or redundancy encoding may not be required during the first phase of the backup technique, in some implementations either encryption and/or redundancy techniques may be employed even in the first phase, e.g., if the network path between the backup client device 130 and the designated backup manager 110 is not secure or is not highly available. In some embodiments, e.g., where some types of client devices have limited connectivity even to their designated backup manager, the first phase of the backup may be divided into two or more sessions, so that the client is not forced to remain connected to the designated backup manager for long periods of time. Incremental backup techniques may be implemented during the first phase in some embodiments, e.g., where only modifications made to data objects since a previous backup are copied to the designated backup manager; in other embodiments, full backups may be performed during the first phase. A combination of full backups followed by a series of incremental backups may be performed during first-phase backups for a given client over time in some embodiments: e.g., full backups may be performed weekly, and incremental backups may be performed each day in which a full backup is not performed.

The local backup data stored in portions 210 of the storage pool may be prepared for uploading into the P2P cloud, e.g., by encrypting and/or redundancy encoding the local backup data, by the designated backup manager for the backup client 130 in some embodiments. A P2P version of the local backup data (e.g., a version that has been encrypted and/or redundancy encoded) [...]
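As a concrete toy illustration of preparing such a P2P version (the encrypt, redundancy-encode, and transmit steps of FIG. 5), the sketch below chains a placeholder cipher with a single-parity erasure code. Every name here is hypothetical: a real system would use a standard cipher (e.g., AES) and a stronger erasure code (e.g., Reed-Solomon) rather than XOR parity.

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(data: bytes, key: bytes) -> bytes:
    # Placeholder keystream XOR; stands in for real encryption such as AES.
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

def redundancy_encode(data: bytes, n: int = 4):
    """Toy erasure code: split the data into n equal blocks plus one XOR
    parity block, yielding m = n + 1 fragments; any n of the m fragments
    suffice to rebuild the data (the n-blocks-to-m-blocks property the
    background section describes)."""
    data = data.ljust(-(-len(data) // n) * n, b'\0')   # pad to a multiple of n
    size = len(data) // n
    blocks = [data[i * size:(i + 1) * size] for i in range(n)]
    parity = blocks[0]
    for b in blocks[1:]:
        parity = xor_bytes(parity, b)
    return blocks + [parity]

key = secrets.token_bytes(16)
fragments = redundancy_encode(encrypt(b'local backup data for client 130A', key))
# Each fragment would be transmitted to a different peer device; losing any
# single fragment still allows recovery by XOR-ing the survivors together.
print(len(fragments), 'fragments of', len(fragments[0]), 'bytes each')
```

Note the modest expansion factor (m/n = 1.25 here) compared with whole-object replication; this is the bandwidth and storage trade-off the patent's high-availability backup managers are meant to exploit.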
