Cloud Data Storage Security through Public Auditability and Data Dynamics - A Survey
A. Proxten Dsilva*, M.P. Revathi**
*Final Year ME (CSE), JJCET; **Asst. Professor (SE.G), JJCET
Abstract - Cloud Computing has been envisioned as the next-generation architecture of IT enterprise. It moves application software and databases to centralized large data centers, where the management of the data and services may not be fully trustworthy. This unique paradigm brings about many new security challenges which are not yet well understood. This work studies the problem of ensuring the integrity of data storage in Cloud Computing. Support for data dynamics via the most general forms of data operation, such as block modification, insertion, and deletion, is also a significant step toward practicality, since services in Cloud Computing are not limited to archive or backup data only. Existing work on ensuring remote data integrity often lacks support for either public auditability or dynamic data operations.

Index Terms - Data storage, data dynamics, cloud computing

I. INTRODUCTION

There are several trends that are opening up the era of Cloud Computing, which is an Internet-based development and use of computer technology. Ever cheaper and more powerful processors, together with the software as a service (SaaS) computing architecture, are transforming data centers into pools of computing service on a huge scale. Meanwhile, increasing network bandwidth and reliable yet flexible network connections make it possible for clients to subscribe to high-quality services from data and software that reside solely on remote data centers. Although envisioned as a promising service platform for the Internet, this new data storage paradigm in the Cloud brings about many challenging design issues which have a profound influence on the security and performance of the overall system. One of the biggest concerns with cloud data storage is data integrity verification at untrusted servers. For example, the storage service provider, which experiences Byzantine failures occasionally, may decide to hide data errors from the clients for its own benefit. What is more serious, such servers retain large amounts of data, little of which is accessed, and they hold data for long periods of time during which there may be exposure to data loss from administration errors as the physical implementation of storage evolves, e.g., backup and restore, data migration to new systems, and changing memberships in peer-to-peer systems.

Archival network storage presents unique performance demands. Given that file data are large and are stored at remote sites, accessing an entire file is expensive in I/O costs to the storage server and in transmitting the file across a network. Reading an entire archive, even periodically, greatly limits the scalability of network stores (the growth in storage capacity has far outstripped the growth in storage access times and bandwidth). Furthermore, the I/O incurred to establish data possession interferes with on-demand bandwidth to store and retrieve data. Clients need to be able to verify that a server has retained file data without retrieving the data from the server and without having the server access the entire file. Early work concentrated on data authentication and integrity, i.e., how to efficiently and securely ensure that the server returns correct and complete results in response to its clients' queries. Later research focused on outsourcing encrypted data and the associated difficult problems, mainly having to do with efficient querying over the encrypted domain. Provable data possession (PDP) [1] provides probabilistic proof that a third party stores a file. The model is unique in that it allows the server to access small portions of the file in generating the proof; all other techniques must access the entire file. The central goal in PDP is to allow a client to efficiently, frequently, and securely verify that a server, which purportedly stores the client's potentially very large amount of data, is not cheating the client. Dynamic provable data possession (DPDP) [12] extends the PDP model to support provable updates on the stored data. The POR [3] protocol lets the verifier store only a single cryptographic key, irrespective of the size and number of the files whose retrievability it seeks to verify, as well as a small amount of dynamic state (some tens of bits) per file. HAIL [4] manages file integrity and availability across a collection of servers or independent storage services. It makes use of PORs as building blocks by which storage resources can be tested and reallocated when failures are detected. HAIL does so in a way that transcends the basic single-server design of PORs and instead exploits both within-server redundancy and cross-server redundancy. Dynamic data support is also important to ensure the correctness of users' data in the cloud. Erasure-correcting codes can be applied in the file distribution preparation to provide redundancy and guarantee data dependability; this construction drastically reduces the communication and storage overhead compared to traditional replication-based file distribution techniques [14]. By utilizing the homomorphic token with distributed verification of erasure-coded data, storage correctness assurance as well as data error localization is achieved: whenever data corruption is detected during the storage correctness verification, the misbehaving server(s) can simultaneously be identified.
II. PROVABLE DATA POSSESSION

Ateniese et al. introduce a model for provable data possession (PDP) [1], [6], [12] that allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it. The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs. The client maintains a constant amount of metadata to verify the proof. The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. They formally define protocols for provable data possession that provide probabilistic proof that a third party stores a file, introduce the first provably secure and practical PDP schemes that guarantee data possession, and show experimentally that probabilistic possession guarantees make it practical to verify possession of large data sets. They also describe a framework for provable data possession, which provides background for related work and for the specific description of their schemes. A PDP protocol checks that an outsourced storage site retains a file, which consists of a collection of n blocks. The client C (data owner) preprocesses the file, generating a piece of metadata that is stored locally, transmits the file to the server S, and may delete its local copy. The server stores the file and responds to challenges issued by the client. As part of preprocessing, the client may alter the file to be stored at the server; it may expand the file or include additional metadata to be stored at the server. Before deleting its local copy of the file, the client may execute a data possession challenge to make sure the server has successfully stored the file. Clients may encrypt a file prior to outsourcing the storage. For their purposes, encryption is an orthogonal issue; the file may consist of encrypted data, and the metadata does not include encryption keys. The client issues a challenge to the server to establish that the server has retained the file: the client requests that the server compute a function of the stored file, which the server sends back to the client. Using its local metadata, the client verifies the response.

A. Homomorphic Verifiable Tags

Because of the homomorphic property, tags computed for multiple file blocks can be combined into a single value. The client pre-computes tags for each block of a file and then stores the file and its tags with a server. At a later time, the client can verify that the server possesses the file by generating a random challenge against a randomly selected set of file blocks. Using the queried blocks and their corresponding tags, the server generates a proof of possession. The client is thus convinced of data possession without actually having to retrieve file blocks.

B. S-PDP and E-PDP

In the Setup phase, the client computes a homomorphic verifiable tag for each block of the file. In order to maintain constant storage, the client generates the random values by concatenating the index to a secret value; each value includes information about the index of the block, in the form of a hashed random value. This binds the tag on a block to that specific block and prevents using the tag to obtain a proof for a different block. These tags are stored on the server together with the file F. The extra storage at the server is the price to pay for allowing thin clients that only store a small, constant amount of data, regardless of the file size. In the Challenge phase, the client asks the server for proof of possession of file blocks whose indices are randomly chosen using a pseudo-random permutation keyed with a fresh, randomly chosen key for each challenge. This prevents the server from anticipating which blocks will be queried in each challenge. The client also generates a fresh (random) challenge to ensure that the server does not reuse any values from a previous Challenge phase. The server returns a proof of possession that consists of two values. The client can remove all the hashed random values from the tags because it has both the key for the pseudo-random permutation and the secret value, and can then verify the validity of the server's proof. The E-PDP scheme reduces the computation on both the server and the client to one exponentiation; however, E-PDP only guarantees possession of the sum of the blocks and not necessarily possession of each one of the blocks for which the client requests proof of possession. Ateniese et al. focused on the problem of verifying whether an untrusted server stores a client's data and introduced a model for provable data possession in which it is desirable to minimize the file block accesses, the computation on the server, and the client-server communication. Their solutions for PDP fit this model: they incur a low (or even constant) overhead at the server and require a small, constant amount of communication per challenge. Key components of the schemes are the homomorphic verifiable tags, which allow possession to be verified without access to the actual data file. These schemes, however, impose a significant I/O and computational burden on the server.
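
The homomorphic combination of per-block tags is the core mechanism here. The following Python sketch is a deliberately simplified illustration of that idea, not the actual RSA-based construction of [1]: the toy modulus, generator, and the way the block index is hashed into the tag are all assumptions made for brevity, and, as with E-PDP, the aggregate check below only proves possession of the sum of the challenged blocks.

```python
import hashlib
import random

# Toy group parameters (assumptions for illustration; a real scheme uses a
# properly generated RSA modulus and keeps its factorization secret).
N = 1000003 * 1000033
g = 2

def block_tag(block: bytes, index: int, secret: int) -> int:
    """Tag binds the block value and its hashed index under the client secret."""
    h = int.from_bytes(hashlib.sha256(index.to_bytes(8, "big")).digest(), "big")
    b = int.from_bytes(block, "big")
    return pow(g, (h + b) * secret, N)

# Setup: the client tags every block, outsources file + tags, keeps only `secret`.
secret = random.randrange(2, N)
blocks = [b"block-%d" % i for i in range(16)]
tags = [block_tag(blk, i, secret) for i, blk in enumerate(blocks)]

# Challenge: the client names a random subset of block indices.
challenge = random.sample(range(len(blocks)), 4)

# Proof (server side): combine the challenged tags homomorphically and report
# the aggregate of the challenged blocks plus their hashed indices.
proof_tag, proof_sum = 1, 0
for i in challenge:
    proof_tag = (proof_tag * tags[i]) % N
    h = int.from_bytes(hashlib.sha256(i.to_bytes(8, "big")).digest(), "big")
    proof_sum += h + int.from_bytes(blocks[i], "big")

# Verification (client side): one exponentiation, no file blocks retrieved.
assert proof_tag == pow(g, proof_sum * secret, N)
print("proof of possession verified for blocks", challenge)
```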

III. SCALABLE AND EFFICIENT PROVABLE DATA POSSESSION

Ateniese et al. [2] construct a highly efficient and provably secure PDP technique based entirely on symmetric-key cryptography, while not requiring any bulk encryption. Also, in contrast with its predecessors, their PDP technique allows

outsourcing of dynamic data, i.e., it efficiently supports operations such as block modification, deletion, and append. Their scheme is based entirely on symmetric-key cryptography. The main idea is that, before outsourcing, the owner (OWN) pre-computes a certain number of short possession verification tokens, each token covering some set of data blocks. The actual data is then handed over to the server (SRV). Subsequently, when OWN wants to obtain a proof of data possession, it challenges SRV with a set of random-looking block indices. In turn, SRV must compute a short integrity check over the specified blocks (corresponding to the indices) and return it to OWN. For the proof to hold, the returned integrity check must match the corresponding value pre-computed by OWN. Notably, OWN has the choice of either keeping the pre-computed tokens locally or outsourcing them, in encrypted form, to SRV; in the latter case, OWN's storage overhead is constant regardless of the size of the outsourced data. The scheme is also very efficient in terms of computation and bandwidth.

A. Setup Phase

Choose the number of tokens. For each round, generate the permutation key and challenge nonce, and compute the token over the hashed challenge nonce and the randomly selected data blocks. Then apply symmetric-key encryption to the token index and the token.

B. Verification Phase

Compute the permutation key and challenge nonce for a particular token index; the client (owner) sends the permutation key and the challenge nonce to the server. The server computes the token over the hashed challenge nonce and the randomly selected data blocks, and sends back the stored encrypted token together with the server-generated token. The owner then decrypts the stored token and verifies it against the server-generated one; if decryption or verification fails, the proof is rejected. Dynamic operations such as update are handled as follows: the client sends an update request to the server, and the server returns the encrypted tokens to the client. For each token, a decryption is carried out (if decryption fails, the token is rejected); the index prefix is stripped and the token value extracted, the permutation key and challenge nonce are recomputed, and if the pseudo-random permutation selects the updated block, the new token value is the XOR of the old token, the hash of the new block, and the hash of the old block. The counter, index, and token are then re-encrypted and sent to the server. In the deletion process, the change is made within the same XOR computation, except that the deleted block is hashed in place of a new block. The method surpassed prior work on several counts, including storage, bandwidth, and computation overheads, as well as support for dynamic operations. However, since it is based on symmetric-key cryptography, it is unsuitable for public (third-party) verification.
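
A minimal Python sketch of the symmetric-key token idea is given below. It is an illustration under simplifying assumptions, not the construction of [2]: HMAC stands in for the token computation, a seeded PRNG stands in for the keyed pseudo-random permutation, and the encrypted outsourcing of tokens is omitted so the owner simply keeps the tokens locally.

```python
import hashlib
import hmac
import os
import random

def challenged_indices(key: bytes, round_no: int, n_blocks: int, per_token: int):
    """Pseudo-random block indices for one round (a seeded PRNG stands in
    for the keyed pseudo-random permutation used in the real scheme)."""
    seed = hmac.new(key, b"round-%d" % round_no, hashlib.sha256).digest()
    return random.Random(seed).sample(range(n_blocks), per_token)

def compute_token(nonce: bytes, blocks, indices) -> bytes:
    """Short integrity check over the selected blocks, keyed by the nonce."""
    mac = hmac.new(nonce, digestmod=hashlib.sha256)
    for i in indices:
        mac.update(blocks[i])
    return mac.digest()

# Setup phase (owner OWN): precompute t tokens before handing data to SRV.
master_key = os.urandom(32)
blocks = [b"data-block-%d" % i for i in range(100)]
t = 10
tokens = []
for r in range(t):
    nonce = hmac.new(master_key, b"nonce-%d" % r, hashlib.sha256).digest()
    idx = challenged_indices(master_key, r, len(blocks), per_token=5)
    tokens.append(compute_token(nonce, blocks, idx))  # kept by OWN in this sketch

# Verification phase, round r: OWN reveals the round's key material, SRV
# recomputes the token over the challenged blocks, OWN compares the result.
r = 3
nonce = hmac.new(master_key, b"nonce-%d" % r, hashlib.sha256).digest()
idx = challenged_indices(master_key, r, len(blocks), per_token=5)
server_token = compute_token(nonce, blocks, idx)      # computed by SRV
assert hmac.compare_digest(server_token, tokens[r])
print("round", r, "possession check passed")
```
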
IV. DYNAMIC PROVABLE DATA POSSESSION

Erway et al. [7], [8] present a definitional framework and efficient constructions for dynamic provable data possession (DPDP), which extends the PDP model to support provable updates on the stored data. Given a file F consisting of n blocks, they define an update as either insertion of a new block (anywhere in the file, not only append), modification of an existing block, or deletion of any block; this update operation therefore describes the most general form of modification a client may wish to perform on a file. Their DPDP solution is based on a new variant of authenticated dictionaries, in which rank information is used to organize dictionary entries. They are thus able to support efficient authenticated operations on files at the block level, such as authenticated insert and delete. They prove the security of their constructions under standard assumptions and also show how to extend the construction to support data possession guarantees for a hierarchical file system as well as for the file data itself. To the best of their knowledge, this was the first construction of a provable storage system that enables efficient proofs over a whole file system, enabling verification at different levels for different users (e.g., every user can verify her own home directory) without having to download the whole data (as opposed to [10]). Their scheme also yields a provable outsourced versioning system (e.g., CVS).
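
DPDP's authenticated dictionary is rank-based so that blocks can be addressed by position even as insertions and deletions shift indices; reproducing that structure faithfully is beyond a short sketch. The following Python example instead uses a plain Merkle hash tree (an assumption, not the construction of [7], [8]) to convey the underlying flavor: the client keeps only a small root digest, can verify any single block from a logarithmic-size proof, and can authenticate a block modification by recomputing the expected new root from the old proof.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """All levels of a Merkle tree (level 0 = leaf hashes); assumes the
    number of blocks is a power of two to keep the sketch short."""
    level = [H(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof_path(levels, index):
    """Sibling hashes needed to recompute the root for one leaf."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index % 2))  # (sibling, am-I-right-child)
        index //= 2
    return path

def root_from(block, path):
    """Recompute the root from a block value and its sibling path."""
    h = H(block)
    for sib, is_right in path:
        h = H(sib + h) if is_right else H(h + sib)
    return h

blocks = [b"block-%d" % i for i in range(8)]
levels = build_tree(blocks)
root = levels[-1][0]                                # client keeps only this digest

# Possession check for block 5 without the client holding the file.
assert root_from(blocks[5], proof_path(levels, 5)) == root

# Provable update: the server modifies block 5 and rebuilds its tree; the
# client recomputes the expected new root from the old proof path and the
# new block value, then accepts and stores that root.
old_path = proof_path(levels, 5)
blocks[5] = b"block-5-v2"
levels = build_tree(blocks)                          # server side
assert root_from(blocks[5], old_path) == levels[-1][0]
print("block verified and update authenticated")
```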

V. PROOFS OF RETRIEVABILITY FOR LARGE FILES

Juels et al. [3] introduced a POR protocol in which the verifier stores only a single cryptographic key, irrespective of the size and number of the files whose retrievability it seeks to verify, as well as a small amount of dynamic state (some tens of bits) for each file. More strikingly, and somewhat counter-intuitively, their scheme requires that the prover access only a small portion of a (large) file F in the course of a POR. In fact, the portion of F touched by the prover is essentially independent of the length of F and would, in a typical parameterization, include just hundreds or thousands of data blocks. Briefly, their POR protocol encrypts F and randomly embeds a set of randomly valued check blocks called sentinels. The use of encryption here renders the sentinels indistinguishable from other file blocks. The verifier challenges the prover by specifying the positions of a collection of sentinels and asking the prover to return the associated sentinel values. If the prover has modified or deleted a substantial portion of F, then with high probability it will also have suppressed a number of sentinels, and it is therefore unlikely to respond correctly to the verifier. To protect against corruption by the prover of a small portion of F, they also employ error-correcting codes; F then refers to the full, encoded file stored with the prover.

A. Setup Phase

The verifier V encrypts the file F. It then embeds sentinels in random positions in F, sentinels being randomly constructed check values. Let F' denote the file F with its embedded sentinels.

B. Verification Phase

To ensure that the archive has retained F, V specifies the positions of some sentinels in F' and asks the archive to return the corresponding sentinel values.

C. Security

Because F is encrypted and the sentinels are randomly valued, the archive cannot feasibly distinguish a priori between sentinels and portions of the original file F. Thus the following property is achieved: if the archive deletes or modifies a substantial ε-fraction of F', it will with high probability also change roughly an ε-fraction of the sentinels. Provided that the verifier V requests and verifies enough sentinels, V can detect whether the archive has erased or altered a substantial fraction of F. (Individual sentinels are, however, only one-time verifiable.) In practice, a verifier wants to ensure against change to any portion of the file F [9], [10], [11]; even a single missing or flipped bit can represent a semantically significant corruption. Thus, detection of ε-fraction modification alone is insufficient. With a simple improvement, however, it can be ensured that even if the archive does change an ε-fraction of the file, the verifier can still recover it. Very simply, before planting sentinels in the file F, the user applies an error-correcting code that tolerates corruption (or erasure, if appropriate) of an ε-fraction of the data blocks in F. The verifier also permutes the file to ensure that the symbols of the (encrypted) code are randomly dispersed, and therefore that their positions are unknown to the archive. A drawback is the preprocessing/encoding of F required prior to storage with the prover.
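
The sentinel mechanism lends itself to a compact sketch. The Python below is a toy illustration under stated assumptions rather than the protocol of [3]: sentinels are derived with HMAC from the verifier's single key, the encryption of F is omitted (random blocks stand in for ciphertext), and no error-correcting code is applied.

```python
import hashlib
import hmac
import os
import random

def sentinel(key: bytes, j: int) -> bytes:
    """The j-th sentinel, derived deterministically from the verifier's key."""
    return hmac.new(key, b"sentinel-%d" % j, hashlib.sha256).digest()[:16]

key = os.urandom(32)                        # the only long-term verifier state
file_blocks = [os.urandom(16) for _ in range(1000)]   # stands in for encrypted F
n_sentinels = 100

# Setup: choose secret sentinel positions (re-derivable from the key) and
# interleave sentinels with file blocks; the archive stores the result.
total = len(file_blocks) + n_sentinels
positions = sorted(random.Random(key).sample(range(total), n_sentinels))
stored, data_iter, j = [], iter(file_blocks), 0
for pos in range(total):
    if j < n_sentinels and pos == positions[j]:
        stored.append(sentinel(key, j)); j += 1   # looks like any other block
    else:
        stored.append(next(data_iter))

# Challenge: the verifier re-derives a few sentinel positions and checks the
# values the archive returns; an archive that dropped a substantial part of F
# would, with high probability, have destroyed some of them.
for j in random.sample(range(n_sentinels), 10):
    returned = stored[positions[j]]               # value fetched from the archive
    assert hmac.compare_digest(returned, sentinel(key, j))
print("all challenged sentinels intact")
```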

VI. ENSURING DATA STORAGE SECURITY IN CLOUD COMPUTING

Wang et al. [14], [15] proposed an effective and flexible distributed scheme with explicit dynamic data support to ensure the correctness of users' data in the cloud. They rely on erasure-correcting code in the file distribution preparation to provide redundancy and guarantee data dependability. This construction drastically reduces the communication and storage overhead as compared to traditional replication-based file distribution techniques. By utilizing the homomorphic token with distributed verification of erasure-coded data, their scheme achieves storage correctness assurance as well as data error localization: whenever data corruption is detected during the storage correctness verification, the scheme can almost guarantee the simultaneous localization of data errors, i.e., the identification of the misbehaving server(s).

A. File Distribution Preparation

It is well known that erasure-correcting codes may be used to tolerate multiple failures in distributed storage systems. In cloud data storage, this technique is used to disperse the data file F redundantly across a set of n = m + k distributed servers. An (m + k, k) Reed-Solomon erasure-correcting code is used to create k redundancy parity vectors from m data vectors in such a way that the original m data vectors can be reconstructed from any m out of the m + k data and parity vectors. By placing each of the m + k vectors on a different server, the original data file can survive the failure of any k of the m + k servers without any data loss, with a space overhead of k/m. To support efficient sequential I/O to the original file, the file layout is systematic, i.e., the unmodified m data file vectors together with the k parity vectors are distributed across m + k different servers.

B. Challenge Token Precomputation

In order to achieve assurance of data storage correctness and data error localization simultaneously, the scheme relies entirely on pre-computed verification tokens. The main idea is as follows: before file distribution, the user pre-computes a certain number of short verification tokens on each individual vector G(j) (j ∈ {1, . . . , n}), each token covering a random subset of data blocks. Later, when the user wants to make sure of the storage correctness of the data in the cloud, he challenges the cloud servers with a set of randomly generated block indices. Upon receiving the challenge, each cloud server computes a short signature over the specified blocks and returns it to the user. The values of these signatures should match the corresponding tokens pre-computed by the user. Meanwhile, as all servers operate over the same subset of indices, the requested response values for the integrity check must also be a valid codeword determined by the secret matrix P.

C. Correctness Verification and Error Localization

Error localization is a key prerequisite for eliminating errors in storage systems. However, many previous schemes do not explicitly consider the problem of data error localization and thus only provide binary results for the storage verification. This scheme outperforms those by integrating correctness verification and error localization in the challenge-response protocol: the response values from the servers for each challenge not only determine the correctness of the distributed storage, but also contain information to locate potential data error(s).
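
The per-server tokens are what turn a binary pass/fail check into error localization. The Python sketch below illustrates that point under assumptions of its own (a plain digest stands in for the homomorphic linear combination, and the erasure coding and secret matrix P are omitted): because the user holds one expected token per server for the same challenged indices, a mismatch points directly at the misbehaving server.

```python
import hashlib
import os
import random

def response(vector, indices) -> bytes:
    """What a server returns for a challenge: a short digest over the blocks
    at the challenged indices (the real scheme [14] uses a homomorphic
    linear combination of blocks instead of a hash)."""
    h = hashlib.sha256()
    for i in indices:
        h.update(vector[i])
    return h.digest()

n_servers, blocks_per_vector = 5, 50
vectors = [[os.urandom(8) for _ in range(blocks_per_vector)]   # one vector G(j) per server
           for _ in range(n_servers)]

# Token precomputation (user side, before distribution): for one future round,
# fix the challenged indices and record the expected response of every server.
indices = random.sample(range(blocks_per_vector), 6)
expected = [response(vectors[j], indices) for j in range(n_servers)]

# After distribution, server 2 silently corrupts one of its challenged blocks.
vectors[2][indices[0]] = os.urandom(8)

# Challenge round: every server responds over the same indices; comparing each
# response with its pre-computed token localizes the error to server 2.
misbehaving = [j for j in range(n_servers)
               if response(vectors[j], indices) != expected[j]]
print("misbehaving server(s):", misbehaving)   # -> [2]
```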

D. File Retrieval and Error Recovery

Since the layout of the file matrix is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return correct response values. Notice that the verification scheme is based on random spot-checking, so the storage correctness assurance is probabilistic. On the other hand, whenever data corruption is detected, the comparison of pre-computed tokens and received response values can guarantee the identification of the misbehaving server(s), again with high probability. Therefore, the user can always ask the servers to send back the blocks of the r rows specified in the challenge and regenerate the correct blocks by erasure correction, as long as at most k misbehaving servers are identified. The newly recovered blocks can then be redistributed to the misbehaving servers to maintain the correctness of storage. Dynamic operations such as insert, delete, update, and append are also supported. Public verifiability, however, still has to be enforced.
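
Implementing a full (m + k, k) Reed-Solomon code is beyond a short example, so the sketch below uses the simplest erasure code there is, a single XOR parity vector (i.e., k = 1), as an assumed stand-in: it shows the systematic layout, reconstruction of the file from the m data servers, and regeneration of one lost or corrupted vector from the surviving ones.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m, vector_len = 4, 32                      # m data vectors, k = 1 parity vector
data_vectors = [os.urandom(vector_len) for _ in range(m)]

# Systematic layout: servers 0..m-1 hold the data unchanged, server m holds parity.
parity = data_vectors[0]
for v in data_vectors[1:]:
    parity = xor_bytes(parity, v)
servers = data_vectors + [parity]          # n = m + 1 servers

# Normal retrieval: the file is just the concatenation of the first m vectors.
original = b"".join(servers[:m])

# Error recovery: suppose server 2 is identified as misbehaving; its vector is
# regenerated by XORing the remaining data vectors with the parity vector.
lost = 2
recovered = parity
for j in range(m):
    if j != lost:
        recovered = xor_bytes(recovered, servers[j])
assert recovered == data_vectors[lost]
servers[lost] = recovered                  # redistribute the regenerated vector
print("vector", lost, "recovered and redistributed")
```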

VII. A HIGH AVAILABILITY AND INTEGRITY LAYER FOR CLOUD STORAGE

Bowers et al. [4], [5], [8], [11] introduced HAIL (High-Availability and Integrity Layer), a distributed cryptographic system that allows a set of servers to prove to a client that a stored file is intact and retrievable. HAIL strengthens, formally unifies, and streamlines distinct approaches from the cryptographic and distributed-systems communities. In HAIL, a client distributes a file F with redundancy across n servers and keeps some small (constant) state locally. The goal of HAIL is to ensure resilience against a mobile adversary. This kind of powerful adversary can potentially corrupt all servers across the full system lifetime. There is one important restriction on a mobile adversary, though: it can control only b out of the n servers within any given time step. In each epoch, the client that owns F (or potentially some other entity on the client's behalf) performs a number of checks to assess the integrity of F in the system. If corruptions are detected on some servers, then F can be reconstituted from the redundancy in intact servers and the known faulty servers replaced. Such periodic integrity checks and remediation are an essential part of guaranteeing data availability against a mobile adversary: without integrity checks, the adversary can corrupt all servers in turn across epochs and modify or purge F at will.

A. Replication System

A first idea for HAIL is to replicate F on each of the n servers; cross-server redundancy can then be used to check integrity. To perform an integrity check, the client simply chooses a random file-block position j and retrieves the corresponding block Fj of F from each server. Provided that all returned blocks are identical, the client concludes that F is intact in that position. If it detects any inconsistencies, then it reconstructs F (using majority decoding across servers) and removes or replaces faulty servers. By sampling multiple file-block positions, the client can boost its probability of detecting corruptions. A limitation of this approach is that the client can only feasibly inspect a small portion of F. Another is that while the client checks consistency across servers, it does not directly check integrity, i.e., that the retrieved block for position j is the one originally stored with F. Consequently, this simple approach is vulnerable to a creeping-corruption attack: the adversary picks a random position i and changes the original block value Fi to a corrupted value F'i in all b servers corrupted during a given epoch. After enough epochs, the adversary will have changed Fi to F'i on a majority of servers, at which point majority decoding will fail to reconstruct block Fi. Because the client can feasibly check only a small fraction of the file, the probability that it will detect the temporary inconsistencies introduced by the adversary's corruptions is low. Thus, the adversary can escape detection and render F unretrievable with high probability in T epochs.
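
The cross-server consistency check, and why creeping corruption defeats it, can be shown in a few lines of Python. This is a toy model under assumptions (five servers, a fixed corruption schedule, no POR or dispersal code), not HAIL itself.

```python
import os
import random
from collections import Counter

n_servers, n_blocks, b = 5, 200, 2          # adversary controls b servers per epoch
file_blocks = [os.urandom(8) for _ in range(n_blocks)]
servers = [list(file_blocks) for _ in range(n_servers)]   # plain replication

def consistent(position: int) -> bool:
    """Client's integrity check at one position: all replicas must agree."""
    return len({srv[position] for srv in servers}) == 1

def majority(position: int) -> bytes:
    return Counter(srv[position] for srv in servers).most_common(1)[0][0]

# Creeping-corruption attack: the adversary rewrites one chosen position on b
# different servers each epoch; after a few epochs a majority holds the bad value.
target, bad = 17, os.urandom(8)
for epoch in range(3):
    for s in range(epoch * b, min((epoch + 1) * b, n_servers)):
        servers[s][target] = bad
    # The client spot-checks a few random positions per epoch; it is unlikely
    # to hit `target`, so the temporary inconsistency usually goes unnoticed.
    sampled = random.sample(range(n_blocks), 5)
    print("epoch", epoch, "spot-check passed:", all(consistent(p) for p in sampled))

print("majority decoding recovers the original block:",
      majority(target) == file_blocks[target])   # False: the attack succeeded
```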

B. Replication System with POR

To achieve better resilience against a creeping-corruption attack, a POR system (e.g., [3], [12]) might be employed on each of the n servers. In a single-server POR system, F is encoded under an error-correcting code (or erasure code) that HAIL refers to as the server code. The server code renders each copy of F robust against a fraction εc of corrupted file blocks, protecting against the single-block corruptions of the previous approach (here εc is the error rate of the server code). There are then two options to check the integrity of F. One is to use the single-server POR approach of embedding integrity checks within each server's copy of F. This approach, however, imposes high storage overhead: it does not take advantage of cross-server redundancy. An alternative is to perform integrity checks by comparing block values in a given position j using cross-server redundancy, as in the previous construction. With this approach the system is still vulnerable to a creeping-corruption attack, although much less so than in the previous construction.

C. Dispersal Code with POR

The storage overhead of the previous approach can be improved with a more intelligent approach to creating file redundancy across servers. Rather than replicating F across servers, it can instead be distributed using an error-correcting (or erasure) code, referred to in HAIL as the dispersal code. In HAIL, each file block is individually distributed across the n servers under the dispersal code. With this scheme, it is possible to use cross-server redundancy to check the integrity of F: the client/verifier simply checks that the blocks in a given position, i.e., row, constitute a valid codeword in the dispersal code. By means of the dispersal code, the overall storage cost of the previous construction is reduced.
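
As a final illustration, the row-wise codeword check that the dispersal code enables can be sketched with the same toy XOR parity used earlier (again an assumed stand-in, not HAIL's actual dispersal code): the verifier samples a row and checks that the blocks across servers form a valid codeword.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m, rows = 4, 100                            # m data servers + 1 parity server
data = [[os.urandom(8) for _ in range(rows)] for _ in range(m)]   # one column per server
parity_server = []
for r in range(rows):
    p = data[0][r]
    for j in range(1, m):
        p = xor_bytes(p, data[j][r])
    parity_server.append(p)                 # dispersal: each row is a codeword
servers = data + [parity_server]

def row_is_valid_codeword(r: int) -> bool:
    """Verifier's check: the blocks in row r across all servers must XOR to zero."""
    acc = servers[0][r]
    for j in range(1, len(servers)):
        acc = xor_bytes(acc, servers[j][r])
    return acc == bytes(len(acc))

# Spot-check a row, then corrupt one server's block in that row and re-check.
assert row_is_valid_codeword(7)
servers[2][7] = os.urandom(8)
print("row 7 still a valid codeword after corruption:", row_is_valid_codeword(7))
```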

REFERENCES
[1] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable data possession at untrusted stores," in Proc. of CCS'07. New York, NY, USA: ACM, 2007, pp. 598-609.
[2] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and efficient provable data possession," in Proc. of SecureComm'08. New York, NY, USA: ACM, 2008, pp. 1-10.
[3] K. D. Bowers, A. Juels, and A. Oprea, "Proofs of retrievability: Theory and implementation," Cryptology ePrint Archive, Report 2008/175, 2008.
[4] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A high-availability and integrity layer for cloud storage," in Proc. of CCS'09. Chicago, IL, USA: ACM, 2009, pp. 187-198.
[5] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A high-availability and integrity layer for cloud storage," IACR ePrint manuscript 2008/489, 2008.
[6] Y. Dodis, S. Vadhan, and D. Wichs, "Proofs of retrievability via hardness amplification," in TCC, 2009.
[7] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in Proc. of CCS'09. Chicago, IL, USA: ACM, 2009.
[8] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in 16th ACM CCS, 2009.
[9] D. L. G. Filho and P. S. L. M. Barreto, "Demonstrating data possession and uncheatable data transfer," IACR ePrint Archive, Report 2006/150, 2006. http://eprint.iacr.org/2006/150.
[10] P. Golle, S. Jarecki, and I. Mironov, "Cryptographic primitives enforcing communication and storage complexity," in Financial Cryptography, pp. 120-135, 2002.
[11] J. Hendricks, G. R. Ganger, and M. K. Reiter, "Verifying distributed erasure-coded data," in 26th ACM PODC, pp. 139-146, 2007.
[12] A. Juels and B. S. Kaliski, "PORs: Proofs of retrievability for large files," in ACM CCS, pp. 584-597, 2007.
[13] H. Shacham and B. Waters, "Compact proofs of retrievability," in ASIACRYPT, 2008.
[14] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring data storage security in cloud computing," in Proc. of IWQoS'09, Charleston, South Carolina, USA, 2009.
[15] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring data storage security in cloud computing," in Proc. of IWQoS'09, Charleston, South Carolina, USA, 2009.
