
Efficient Data Communication On Cloud by Secure Auditing and Deduplication

Prof. Prashant Sadaphule, Priya Jawale, Rishabh Rapatwar, Uday Mahana, Chetan Magar
sadaphulecomdept@gmail.com, Priyajawale94@gmail.com, rishabh.rapatwar@gmail.com, Udaymahana30@gmail.com,
Chetanmagar44@gmail.com
Department of Computer Engineering, AISSMS’s IOIT, Pune

ABSTRACT

Cloud computing systems have changed the internet in a way that no other technology has before. They bring a level of convenience that has enriched the lives of the people who benefit from them. Large companies such as Google, Amazon and Microsoft have taken this technology to the next level by providing Google Cloud Platform, Amazon Web Services and Microsoft Azure respectively. The problem lies in the amount of data that is being uploaded to the cloud: the storage hardware and backup media required grow in direct proportion to the amount of data. The simplification that businesses and enterprises enjoy therefore comes at a cost. Much of the data on the internet is redundant, and storing replicated data on the cloud only increases the cost for the enterprise. Data deduplication, a technology that many enterprises use to get rid of redundant data, can significantly reduce the amount of data stored on the cloud. It allows only a single copy of a particular file to be stored on the cloud while discarding the replicated copies of the same file.

This paper shows an implementation of deduplication. Alongside deduplication, it also addresses the security of data in the cloud environment. SecCloud and SecCloud+ are the systems that help us achieve both deduplication and data integrity.

Keywords
Auditing, Cloud Computing, Cloud Storage, Data Deduplication, Data Integrity.

INTRODUCTION
Cloud computing technology is spreading rapidly across the globe. Though this technology is a decade old, it has not lost its aura. Most of the things we do on the internet are supported by cloud services, though this is difficult to notice because most of the work happens behind the scenes. The music that we listen to on the internet, the videos we watch and the online games we play are all hosted on this same technology. It lets us manage, store and process data on the internet. The cloud is not hosted on local servers but on a network of remote servers reached over the internet. This gives us the freedom to access any application supported on the cloud from our personal computer. Cloud services remove the hassle of buying big racks of storage hardware and the cooling equipment for them, cut the electricity cost and save space for an enterprise or organization. Although this technology has many attractive features, they come at a cost. The storage hardware and backup media required grow in direct proportion to the amount of data. Recent surveys show that a lot of data on the internet is redundant, so why pay to store redundant data has become a significant question. A technology called data deduplication can significantly reduce the amount of data that is stored on the cloud. Data deduplication is a specialized technique that eliminates replicated copies of data: only a single copy of a particular file is stored on the cloud, while the replicated copies of the same file are discarded. Enterprise reviewers report deduplication to be reliable and cost effective.

The other problem that cloud users face is integrity auditing. The cloud architecture stores data in an unknown domain that the client or end user is unaware of. The data is transferred over the internet to the racks of a cloud service provider's data centres, and clients do not have any control over it. This raises concerns about the integrity of the data. This paper shows techniques to achieve integrity of data.

SecCloud and SecCloud+ are the systems that help us achieve data deduplication as well as data integrity. These are not standard industry terms that everyone is aware of, but IT professionals and enterprises may study these systems to gain efficiency. SecCloud generates data tags before uploading and reduces the overhead for the user as well as the auditor. Alongside deduplication, this system also provides a Proof of Ownership protocol, which assures the client that it actually owns the targeted file. Clients strongly want the system to meet its security requirements. SecCloud+, in addition to the features mentioned above, also provides security to the system. It uses a key that cannot be derived from the contents of the file, which prevents dictionary attacks. Generating the key this way protects the system from the adverse effects of such an attack.

RELATED WORK
Ateniese et al. [2] proposed the Provable Data Possession (PDP) scheme, which lets the server prove to the client that it possesses the original data without the client having to retrieve it. This scheme significantly cuts I/O costs. Wang et al. [3] proposed a design in which the integrity of dynamic data is verified not by the cloud client but by a Third Party Auditor (TPA). Delegating the audit to a TPA offers economies of scale. Wang et al. [4] introduced Provable Data Possession in public clouds. Armbrust et al. [5] describe how cloud computing changed the industry in a way no other technology has before; enterprises are looking more than ever towards buying cloud services. Yuan and Yu [6] introduced two schemes based on POR and PDP techniques to enhance data integrity in the cloud. Proof of

Volume: 3 Issue: 2 April - 2018 47


Ownership (POW) assures a client that he actually owns a particular file, since the file is stored in data centres that the client is unaware of. However, using POR and PDP techniques is at odds with the privileges of POW. The proposed scheme allows a deduplication check on both the files and their authentication tags. Halevi et al. [7] proposed a Proof of Ownership (POW) scheme in which the client proves to the cloud that it holds the file. Merkle trees and specific encodings perform the security checks and reduce the overhead on the client for deduplication. Vidhya et al. [8] introduce a Cloud Storage Service (CSS) that manages storage and maintenance.

PROPOSED SYSTEM
The proposed system aims to achieve data integrity and data deduplication. The existing system has many flaws and uses schemes that do not keep up with the growing usage of cloud services. Clients do not feel in control of their data because it is transferred over the internet to an unknown data centre. This raises concerns about the integrity of the data. In addition, storing redundant copies of data is wasteful. Two systems, SecCloud and SecCloud+, help us achieve integrity of data as well as deduplication.

SecCloud helps us establish an environment where the user can be assured that he owns the file and that the file is safe. The Proof of Ownership protocol, which is established between the client and the cloud, takes care of that. SecCloud also generates data tags before a file is uploaded, which reduces the computational load on the user and the auditor.

The SecCloud+ scheme, besides providing integrity and deduplication for data, also provides confidentiality of the file. It uses a key that cannot be derived from the contents of the file, which prevents dictionary attacks. Generating the key this way protects the system from the adverse effects of such an attack.

The proposed system shows the implementation of deduplication. It also shows auditing techniques to manage the storage of data and takes care of the aforementioned integrity of data. The proposed system gives the user confidence that he has control over the files that are being uploaded to the cloud. In addition, file access and download are secured through a file key and a secret key delivered via OTP generation.

The system consists of the following three entities:
• Cloud Client (Users)
• Auditor
• Cloud

The client (cloud user) has to register with the system to access his profile, and then logs in with the appropriate credentials. The user profile gives the user the privilege to upload files, download files and share files with other users of the system. The cloud user selects the file he wants to upload and submits it. The file is not uploaded to the cloud at this point; it is first sent to the auditor for verification. The auditor activates the file, and only then is the file uploaded to the cloud. The user can download the files he has uploaded, but every time he has to enter the file key and the secret key, which are sent via OTP to his registered email ID. Once the user enters both keys correctly, the file is downloaded to his system. The user can also share the files he has uploaded to the cloud with other users of the system: he selects the file he wants to share, selects the recipient among the registered cloud users, and sends the file. The recipient then has to pass the same security check to download the file: he enters the file key and secret key that he receives through OTP on his registered email ID. When he enters both keys, the file is downloaded to his system.

Fig 1: System architecture diagram
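
The two-key download check described in this section can be sketched in Java. This is a minimal sketch, not the authors' implementation: it assumes the file key is fixed when a file is uploaded, that a fresh secret key is issued for every download request, and that a secret key is discarded after one attempt. The class name, method names, key length and special-character set are hypothetical, and email delivery is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;

/** Hypothetical sketch of the two-key download check; not the authors' code. */
final class DownloadOtpSketch {
    // Alphabet of letters, digits and special characters; the exact set is an assumption.
    static final String CHARS =
            "abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789" + "!@#$%&*";
    static final int PW_LENGTH = 8; // the key length is not given in the paper; 8 is an assumption

    private static final SecureRandom RND = new SecureRandom();

    /** Builds a random key by drawing PW_LENGTH characters from CHARS. */
    static String generateKey() {
        StringBuilder pass = new StringBuilder(PW_LENGTH);
        for (int i = 0; i < PW_LENGTH; i++) {
            pass.append(CHARS.charAt(RND.nextInt(CHARS.length())));
        }
        return pass.toString();
    }

    /** Compares two keys with MessageDigest.isEqual instead of String.equals. */
    static boolean sameKey(String expected, String entered) {
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                entered.getBytes(StandardCharsets.UTF_8));
    }

    private final String fileKey = generateKey(); // fixed when the file is uploaded
    private String secretKey;                     // replaced on every download request

    String fileKey() { return fileKey; }

    /** Issues a fresh secret key; in the paper both keys are emailed to the user. */
    String requestDownload() {
        secretKey = generateKey();
        return secretKey;
    }

    /** Grants the download only if both keys match; the secret key is then discarded. */
    boolean authorize(String enteredFileKey, String enteredSecretKey) {
        if (secretKey == null) {
            return false; // no pending download request
        }
        boolean ok = sameKey(fileKey, enteredFileKey) & sameKey(secretKey, enteredSecretKey);
        secretKey = null; // single use (an assumption, not stated in the paper)
        return ok;
    }

    public static void main(String[] args) {
        DownloadOtpSketch file = new DownloadOtpSketch();
        String otp = file.requestDownload();
        System.out.println(file.authorize(file.fileKey(), otp)); // true
        System.out.println(file.authorize(file.fileKey(), otp)); // false: secret key already used
    }
}
```

Because the secret key is regenerated per request, an intercepted key is of no use for a later download, while the file key alone never suffices.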



ALGORITHMS
OTP generation algorithm
Simple Mail Transfer Protocol (SMTP), an application-layer protocol that runs over TCP/IP, is used to send the mail. An SMTP authenticator checks the username against the password for the sender's email ID. The OTP is sent from an email ID registered on the system to the recipient's email address.
The algorithm uses the built-in functions from packages that Java provides.
In this paper we generate the OTP for security and file sharing purposes.

1. Declare String chars containing a-z, A-Z, 0-9 and special characters, and an empty StringBuilder pass.
2. Random rnd = new SecureRandom();
3. final int PW_LENGTH = x;
4. for (int i = 0; i < PW_LENGTH; i++)
5.     pass.append(chars.charAt(rnd.nextInt(chars.length())));
6. Set the sender email: String from = "emailid";
7. String password = "pass";
8. RequestDispatcher rd = request.getRequestDispatcher("/EmailServlet");

Hash algorithm for file level deduplication:
1. Start
2. Declare variables
3. Initialize variables
4. Read the file name
5. Read the file name till the end of the file title
6. Generate NAME from strBUFF[FILENAMESIZE]
7. If (FirstFile)
8.     Consider the node as the root element
9.     Increment FileCtr
10. Else
11.     Search the generated NAME in the BST
12.     If (NAME is found)
13.         Compute the node
14.         Add the node to a singly linked list (SLL)
15.         Change the end link of the SLL
16.     Else
17.         Add the node to the BST
18.         Increment FileCtr
19. Calculate the deduplication ratio
20. Display the result for each file iteration
21. End

IMPLEMENTATION MODULE
User Module- In this module the user registers on the system with his username, password, first name, last name, mobile number and email ID. If a user tries to register with a username that already exists, the registration is denied and the user has to choose a different username. All the registration details are stored in an SQL database. After logging in, the user can upload a file, download a file, share a file or view shared files.

Auditor Module- In this module the auditor logs in with the username and password that are predefined by the system. The auditor has the privilege to view the users that have been registered on the cloud. He can audit the space used as well as the deduplicated files. He can see which files are duplicated and which are not, and he activates files for uploading to the cloud.

File uploading- The user logs in to his profile using the credentials that he provided when registering. He selects a file he wants to upload from his local machine. The file is not directly uploaded to the cloud but waits for auditor verification. Once the auditor verifies and activates the file, it is uploaded to the cloud.

File downloading- In this module the user can download files that he has already uploaded to the cloud. As a security measure he has to enter the file key and the secret key, which are sent to his registered email ID via OTP. When the user enters both keys correctly, the file is downloaded to his machine.

File sharing- In this module one user can share his file with another user of the cloud system. The user selects the file he wants to share and the user he wants to share it with, and shares the file. The receiving user has to follow the file downloading protocol to download the file.

Secure auditing protocol- In this module, the auditor logs in to the system using his username and password. When a user tries to upload a file, the file is sent to the auditor for verification, and only after the auditor verifies it is the file uploaded to the cloud. The auditor audits which files are duplicate and which are not, and has access to bar charts for auditing the deduplication. The auditor has the privilege to approve files uploaded by the user, taking into consideration the used space and duplicated files. When the auditor approves a duplicate file, the replicated file is not stored on the cloud again; only a link to the stored copy is generated for the user to access.

Deduplication- When a user tries to upload a file to the cloud, the file is sent to the auditor for verification. When the auditor activates the file, it is uploaded to the cloud. A table of the files uploaded to the cloud is maintained. Every time a file is sent for auditor verification, its name is checked against the already uploaded files in the table. If the file name already exists, the file is shown to the auditor as a duplicate. The file is non-duplicate if the table has no entry for the file name that the user wants to upload.

OTP generation- This module is used when a user wants to download a file. The user has to enter the file key and the secret key to download the file. The file key is a random key generated at the time the user uploads a file. This key alone is not enough to download the file; the user also needs the secret key, which is a combination of random characters including special characters. These keys are sent to the registered email ID of the user via OTP. The file key remains the same every time the user wants to download a particular file, but the secret key changes every time. This security measure helps protect the user's files from attacks.

EXPECTED RESULTS
The two implemented systems are SecCloud and SecCloud+. Data integrity is achieved using these systems. Files can be shared amongst registered users on the same platform, which is secured and can be accessed only after OTP verification.
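
The file-name lookup used by the Deduplication module and by the file level deduplication algorithm can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `java.util.TreeMap` (a balanced search tree) stands in for the BST of file names, a list per name stands in for the linked list of duplicates, and the deduplication ratio is assumed to mean upload requests per stored file, which the paper does not define. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of file-name deduplication; not the authors' code. */
final class DedupTableSketch {
    // TreeMap is a red-black tree, standing in for the BST of file names.
    private final Map<String, List<String>> table = new TreeMap<>();
    private int uploads = 0;

    /**
     * Registers an upload request. Returns true if the name is new and the file
     * must be stored, false if the name exists and only a link is recorded.
     */
    boolean register(String fileName, String user) {
        uploads++;
        List<String> links = table.get(fileName);
        if (links == null) {
            links = new ArrayList<>();
            table.put(fileName, links); // new node in the tree
            links.add(user);
            return true;
        }
        links.add(user); // duplicate: append a link for this user
        return false;
    }

    int storedFiles() { return table.size(); }

    /** Deduplication ratio, assumed here to be upload requests per stored file. */
    double dedupRatio() {
        return table.isEmpty() ? 0.0 : (double) uploads / table.size();
    }

    public static void main(String[] args) {
        DedupTableSketch t = new DedupTableSketch();
        System.out.println(t.register("report.pdf", "alice")); // true
        System.out.println(t.register("notes.txt", "alice"));  // true
        System.out.println(t.register("report.pdf", "bob"));   // false
        System.out.println(t.dedupRatio());                    // 1.5
    }
}
```

Matching by name alone treats two different files with the same name as duplicates. A content digest, for example SHA-256 computed with `java.security.MessageDigest`, could be used as the map key instead without changing the rest of the sketch.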



File uploading

Fig 1. File upload

Deduplication

Fig 2. Deduplication

Auditing space

Fig 3. Auditing space
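
The space audit that the auditor performs can be approximated with simple arithmetic. This is a minimal sketch that assumes each approved upload is recorded with a file name and a size in bytes. Both fields, and the class and method names, are hypothetical, since the paper does not describe the layout of the upload table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the auditor's space summary; not the authors' code. */
final class SpaceAuditSketch {
    private final Map<String, Long> storedSizes = new HashMap<>(); // one entry per stored file
    private long requestedBytes = 0; // bytes users asked to upload
    private long storedBytes = 0;    // bytes actually kept on the cloud

    /** Records an approved upload; a repeated name adds no stored bytes. */
    void record(String fileName, long sizeBytes) {
        requestedBytes += sizeBytes;
        if (storedSizes.putIfAbsent(fileName, sizeBytes) == null) {
            storedBytes += sizeBytes;
        }
    }

    long storedBytes() { return storedBytes; }

    long savedBytes() { return requestedBytes - storedBytes; }

    /** Share of requested bytes that deduplication avoided storing, in percent. */
    double savedPercent() {
        return requestedBytes == 0 ? 0.0 : 100.0 * savedBytes() / requestedBytes;
    }

    public static void main(String[] args) {
        SpaceAuditSketch audit = new SpaceAuditSketch();
        audit.record("video.mp4", 300);
        audit.record("slides.ppt", 100);
        audit.record("video.mp4", 300); // duplicate name, not stored again
        audit.record("video.mp4", 300); // duplicate name, not stored again
        System.out.println(audit.storedBytes());  // 400
        System.out.println(audit.savedBytes());   // 600
        System.out.println(audit.savedPercent()); // 60.0
    }
}
```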

CONCLUSION
This paper shows the implementation of deduplication,
techniques used for integrity of data and security measures
taken to protect a file. We have successfully implemented the
systems SecCloud and SecCloud+ to show secure auditing



and deduplication of data. SecCloud helps audit the integrity of data stored in racks of storage devices at data centres. Through the Proof of Ownership protocol, SecCloud gives clients the privilege to access their files, so they feel in control of the files they store at unknown data centres across the globe. Using the SecCloud system has reduced the computational overhead for the user as well as the auditor. The computation performed by the overall system increases, especially while uploading a file, because SecCloud generates data tags before uploading. It also helps us achieve efficient auditing phases. SecCloud+ allows the techniques of integrity auditing and deduplication to be applied to encrypted data as a security measure. The OTP generation we implemented helps protect against security threats.

REFERENCES
[1] P. S. Sadaphule, R. Rapatwar, U. Mahana, P. Jawale and C. Magar, "A Survey on Auditing Encrypted Data and Deduplication in Cloud Environment", International Journal of Advance Engineering and Research Development, Volume 4, Special Issue 5, Dec. 2017.

[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson and D. Song, "Provable Data Possession at Untrusted Stores", In Proc. 14th ACM Conf. Computer and Comm. Security (CCS '07), pp. 598-609, 2007.

[3] Q. Wang, C. Wang, K. Ren, W. Lou and J. Li, "Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing", IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 5, pp. 847-859, May 2011.

[4] Yan Cong Wang, Sherman S.M. Chow, Qian Wang, Kui Ren and Wenjing Lou, "Privacy-Preserving Public Auditing for Secure Cloud Storage".

[5] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph and Randy Katz, "A View of Cloud Computing", Communications of the ACM (CACM), Vol. 53, No. 4, April 2010.

[6] Jiawei Yuan and Shucheng Yu, "Secure and Constant Cost Public Cloud Storage Auditing with Deduplication", In IEEE Conference on Communications and Network Security (CNS), Oct. 2013.

[7] Shai Halevi, Danny Harnik, Benny Pinkas and Alexandra Shulman-Peleg, "Proofs of Ownership in Remote Storage Systems", 18th ACM Conference on Computer and Communications Security, Oct. 2011.

[8] N. Vidhya and P. Jegathesh, "Secure File Sharing of Dynamic Audit Services in Cloud Storage", International Journal of Research in Engineering and Technology, Volume 03, Issue 05, May 2014.

[9] M. Vanitha, Ar. Sivakumaran and L. Priyadharshini, "A Study on Secure Storage of Dynamic Audit Services in Cloud", International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 1, Issue 1, July 2012.

[10] Feng Hao, Dylan Clarke and Avelino Francisco Zorzo, "Deleting Secret Data with Public Verifiability", IEEE Transactions on Dependable and Secure Computing, Volume 13, Issue 6, Nov.-Dec. 2016.
