
A Dissertation Report

ON

Secure data deduplication with Role Based Access Control in


cloud computing environment

Submitted to Savitribai Phule Pune University, Pune in partial fulfillment of


the requirements for the award of the degree

MASTER OF ENGINEERING
(COMPUTER ENGINEERING)

SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE

By

Anita Dumbre

Under the Guidance of

Prof. Monika Rokade

Department of Computer Engineering,


Sharadchandra Pawar College of Engineering, Dumbarwadi,Otur.
Savitribai Phule Pune University, Pune.
(2019-2020)
SAVITRIBAI PHULE PUNE UNIVERSITY

CERTIFICATE
This is to certify that the Dissertation-II report entitled
“Secure data deduplication with Role Based Access Control in cloud
computing environment”,
has been submitted by

Anita Dumbre Seat No:00000


in partial fulfillment of the requirements for the degree of Master of Engineering in Computer
Engineering of Sharadchandra Pawar College of Engineering, Savitribai Phule Pune University,
Pune, during the academic year 2020-2021, under my guidance.

Prof.Monika Rokade Prof.Rokade M.D


Project Guide P. G. Co-Ordinator
SPCOE, SPCOE,
Dumbarwadi,Otur Dumbarwadi,Otur

Prof.Gholap P.S Dr. G. U. Kharat


Head of Department Principal
SPCOE, SPCOE,
Dumbarwadi,Otur Dumbarwadi,Otur

Department of Computer Engineering,


Sharadchandra Pawar College of Engineering, Dumbarwadi,Otur.
Savitribai Phule Pune University, Pune.
(2019-2020)
SAVITRIBAI PHULE PUNE UNIVERSITY,PUNE. 2020-2021

(University Certificate Page)


CERTIFICATE

This is to certify that

Anita Dumbre

a student of M.E. Computer Engineering (Second Year), was examined in the

Dissertation-II report entitled

“Secure data deduplication with Role Based Access Control in cloud


computing environment”,

On. . . . . . ./. . . . . . ./2021


At
Department of Computer Engineering,
Sharadchandra Pawar College of Engineering,
Dumbarwadi, Otur-410504

————————— —————————
Internal Examiner External Examiner

Date :
Place : Dumbarwadi,Otur
Acknowledgement
It is my privilege to express my deep sense of gratitude towards my project guide, Prof. Monika
Rokade, and the Head of the Computer Engineering Department, Prof. Gholap P. S., for their valuable
suggestions, guidance throughout the course of study, and timely help in the progress of my dissertation
on “Secure data deduplication with Role Based Access Control in cloud computing environment”.
It is a moment of immense satisfaction to express my profound gratitude towards our P. G. coordinator,
Prof. Rokade M. D., whose enthusiasm was a constant source of inspiration for my work.
My special thanks to Dr. G. U. Kharat, Principal, Sharadchandra Pawar College of Engineering, for his
valuable support.
I would also like to thank all the other faculty members of the Computer Engineering Department who,
directly or indirectly, kept up the enthusiasm and momentum required for an effective dissertation and
guided me in all possible ways. Last but not least, I would like to thank my family and friends for their
continuous support and encouragement towards this degree.

Anita Dumbre

Abstract

This project gives an overview of cloud computing, cloud file services, their usability, and storage. It
also considers storage optimization by analysing current data de-duplication techniques, processes, and
implementations for the benefit of cloud service providers and cloud users. The project proposes an
effective method for detecting and eliminating duplicates by computing file digests with file checksum
algorithms, which takes less time than previously implemented methods. The proposed method deletes
duplicate data, and based on that duplicate check each user is allocated certain privileges and holds a
unique token. Cloud deduplication is accomplished using a hybrid cloud model. The proposed technique
is more reliable and consumes fewer cloud resources. It is also shown that, relative to the standard
deduplication technique, the proposed scheme incurs limited overhead during duplicate elimination.
Both content-level and file-level deduplication of file data in the cloud are reviewed in this document.
Keywords: Data deduplication, Delta compression, Storage system, Index structure, Performance eval-
uation.

Contents

Acknowledgment 1

List of Figures 7

List of Tables 8

Abstract 8

1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Definition and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 LITERATURE REVIEW 3

3 Problem Definition and Scope 5


3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4 Objectives 6
4.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

5 Dissertation Plan 7
5.1 Area of Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.2 Plan Of Dissertation Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.2.1 Purpose of the document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.3 Proposed Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5.4 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.4.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.4.2 MySQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.4.3 Eclipse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.5.1 Algorithm 1: Hash Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

5.5.2 Algorithm 2: Encryption and Decryption . . . . . . . . . . . . . . . . . . . . . 11
5.5.3 Algorithms 3: Role Based Access Control Algorithms: . . . . . . . . . . . . . . 11
5.6 Feasibility Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.7 Technical Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.8 Economical Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.9 Operational Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.10 Time Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.11 Risk Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.11.1 Project Risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.12 Risk Mitigation, Monitoring and Management (RMMM) Plan . . . . . . . . . . . . . . 13
5.13 Dissertation Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.13.1 Dissertation Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.13.2 Installation and Configuration Task . . . . . . . . . . . . . . . . . . . . . . . . 14

6 Software Requirement Specification 15


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.1.1 Purpose and Scope of Document . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.1.2 User Classes and Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6.2 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.2.1 System Feature 1(Functional Requirement) . . . . . . . . . . . . . . . . . . . . 16
6.2.2 System Feature 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3 External Interface Requirements (If Any) . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3.1 User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3.2 Hardware Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3.3 Software Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.3.4 Communication Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.4 Nonfunctional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.4.1 Performance Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.4.2 Safety Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.4.3 Security Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.4.4 Software Quality Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.5 System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.5.1 Database Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.5.2 Software Requirements (Platform Choice) . . . . . . . . . . . . . . . . . . . . . 19
6.5.3 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.6 Analysis Models : SDLC Model to be applied . . . . . . . . . . . . . . . . . . . . . . . 20
6.7 System Implementation Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

7 Detailed Design Documentation 23


7.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.2 Relevant Levels Of DFD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
7.2.1 Data Flow Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

7.2.2 DFD level 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
7.3 UML Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
7.3.1 Use-case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
7.3.2 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.3.3 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.3.4 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

8 TESTING 31
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.1.1 Principle of Testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.2 Testing scope: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.2.1 Major Functionalities: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.3 Basics of Software Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
8.3.1 White-box testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
8.3.2 Black box Testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
8.3.3 Unit testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8.3.4 Integration testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8.3.5 Validation testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8.4 Test Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8.4.1 Testing Process: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
8.4.2 Functionality testing and non-functional testing: . . . . . . . . . . . . . . . . . 34
8.5 Test Cases and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
8.5.1 Test cases of system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

9 Data Tables and Discussions 36


9.1 Result Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
9.1.1 First Register Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
9.1.2 First Login Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.1.3 First User File Upload Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.1.4 First User Download File Page . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
9.1.5 Owner Select File Name Share Page . . . . . . . . . . . . . . . . . . . . . . . . 40
9.1.6 Owner Select User Name Share Page . . . . . . . . . . . . . . . . . . . . . . . 41
9.1.7 Second User Register Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
9.1.8 Second User Login Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
9.1.9 File Request Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9.1.10 File Request Accept Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
9.1.11 File Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
9.1.12 File Enter Key Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

10 Conclusion And Future Work 48


10.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
10.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

REFERENCES 49

List of Figures

6.1 System implementation Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

7.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24


7.2 DFD Level 0 Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
7.3 DFD level 1 Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
7.4 Use-case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.5 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.6 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7.7 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

9.1 First Register Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36


9.2 First Login Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.3 First User File Upload Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.4 First User Download File Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
9.5 Owner Select File Name Share Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
9.6 Owner Select User Name Share Page . . . . . . . . . . . . . . . . . . . . . . . . . . 41
9.7 Second User Register Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
9.8 Second User Login Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
9.9 File Request Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9.10 File Request Accept Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
9.11 File Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
9.12 File Enter Key Access Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

List of Tables

5.1 System Implementation Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8


5.2 Risk Probability definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.3 Risk Impact definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6.1 System Implementation Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

CHAPTER 1

Introduction

1.1 Overview
Data deduplication technology usually identifies redundant data quickly and correctly by using a file
checksum technique. A checksum can indicate whether redundant data is present; however, false positives
may occur. To avoid false positives, a new chunk must be compared with the chunks of data that have
already been stored, and to reduce the time spent excluding false positives, current research relies on
extracting file data checksums. As a result of such data delegation, managing storage and reducing its
cost become some of the most difficult and important tasks in a mass storage system. Data deduplication
is an efficient data reduction approach that not only reduces storage space by removing duplicate data
but also reduces the transmission of redundant data in low-bandwidth network environments, and it has
become increasingly popular in recent years as a highly efficient data reduction tool. Cloud computing is
an evolving trend in information and communication technology for the modern century. In this work,
the target file table stores multiple attributes such as user id, filename, size, extension, checksum, and
date-time. Whenever a user uploads a file, the system first calculates its checksum and cross-verifies it
with the checksum data stored in the database; if the file already exists, the existing entry is updated,
otherwise a new entry is created in the database. The system involves three entities: data owners (owners),
the cloud server (server), and data consumers (users). Cloud computing is a term which involves
virtualization, distributed computing, networking, software, and web services. A cloud consists of several
elements such as clients, a datacenter, and distributed servers, and it offers fault tolerance, high availability,
scalability, flexibility, reduced overhead for users, reduced cost of ownership, on-demand services, and more.
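To make the checksum-based duplicate check described above concrete, the following minimal Java sketch computes a file digest and looks it up in a metadata table before recording a new entry. The table name file_meta, its columns, and the JDBC connection details are illustrative assumptions rather than the project's actual schema, and the MySQL Connector/J driver is assumed to be on the classpath.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class DuplicateCheck {

        // File-level checksum: hex-encoded SHA-256 digest of the whole file.
        static String checksum(Path file) throws Exception {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) hex.append(String.format("%02x", b));
            return hex.toString();
        }

        // True when the checksum is already recorded, i.e. the upload is a duplicate.
        static boolean isDuplicate(Connection con, String sum) throws Exception {
            try (PreparedStatement ps = con.prepareStatement("SELECT 1 FROM file_meta WHERE checksum = ?")) {
                ps.setString(1, sum);
                try (ResultSet rs = ps.executeQuery()) { return rs.next(); }
            }
        }

        public static void main(String[] args) throws Exception {
            Path file = Paths.get(args[0]);
            String sum = checksum(file);
            try (Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/dedup", "root", "password")) {   // assumed credentials
                if (isDuplicate(con, sum)) {
                    System.out.println("Duplicate file: only a reference needs to be recorded.");
                } else {
                    try (PreparedStatement ps = con.prepareStatement(
                            "INSERT INTO file_meta(user_id, filename, size, checksum) VALUES (?,?,?,?)")) {
                        ps.setInt(1, 1);                                  // id of the uploading user
                        ps.setString(2, file.getFileName().toString());
                        ps.setLong(3, Files.size(file));
                        ps.setString(4, sum);
                        ps.executeUpdate();
                    }
                    System.out.println("New file registered.");
                }
            }
        }
    }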

1.2 Problem Definition and Objective

1.2.1 Problem Definition


To present an advanced scheme supporting stronger security along with de-duplication, mainly to reduce
storage space and enhance data confidentiality and integrity.

1.2.2 Objectives
The objectives of this research work include the following.

• To increase the storage utilization and reduce network bandwidth for cloud storage providers

• To remove the duplicate copies of data and improve the reliability.

• To develop algorithms of Role Based Access Control (RBAC) for secure data access.

• To enhance data security and achieve data confidentiality

1.3 Motivation
• In the existing system, a user can be a Data Owner and a Data Consumer simultaneously.

• Authorities are assumed to have powerful computation abilities, and they are supervised by gov-
ernment offices because some attributes partially contain users’ personally identifiable informa-
tion.

• The whole attribute set is divided into N disjoint sets and controlled by each authority, therefore
each authority is aware of only part of attributes.

• A Data Owner is the entity who wishes to outsource encrypted data file to the Cloud Servers.

• The Cloud Server, who is assumed to have adequate storage capacity, does nothing but store them.

• Newly joined Data Consumers request private keys from all of the authorities, and they do not
know which attributes are controlled by which authorities



CHAPTER 2

LITERATURE REVIEW

Kaiping Xue et al. [1] propose a novel heterogeneous framework to remove the problem of the
single-point performance bottleneck and provide a more efficient access control scheme with an auditing
mechanism. Their framework employs multiple attribute authorities to share the load of user legitimacy
verification. Meanwhile, in their scheme, a CA (Central Authority) is introduced to generate secret keys
for legitimacy-verified users. Unlike other multi-authority access control schemes, each of the authorities
in their scheme manages the whole attribute set individually. To enhance security, they also propose an
auditing mechanism to detect which AA (Attribute Authority) has incorrectly or maliciously performed
the legitimacy verification procedure.
Kan Yang et al. [2] proposed a revocable multi-authority CP-ABE scheme and applied it as the
underlying technique to design a data access control scheme. Their attribute revocation method can
efficiently achieve both forward security and backward security. The system also designs an expressive,
efficient, and revocable data access control scheme for multi-authority cloud storage systems, where
multiple authorities co-exist and each authority is able to issue attributes independently.
The system in [3] proposed a secure way of anti-collusion key distribution without any secure third-party
channels, so users can securely obtain their private keys from the group owner. Second, the method
provides fine-grained access control: any user in the group can use the resources in the cloud, while
revoked users cannot access the cloud again after they are revoked. Third, the scheme is protected from
collusion attacks, meaning that revoked users cannot obtain the actual data file even if they collude with
an untrusted cloud. By exploiting polynomial functions, the framework realises a secure user revocation
scheme and achieves good efficiency, which implies that existing users need not update their keys when
another user is revoked from the group.
The work in [4] proposes a key-policy approach based on KP-ABE with support for non-monotonic
access structures and constant ciphertext size. It presents the first Key-Policy Attribute-Based Encryption
(KP-ABE) scheme allowing non-monotonic access structures (i.e., structures that may contain negated
attributes) with constant ciphertext size. Towards this goal, the authors first show that a certain class of
identity-based broadcast encryption schemes generically yields monotonic KP-ABE systems in the
selective-set model. They then describe a new efficient identity-based revocation mechanism that, when
combined with a particular instantiation of their general monotonic construction, gives rise to the first
truly expressive KP-ABE realization with constant-size ciphertext.
F. Zhang and K. Kim [5] proposed ID-based ring signature approaches; both are defined over bilinear
pairings and implemented with a Java pairing library. The system also analyses their security and
efficiency against different existing strategies. The Java Pairing library (JPBC) is used for data encryption
and decryption, and user access control policies are designed for end users, which also enhances the
privacy and anonymity of the data owner.
The approach in [6] proposes the first identity-based threshold ring signature scheme that does not rely
on pairings, together with the first identity-based threshold verifiable ring signature scheme. The authors
also show that the secrecy of the actual signers is maintained even against the private key generator (PKG)
of the identity-based system, and how the construction relates to other existing schemes. Due to the
different levels of signer anonymity they support, the schemes proposed in that paper form a suite of
identity-based threshold ring signature methods applicable to many real-world systems with varied
anonymity needs.
In [7], the system first validates the security requirements of the whole architecture and then incorporates
them into the security architecture. It proposes an AES-128 (16-byte key) encryption approach for
end-to-end user verification and data encryption/decryption.
According to Kan Yang [8], Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is a promising
technique for access control of encrypted data. It requires a trusted authority to manage all the attributes
and distribute keys in the system. In cloud storage systems, multiple authorities co-exist and each
authority is able to issue attributes independently. However, existing CP-ABE schemes cannot be directly
applied to data access control for multi-authority cloud storage systems, due to the inefficiency of
decryption and revocation. The paper therefore proposes DAC-MACS (Data Access Control for
Multi-Authority Cloud Storage), an effective and secure data access control scheme with efficient
decryption and revocation. Specifically, it constructs a new multi-authority CP-ABE scheme with efficient
decryption and also designs an efficient attribute revocation method that can achieve both forward
security and backward security.
The system in [9] proposed CaCo, an efficient Cauchy coding approach for data storage in the cloud.
First, CaCo uses Cauchy matrix heuristics to produce a matrix set. Second, for each matrix in this set,
CaCo uses XOR schedule heuristics to generate a series of schedules. In the second phase, CaCo selects
the shortest schedule among all those produced. In this way, CaCo can identify an optimal coding
scheme, within the capability of the current state of the art, for an arbitrarily given redundancy
configuration. The authors implement CaCo in a cloud distributed file system and evaluate its
performance by comparison with ”Cloud 2.5”. Finally, the authors state that this system enhances
security in a distributed file system with an effective data storage scheme.
Ibrahim Adel [10] defines a new replica placement policy for HDFS. The issue of load balancing is
addressed in this work by evenly distributing replicas to cluster nodes, so there is no further need for a
separate load-balancing utility. Simulation results confirm that IDPM can generate replica distributions
that are perfectly even and satisfy all HDFS replica placement rules, and there is promising future work
for the proposed policy. Under the default HDFS replica placement policy, the replicas of data blocks
cannot be evenly distributed across cluster nodes, so the current HDFS has to rely on a load-balancing
utility to balance replica distributions, which consumes additional time and resources. These challenges
drive the need for intelligent methods that solve the data placement problem and achieve high
performance without the need for a load-balancing utility.



CHAPTER 3

Problem Definition and Scope

3.1 Problem Definition


To present an advanced scheme supporting stronger security along with de-duplication, mainly to reduce
storage space and enhance data confidentiality and integrity.

3.2 Scope
• The research work focuses on cloud data storage security, which has always been an important aspect
of quality of service.

• To ensure the correctness of cloud clients’ data in the cloud, this work proposes a highly effective and
flexible distributed scheme with two features, as opposed to its predecessors.
CHAPTER 4

Objectives

4.1 Objectives
• To increase the storage utilization and reduce network bandwidth for cloud storage providers

• To remove the duplicate copies of data and improve the reliability.

• To develop algorithms of Role Based Access Control (RBAC) for secure data access.

• To enhance data security and achieve data confidentiality


CHAPTER 5

Dissertation Plan

5.1 Area of Dissertation


Security-based application

5.2 Plan Of Dissertation Execution

5.2.1 Purpose of the document


The dissertation plan estimates the duration needed to complete the work. It helps to keep track of the
progress of the dissertation and to identify priorities and dependencies. It also helps manage tasks within
deadlines and identify critical tasks. Risks to the system are also identified when a proper plan is
developed. The dissertation plan contains the dissertation estimation, risk management, dissertation
schedule, tracking, and control mechanism.

Table 5.1: System Implementation Plan

Sr. No | Task Name | Begin date | End date | Remarks
1 | Selecting project domain | 15 July 2020 | 20 July 2020 | Done
2 | Understanding project need | 21 July 2020 | 25 July 2020 | Done
3 | Understanding prerequisites | 26 July 2020 | 30 July 2020 | Done
4 | Information Gathering | 1 Aug 2020 | 30 Aug 2020 | Done
5 | Literature Survey | 1 Sept 2020 | 15 Sept 2020 | Done
6 | Refine Project Scope | 16 Sept 2020 | 18 Sept 2020 | Done
7 | Concept Development | 19 Sept 2020 | 20 Sept 2020 | Done
8 | Planning and Scheduling | 21 Sept 2020 | 23 Sept 2020 | Done
9 | Requirements analysis | 24 Sept 2020 | 25 Sept 2020 | Done
10 | Risk identification and monitoring | 26 Sept 2020 | 27 Sept 2020 | Done
11 | Design and modeling | 28 Sept 2020 | 15 Oct 2020 | Done
12 | Design review and refinement | 16 Oct 2020 | 20 Oct 2020 | Done
13 | GUI design | 21 Oct 2020 | 20 Nov 2020 | Done
14 | Implementation | 21 Nov 2020 | 15 Feb 2021 | Done
15 | Review and suggestions for Implementation | 15 Mar 2021 | 20 Mar 2021 | Done
16 | Outcome assessment | 21 Mar 2021 | 30 Mar 2021 |
17 | Testing and Quality Assurance | 1 Apr 2021 | 10 Apr 2021 |
18 | Review and suggestions for Testing and QA | 11 Apr 2021 | 15 Apr 2021 |
19 | Refined QA activities | 16 Apr 2021 | 30 May 2021 |

5.3 Proposed Methodology


Let S be a system which analyses and reads documents, such that S = {S1, S2, S3, S4}, where S1
represents the document uploading phase with encryption, S2 represents hashing with SHA-256, S3
represents the user authentication module, and S4 represents the query result with multiple comparison
graphs.
Case 1: The user first collects the file data.
Let S1 be the set of parameters of the searching query.
S1 = {encrypt the file using the PBEWithMD5AndDES encryption algorithm}
Where the user first uploads the document file, which consists of some text data, and it is encrypted
using PBEWithMD5AndDES.
Condition:
If File != Null
Operation: f1: Proceed ()
Else
Operation: Invalid file
Success: Valid, then proceed.




Failure: Discard the query


Case 2: Data collected by the CSP.
Let S2 be the SHA digest calculation.
S2 = the system applies SHA-256 to the encrypted data.
Where,
Condition:
If (S2 executed successfully)
Operation: F2: Proceed ()
Else
Operation: Error
Success: Process succeeds.
Failure: Discard the query execution.
Case 3: Let S3 be the end-user verification step.
S3 : {True, False}
Where, if the user is valid, the user can access data shared by the data owner.
Condition: If (user == valid user)
Operation: Forward to cloud server
Condition: Else if (user key-data == PPT key-data)
Operation: F3: Proceed ()
Condition: Then access the dashboard
Operation: F4: Proceed ()
Condition: Else
Operation: Invalid user.
Success: The given user is authenticated, then proceed.
Failure: The system displays the message "invalid user".
Case 4: Let S4 provide the system analysis against existing approaches.
G = {A1, A2, ..., An} provides all analysis graphs.
Condition:
If (graph values == valid)
Operation: Show graphs
Condition: Else
Operation: Invalid user.
Success: Show graphs
Failure: Error loading graphs
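Read together, the four cases form one upload pipeline. The skeleton below is only a sketch of that control flow; the helper bodies are deliberately trivial placeholders, and the real hashing and encryption steps appear in Section 5.5.

    // Sketch of the S1-S4 control flow; helper bodies are placeholders, not the real implementation.
    public class UploadFlow {

        public String handleUpload(byte[] fileData, String userKey, String registeredKey) {
            if (fileData == null || fileData.length == 0) {
                return "Invalid file";                      // Case 1 failure: discard the query
            }
            byte[] cipher = encrypt(fileData, userKey);     // Case 1: PBEWithMD5AndDES encryption
            String digest = sha256Hex(cipher);              // Case 2: SHA-256 over the encrypted data
            if (!userKey.equals(registeredKey)) {
                return "Invalid user";                      // Case 3 failure: verification rejected
            }
            store(cipher, digest);                          // forward to the cloud server
            return "Stored, digest = " + digest;            // Case 4: values feed the analysis graphs
        }

        private byte[] encrypt(byte[] data, String key) { return data; }                   // placeholder
        private String sha256Hex(byte[] data) { return Integer.toHexString(data.length); } // placeholder
        private void store(byte[] cipher, String digest) { /* persist to MySQL / cloud */ }
    }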




5.4 Implementation Details

5.4.1 Java
Object Oriented: In Java, everything is an Object. Java can be easily extended since it is based on the
Object model. Platform Independent: Unlike many other programming languages including C and C++,
when Java is compiled, it is not compiled into platform specific machine, rather into platform independent
byte code. This byte code is distributed over the web and interpreted by the Virtual Machine (JVM) on
whichever platform it is being run on. Simple: Java is designed to be easy to learn. If you understand the
basic concepts of OOP, Java is easy to master. Secure: Java’s security features make it possible to
develop virus-free, tamper-free systems. Authentication techniques are based on public-key encryption.
Architecture-neutral: Java compiler generates an architecture neutral object file format, which makes
the compiled code executable on many processors, with the presence of Java runtime system. Portable:
Being architecture-neutral and having no implementation dependent aspects of the specification makes
Java portable. Compiler in Java is written in ANSI C with a clean portability boundary, which is a
POSIX subset. Robust: Java makes an effort to eliminate error prone situations by emphasizing mainly
on compile time error checking and runtime checking. Multithreaded: With Java’s multithreaded feature
it is possible to write programs that can perform many tasks simultaneously. This design feature allows
the developers to construct interactive applications that can run smoothly. Interpreted: Java byte code
is translated on the fly to native machine instructions and is not stored anywhere. The development
process is more rapid and analytical since the linking is an incremental and light-weight process. High
Performance: With the use of Just-In-Time compilers, Java enables high performance. Distributed: Java
is designed for the distributed environment of the internet. Dynamic: Java is considered to be more
dynamic than C or C++ since it is designed to adapt to an evolving environment. Java programs can
carry extensive amount of run-time information that can be used to verify and resolve accesses to objects
on run-time.

5.4.2 MySQL
MySQL is an open-source relational database management system (RDBMS). The MySQL™ software
delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database
server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for em-
bedding into mass-deployed software. MySQL is under two different editions: the open source MySQL
Community Server and the proprietary Enterprise Server. MySQL Enterprise Server is differentiated
by a series of proprietary extensions which install as server plugins, but otherwise shares the version
numbering system and is built from the same code base.

5.4.3 Eclipse
There are many ways to learn how to program in Java. Most developers believe that there are
advantages to learning Java using the Eclipse integrated development environment (IDE). Some of these
are listed below:




• Eclipse provides a number of aids that make writing Java code much quicker and easier than using
a text editor. This means that you can spend more time learning Java, and less time typing and
looking up documentation.

• The Eclipse debugger and scrapbook allow you to look inside the execution of the Java code. This
allows you to “see” objects and to understand how Java is working behind the scenes

• Eclipse provides full support for agile software development practices such as test-driven devel-
opment and refactoring. This allows you to learn these practices as you learn Java.

• If you plan to do software development in Java, you’ll need to learn Eclipse or some other IDE.

• So learning Eclipse from the start will save you time and effort.

• The chief concern with learning Java with an IDE is that learning the IDE itself will be difficult
and will distract you from learning Java. It is hoped that this tutorial will make learning the basics
of Eclipse relatively painless so you can focus on learning Java

5.5 Algorithm

5.5.1 Algorithm 1: Hash Generation


Input: data d,
Output: Generated hash H according to given data
Step 1: Input data as d
Step 2: Apply SHA-256 from the SHA-2 family
Step 3: CurrentHash = SHA256(d)
Step 4: Return CurrentHash
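A direct Java rendering of Algorithm 1 using the JDK's MessageDigest class is sketched below; encoding the digest as a hex string is an implementation choice, not something the algorithm prescribes.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class HashGeneration {

        // Steps 2-4: apply SHA-256 to the input data and return the digest as a hex string.
        public static String sha256(byte[] data) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            String currentHash = sha256("sample file content".getBytes(StandardCharsets.UTF_8));
            System.out.println("CurrentHash = " + currentHash);
        }
    }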

5.5.2 Algorithm 2: Encryption and Decryption


KeyGen (M) → K is the key generation algorithm that maps a data copy M to a convergent key K.
Enc (K, M) → C is the PBEWithMD5AndDES encryption algorithm that takes both the convergent key
K and the data copy M as inputs and outputs a ciphertext C;
Dec (K, C) → M is the decryption algorithm that takes both the ciphertext C and the convergent key K
as inputs and outputs the original data copy M.
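The Java Cryptography Extension already ships a PBEWithMD5AndDES cipher, so KeyGen, Enc, and Dec can be sketched as follows. The fixed salt, the iteration count, and the use of a passphrase in place of the convergent key are illustrative assumptions; in a convergent-encryption setting the passphrase would itself be derived from the data copy (for example, from its hash).

    import javax.crypto.Cipher;
    import javax.crypto.SecretKey;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.PBEKeySpec;
    import javax.crypto.spec.PBEParameterSpec;

    public class PbeCrypto {

        // Salt and iteration count are illustrative values only.
        private static final byte[] SALT = { 1, 2, 3, 4, 5, 6, 7, 8 };
        private static final int ITERATIONS = 20;

        // KeyGen(M) -> K : the passphrase plays the role of the convergent key here.
        static SecretKey keyGen(char[] passphrase) throws Exception {
            return SecretKeyFactory.getInstance("PBEWithMD5AndDES")
                                   .generateSecret(new PBEKeySpec(passphrase));
        }

        // Enc(K, M) -> C
        static byte[] encrypt(SecretKey key, byte[] plain) throws Exception {
            Cipher c = Cipher.getInstance("PBEWithMD5AndDES");
            c.init(Cipher.ENCRYPT_MODE, key, new PBEParameterSpec(SALT, ITERATIONS));
            return c.doFinal(plain);
        }

        // Dec(K, C) -> M
        static byte[] decrypt(SecretKey key, byte[] cipherText) throws Exception {
            Cipher c = Cipher.getInstance("PBEWithMD5AndDES");
            c.init(Cipher.DECRYPT_MODE, key, new PBEParameterSpec(SALT, ITERATIONS));
            return c.doFinal(cipherText);
        }

        public static void main(String[] args) throws Exception {
            SecretKey k = keyGen("key-derived-from-the-file".toCharArray());
            byte[] c = encrypt(k, "hello cloud".getBytes("UTF-8"));
            System.out.println(new String(decrypt(k, c), "UTF-8"));   // prints: hello cloud
        }
    }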

5.5.3 Algorithms 3: Role Based Access Control Algorithms:


Input: Attribute Email-ID, file data, and file key-data.
Output: Rule set as policies or signatures.
Step 1: Initialize the data string S[].
Step 2: Initialize a = 0, k = 0, and the user Email-ID.
Step 3: Read the file data and file key:
a ← filekey list [i . . . n]
k ← Email-ID list [i . . . n]
Step 4: For each entry (read a into S):
If (key-data.equals(a) && user Email-ID.equals(k))
Then show the user file-share information
Else
Then show that the file is not shared with the user
End for
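A minimal Java sketch of Algorithm 3 is given below; the ShareRule class and the sample rule set are hypothetical stand-ins for the policy records that the real system keeps in the database.

    import java.util.Arrays;
    import java.util.List;

    public class RoleBasedAccessControl {

        // One stored sharing rule: which e-mail may open which file key (the rule set of Algorithm 3).
        static class ShareRule {
            final String email;
            final String fileKey;
            ShareRule(String email, String fileKey) { this.email = email; this.fileKey = fileKey; }
        }

        // Returns true when some rule grants the requesting user access to the requested file key.
        static boolean isShared(List<ShareRule> rules, String userEmail, String requestedKey) {
            for (ShareRule r : rules) {                       // Step 4: scan the rule set
                if (r.fileKey.equals(requestedKey) && r.email.equals(userEmail)) {
                    return true;                              // file-share information is shown
                }
            }
            return false;                                     // file is not shared with this user
        }

        public static void main(String[] args) {
            List<ShareRule> rules = Arrays.asList(
                new ShareRule("owner@example.com", "K-101"),
                new ShareRule("friend@example.com", "K-101"));
            System.out.println(isShared(rules, "friend@example.com", "K-101")); // true
            System.out.println(isShared(rules, "other@example.com", "K-101"));  // false
        }
    }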

5.6 Feasibility Study


The feasibility study is a major factor contributing to the analysis of the system. In the early stages of
software development, it is necessary to check whether the system is feasible or not. A detailed study was
carried out to check the workability of the proposed system; the feasibility study therefore evaluates the
system proposal with regard to its workability, impact on the organization, ability to meet user
requirements, and effective use of resources.
5.7 Technical Feasibility


In this section we identify the hardware and software, the physical location, the technology, and other
important logistics of how the project will run. The technical feasibility study assesses the details of how
the product or service will be delivered (i.e., where the project will be located, the technology needed,
and so on). Think of the technical feasibility study as the logistical or tactical plan of how the project will
develop, deliver, use, and track its products or services. A technical feasibility study is an excellent tool
for troubleshooting and long-term planning; in some regards, it serves as a flow chart of how the products
and services evolve and move through the project to physically reach the market.

5.8 Economical Feasibility


The purpose of an economic feasibility study (EFS) is to demonstrate the net benefit of a proposed
project, taking into consideration the benefits and costs to the organization and to the general public as a
whole. Economic feasibility analysis, also known as cost-benefit analysis, is the most commonly used
method for determining the efficiency of a new project. It helps in identifying the expected profit against
the investment in a project. Cost and time are the most essential factors involved in this field of study.

5.9 Operational Feasibility


The operational feasibility study tests the operational scope of the software to be developed. The
proposed system must have high operational feasibility, and its usability will be high. Operational
feasibility assesses the extent to which the required web application performs a series of steps to solve
problems and meet user requirements. This feasibility depends on human resources (the software
development team) and involves visualizing whether the software will operate after it is developed and
remain operative once it is installed.

5.10 Time Feasibility


Similar to economic feasibility, a rough estimate of the project schedule is required to determine whether
it would be feasible to complete the system project within the required timeframe.

5.11 Risk Management

5.11.1 Project Risks


Risk analysis and management are a series of steps that help a software team to understand and manage
uncertainty. Many problems can plague a software project. A risk is a potential problem—it might
happen, it might not. But, regardless of the outcome, it’s a really good idea to identify it, assess its
probability of occurrence, estimate its impact, and establish a contingency plan should the problem
actually occur. The risks for the project can be analyzed within the constraints of time and quality.

ID Risk Description Probability Impact


1 Web attack High Low High High
2 Application attacks High Low High High
3 DB attack High Low Low Low

5.12 Risk Mitigation, Monitoring and Management (RMMM) Plan

Table 5.2: Risk Probability definitions

Probability Value Description


High Probability of occurrence is ≥75%
Medium Probability of occurrence is 26-75%
Low Probability of occurrence is ≤25%




Table 5.3: Risk Impact definitions

Impact Value Description


High ≥10% ≥75%
High 5-10% 26-75%
Medium ≤5% ≤25%

5.13 Dissertation Schedule

5.13.1 Dissertation Task


1. Installations required for dissertation

2. Preliminary Design of a System.

3. Compilation of Assignments

4. Writing a required Code of system.

5. Testing of a system.

6. Taking results of a system.

7. Comparing results of system with other systems.

5.13.2 Installation and Configuration Task


Major Tasks in the Dissertation stages are:
Task 1: Download and install JDK 1.8.0 or higher and MySQL, and configure them for the operating system.
Task 2: Download and install an Eclipse version compatible with JDK 1.8.
Task 3: Import the Java project: import the source code from the source code directory and run the MySQL database script.
Task 4: Set up the library files required by the project.
Task 5: In Eclipse, right-click on the project, choose Run As, and select Run on Server.
Task 6: Finally, run the Java program.



CHAPTER 6

Software Requirement Specification

6.1 Introduction
A software requirements specification document describes the intended purpose, requirements, and
nature of the software to be developed. It also includes the yield and cost of the software. A Software
Requirements Specification (SRS) describes the nature of a system, software product, or application. In
simple words, the SRS is a manual for the software, provided it is prepared before development of the
software or application begins. A software requirements document is primarily prepared for a system,
software product, or any other kind of application.

6.1.1 Purpose and Scope of Document


• The research work focuses on cloud data storage security, which has always been an important aspect of
quality of service.

• To ensure the correctness of cloud clients’ data in the cloud, this work proposes a highly effective and
flexible distributed scheme with two features, as opposed to its predecessors.

• Members can upload files, access them whenever needed, and use the files shared by the data owner,
and the organization’s backup can be taken by means of the admin. The system provides authentication
for each individual file uploaded to the cloud, irrespective of operating system and device, which
increases the security of the backup.

• The proposed system is improved with multi-factor authentication and a better combination of data
encryption techniques; the PBEWithMD5AndDES algorithm together with the Secure Hash Algorithm
achieves the required security goals.

6.1.2 User Classes and Characteristics


End User: The user who operates the system is known as the end user. The user interacting with the
system should be able to understand the operation of the system.
Technical User: Any technical user will be able to operate the project, and it will be easy for such a user
to interact with the system.
Non-Technical User: A non-technical user will also be able to operate the system, as the GUI will be
designed in such a way that it is easy for the user to interact with the system. Documentation will also be
provided so that it is easy for the user to understand the working of the system.

6.2 Functional Requirements


• System must be fast and efficient

• User friendly GUI

• Reusability

• Performance

• System Validation input

• Proper output

6.2.1 System Feature 1(Functional Requirement)


Registration and Authentication: In this phase all entities register; the data owner, multiple admins, and
users create their own profiles.
Data Uploading: In the first phase the data owner uploads the file. In this module data encryption is done
using the PBEWithMD5AndDES and SHA-256 schemes, and at the same time the keys are sent to the
EC2 cloud. The data owner uploads a file for his backup and can allow the file to be accessed by a friend
user, who will be able to access the uploaded file by entering the proper credentials sent to his registered
email id. Files uploaded by a user are scanned by the algorithm, and if the contents of a new file match an
old file, only the previous file is stored. Common files whose contents are exactly identical, if uploaded
by multiple users, are registered as the first owner’s file, and the others can access it as friend users,
which avoids duplication of the file at the server.
Data Sharing: In this phase data sharing is done by the data owner, who can share any file with any user
in the cloud group. A friend user can access the file shared with him by the data owner by following the
login process and the proper credentials sent to him for that particular file.
Access Control and Revocation: Under access control, a user can view or access only the files shared
with him. Under revocation, the data owner can revoke file access for a specific user. Common files
uploaded by multiple users cannot be deleted immediately; they are deleted according to the maximum
retention time specified among the multiple users.
File Request and Download: A user can send a download request to the cloud server, and at the same
time verification is performed by the data owner.
Cloud Storage Service Provider (CSP)
The Cloud Storage Service Provider provides the database. It allows the data owner to keep any kind of
information, and it also allows the user to define the database schema. The space for the user instance is
allocated by the CSP according to the user’s requirement.
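The sharing and revocation bookkeeping described above can be pictured with the small in-memory model below; in the real system the same flags would live in the MySQL database, so the class and method names here are assumptions made only for illustration.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical in-memory model of sharing and revocation for one file.
    public class ShareRegistry {

        // Maps a friend user's e-mail to whether the share is currently active.
        private final Map<String, Boolean> shares = new HashMap<>();

        void share(String userEmail)  { shares.put(userEmail, Boolean.TRUE); }   // owner shares the file
        void revoke(String userEmail) { shares.put(userEmail, Boolean.FALSE); }  // owner revokes access

        boolean canAccess(String userEmail) {
            return Boolean.TRUE.equals(shares.get(userEmail));
        }

        public static void main(String[] args) {
            ShareRegistry reg = new ShareRegistry();
            reg.share("friend@example.com");
            System.out.println(reg.canAccess("friend@example.com")); // true
            reg.revoke("friend@example.com");
            System.out.println(reg.canAccess("friend@example.com")); // false
        }
    }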




6.2.2 System Feature 2


A forward-secure identity-based ring signature for data sharing in the cloud architecture enables data
sharing in an efficient manner. This architecture provides a multi-cloud environment for sharing large
amounts of data in a secure way. The servers may exist in different physical locations, and the CSP
decides on which servers to store the data depending upon the available space. The identity-based ring
signature provides the ring formation of users, and authenticated data sharing across multiple clouds
provides secure data sharing at large scale. Encryption and decryption provide secure data transmission.
Storage-as-a-service offered by cloud service providers (CSPs) is a paid capability that allows
organizations to delegate their sensitive data to be stored on remote servers. This work suggests a
cloud-based storage scheme that allows the data administrator to benefit from the facilities offered by the
CSP and permits trust between them. An identity-based (ID-based) ring signature, which excludes the
procedure of certificate verification, can be used in addition. The system implements a lightweight
data-sharing approach in the cloud environment. Initially it creates multiple object layers such as the data
owner, the CSP, and the cloud database storage. In the execution phase we show how data is shared
between the different layers with a lightweight data-sharing mechanism, which reduces the internal
overhead. The system also provides data transmission efficiency between different nodes. The
authentication mechanism is implemented based on Role Based Access Control (RBAC). The
experimental analysis also shows that the system can work efficiently in trusted as well as semi-trusted
cloud environments.

6.3 External Interface Requirements (If Any)

6.3.1 User Interfaces


The Interface will be in the form of an application. It is designed to be functional and minimal in its
styling. All options will be displayed in a menu-based format. The web application will be used to set up
the page layout and add minimal styling to make the interface user friendly.

6.3.2 Hardware Interfaces


The hardware should have following specifications:

• Ability to read gallery

• Ability to exchange data over network

• Touch screen for convenience

• Keypad (in case touchpad not available)

• Continuous power supply

• Ability to connect to network

• Ability to take input from user




• Ability to validate user

6.3.3 Software Interfaces


The server will be hosted using Apache Tomcat Webserver (Version 8.0.14). It will also have a MySQL
relational database. The main backend processing will be done using Java Server Pages (JSP) including
connecting to and accessing the database and processing requests.
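As an illustration of how such a JSP/servlet back end might receive an upload on Tomcat, the sketch below uses the standard Servlet 3.x multipart API; the URL pattern, form field name, and response text are assumptions, and the checksum and encryption steps are only indicated in comments.

    import java.io.IOException;
    import java.io.InputStream;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.MultipartConfig;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.Part;

    // Hypothetical upload endpoint; checksum computation and encryption would hook in where noted.
    @WebServlet("/upload")
    @MultipartConfig
    public class UploadServlet extends HttpServlet {

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            Part filePart = req.getPart("file");             // HTML form field named "file"
            long size = 0;
            try (InputStream in = filePart.getInputStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    size += n;                                // here the SHA-256 digest would be updated
                }
            }
            resp.getWriter().println("Received " + filePart.getSubmittedFileName()
                    + " (" + size + " bytes); duplicate check and encryption follow server-side.");
        }
    }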

6.3.4 Communication Interfaces


The main communication protocol will be Hyper Text Transfer Protocol (HTTP). This will be used to
transfer information back and forth from the client to the server. HTTP GET and POST will be used to
send the information.

6.4 Nonfunctional Requirements

6.4.1 Performance Requirement


The connection between the input file and the output file should be strong enough to gain better
performance. The method defined to meet the performance goals is clear and straightforward. Without
any design consideration it is difficult to specify performance criteria, and if the developer does not
define particular criteria, the system is likely to lose performance. The criteria defined to meet the
performance goals are as follows:

• Response Time: As an interactive system, it responds very quickly to user input. In terms of the web,
definitions vary; here it is measured from the moment the user submits the request for a page to the
moment the page begins to render.

• Workload: The workload depends on the server configuration; the system is efficient for bulk loads as well.

• Scalability: The system can be scaled as per requirements, and the hardware configuration plays a vital
role in this.

• Platform: Java is the platform, and the IDS concept is the heart of this system.

6.4.2 Safety Requirement


A backup of all the software developed to date needs to be taken regularly to be on the safer side. The
hardware components used should always be switched off after use.

6.4.3 Security Requirement


We provide authentication and authorization by passwords for each level of access. The system is
designed in small modules so that errors can be found easily.




6.4.4 Software Quality Attributes


Maintaining software quality attributes such as performance, security, and safety is essential. These
attributes should meet the requirements with satisfactory results. The product is portable: it can run
between two connected systems or on a large network of computers. The product is maintainable, i.e., in
the future the properties of the product can be changed to meet new requirements.

6.5 System Requirements


• Processor :- Intel Pentium 4 or above

• Memory :- 2 GB or above

• Other peripheral :- Printer

• Hard Disk :- 500 GB

6.5.1 Database Requirements


• The database should be MySQL on the chosen platform.

• The database must be integrated with key constraints.

• It should maintain relations based on the RDBMS model and normalization.

• The system will create database backups on a periodic basis.

• It will execute all commands such as DML, DDL, and DCL, and security measures against SQL
injection are also required; a minimal schema sketch is given below.
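A minimal JDBC sketch of such a schema, with primary key, unique, and foreign key constraints, is given below; the table and column names are assumptions for illustration only and do not claim to be the project's actual schema.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Illustrative schema only; run once against the MySQL instance used by the project.
    public class SchemaSetup {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/dedup", "root", "password");   // assumed credentials
                 Statement st = con.createStatement()) {

                st.executeUpdate("CREATE TABLE IF NOT EXISTS users ("
                    + " user_id INT AUTO_INCREMENT PRIMARY KEY,"
                    + " email   VARCHAR(120) NOT NULL UNIQUE,"
                    + " role    VARCHAR(30)  NOT NULL)");        // role drives the RBAC decisions

                st.executeUpdate("CREATE TABLE IF NOT EXISTS file_meta ("
                    + " file_id  INT AUTO_INCREMENT PRIMARY KEY,"
                    + " user_id  INT NOT NULL,"
                    + " filename VARCHAR(255) NOT NULL,"
                    + " size     BIGINT NOT NULL,"
                    + " checksum CHAR(64) NOT NULL UNIQUE,"      // SHA-256 digest enforces deduplication
                    + " FOREIGN KEY (user_id) REFERENCES users(user_id))");
            }
        }
    }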

6.5.2 Software Requirements (Platform Choice)


The technologies and tools used in the proposed software are as follows:
Front End

• JSP,Html

• Internet Explorer 6.0/above

• Tool: Eclipse or NetBeans

• Java JDK 1.8

Back-End

• MySQL




6.5.3 Hardware Requirements


• Processor :- Intel Pentium 4 or above

• Memory :- 2 GB or above

• Other peripheral :- Printer

• Hard Disk :- 500 GB

6.6 Analysis Models : SDLC Model to be applied


The Agile SDLC model is a combination of iterative and incremental process models with a focus on
process adaptability and customer satisfaction through rapid delivery of a working software product.
Agile methods break the product into small incremental builds, which are provided in iterations. Each
iteration typically lasts from about one to three weeks. Every iteration involves cross-functional teams
working simultaneously on various areas such as:
Planning
Requirements Analysis
Design
Coding
Unit Testing and
Acceptance Testing.
At the end of the iteration, a working product is displayed to the customer and important stakeholders.

6.7 System Implementation Plan




Figure 6.1: System implementation Plan




Table 6.1: System Implementation Plan

Sr. No | Task Name | Begin date | End date | Remarks
1 | Selecting project domain | 15 July 2020 | 20 July 2020 | Done
2 | Understanding project need | 21 July 2020 | 25 July 2020 | Done
3 | Understanding prerequisites | 26 July 2020 | 30 July 2020 | Done
4 | Information Gathering | 1 Aug 2020 | 30 Aug 2020 | Done
5 | Literature Survey | 1 Sept 2020 | 15 Sept 2020 | Done
6 | Refine Project Scope | 16 Sept 2020 | 18 Sept 2020 | Done
7 | Concept Development | 19 Sept 2020 | 20 Sept 2020 | Done
8 | Planning and Scheduling | 21 Sept 2020 | 23 Sept 2020 | Done
9 | Requirements analysis | 24 Sept 2020 | 25 Sept 2020 | Done
10 | Risk identification and monitoring | 26 Sept 2020 | 27 Sept 2020 | Done
11 | Design and modeling | 28 Sept 2020 | 15 Oct 2020 | Done
12 | Design review and refinement | 16 Oct 2020 | 20 Oct 2020 | Done
13 | GUI design | 21 Oct 2020 | 20 Nov 2020 | Done
14 | Implementation | 21 Nov 2020 | 15 Feb 2021 | Done
15 | Review and suggestions for Implementation | 15 Mar 2021 | 20 Mar 2021 | Done
16 | Outcome assessment | 21 Mar 2021 | 30 Mar 2021 |
17 | Testing and Quality Assurance | 1 Apr 2021 | 10 Apr 2021 |
18 | Review and suggestions for Testing and QA | 11 Apr 2021 | 15 Apr 2021 |
19 | Refined QA activities | 16 Apr 2021 | 30 May 2021 |



CHAPTER 7

Detailed Design Documentation

7.1 System Architecture


For secure de-duplication, the system is proposed to provide efficient de-duplication with high reliability
at both the file level and the block level. In our system, when a user tries to upload a file, a file-level
duplicate check is performed first. If the file is a duplicate, it is rejected by the storage server, and this
check saves space equal to the file length. If the file is not a duplicate, it is divided into blocks of fixed
size. Specifically, the data are split into fragments by using secure secret-sharing schemes and stored at
different nodes. Before these blocks are uploaded, a block-level duplicate check is performed; duplicate
blocks are not uploaded to the server, which saves an amount of space equivalent to the avoided duplicate
blocks. Security is analyzed in terms of two aspects, namely the authorization of the duplicate check and
the confidentiality of the data. To construct the secure de-duplication scheme, convergent encryption, a
symmetric encryption scheme, and the PoW (proof of ownership) scheme are used. We achieve data
confidentiality by encrypting data before outsourcing it to the storage server. To achieve more security,
the file blocks are stored at different nodes using the T-coloring technique.
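The block-level step can be sketched as follows: the file is cut into fixed-size blocks and each block is fingerprinted with SHA-256 so that its digest can be compared against the stored block index. The block size, and the omission of secret sharing and T-coloring, are simplifying assumptions.

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of the block-level check: only blocks whose digest is unseen would be uploaded.
    public class BlockLevelDedup {

        static final int BLOCK_SIZE = 4 * 1024;   // fixed block size (assumed value)

        static List<String> blockDigests(String path) throws Exception {
            List<String> digests = new ArrayList<>();
            try (InputStream in = new FileInputStream(path)) {
                byte[] block = new byte[BLOCK_SIZE];
                // For simplicity each read() call is treated as one block.
                for (int n; (n = in.read(block)) != -1; ) {
                    MessageDigest md = MessageDigest.getInstance("SHA-256");
                    md.update(block, 0, n);
                    StringBuilder hex = new StringBuilder();
                    for (byte b : md.digest()) hex.append(String.format("%02x", b));
                    digests.add(hex.toString());
                }
            }
            return digests;
        }

        public static void main(String[] args) throws Exception {
            for (String d : blockDigests(args[0])) {
                System.out.println(d);   // each digest is compared against the stored block index
            }
        }
    }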

Figure 7.1: System Architecture

Implementation Procedure
Registration and Authentication: In this phase all entities register; the data owner, multiple admins, and users create their own profiles.
Data Uploading: In this phase the data owner uploads a file. Data encryption is done using the PBEWithMD5AndDES scheme together with a SHA-256 hash, and at the same time the keys are sent to the EC2 cloud. The data owner uploads a file for backup and can allow the file to be accessed by a friend user, who can access the uploaded file by entering the proper credentials sent to his registered e-mail id. Files uploaded by a user are scanned by the algorithm; if the old file and the newly uploaded file have similar contents, only the previously stored file is kept. If a common file whose contents are exactly identical is uploaded to the system by multiple users, it is registered as the first owner's file and the others can access it as friend users, which avoids duplication of the file at the server.
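As a minimal sketch of this encryption step, the fragment below uses the standard Java JCE PBEWithMD5AndDES cipher together with a SHA-256 digest of the content. The fixed salt, the iteration count, and the idea of deriving the password from the file's own digest (convergent-encryption style, so identical files encrypt identically) are assumptions for illustration; key transfer to the EC2 cloud is not shown.

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;
import java.security.MessageDigest;
import java.util.Base64;

// Hedged sketch: salt, iteration count, and content-derived password are illustrative assumptions.
public class UploadEncryptor {
    private static final byte[] SALT = {1, 2, 3, 4, 5, 6, 7, 8}; // PBEWithMD5AndDES requires an 8-byte salt
    private static final int ITERATIONS = 1000;

    /** Encrypts file contents with PBEWithMD5AndDES, keyed by the SHA-256 digest of the data. */
    public static byte[] encrypt(byte[] plaintext) throws Exception {
        // Deterministic password from the content, so identical files produce identical ciphertexts.
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(plaintext);
        char[] password = Base64.getEncoder().encodeToString(digest).toCharArray();

        SecretKey key = SecretKeyFactory.getInstance("PBEWithMD5AndDES")
                                        .generateSecret(new PBEKeySpec(password));
        Cipher cipher = Cipher.getInstance("PBEWithMD5AndDES");
        cipher.init(Cipher.ENCRYPT_MODE, key, new PBEParameterSpec(SALT, ITERATIONS));
        return cipher.doFinal(plaintext);
    }
}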
Data Sharing: In this phase data sharing is done by the data owner, who can share any file with any user in the cloud group. A friend user can access a file shared with him by the data owner by following the login process and entering the credentials sent to him for that particular file.
Access Control and Revocation: Under access control, a user can view or access only the files shared with him. Under revocation, the data owner can revoke a specific user's access to a file. A common file uploaded by multiple users cannot be deleted by a single user; it is deleted only after the maximum retention time specified among those users has elapsed.
File Request and Download: A user sends a download request to the cloud server, and at the same time verification is performed on the data owner's side.
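A minimal sketch of the share/revoke bookkeeping described above is given below. The in-memory map and the class and method names are assumptions for illustration; in the actual system this state would live in the cloud database maintained by the CSP.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of per-file access control and revocation; names are assumptions.
public class AccessRegistry {
    // fileId -> the set of friend users the data owner has shared the file with
    private final Map<String, Set<String>> sharedWith = new HashMap<>();

    /** The data owner shares a file with a friend user. */
    public void share(String fileId, String userId) {
        sharedWith.computeIfAbsent(fileId, k -> new HashSet<>()).add(userId);
    }

    /** The data owner revokes a specific user's access to a file. */
    public void revoke(String fileId, String userId) {
        Set<String> users = sharedWith.get(fileId);
        if (users != null) {
            users.remove(userId);
        }
    }

    /** A friend user may download only while the owner's grant is still in place. */
    public boolean canAccess(String fileId, String userId) {
        return sharedWith.getOrDefault(fileId, new HashSet<>()).contains(userId);
    }
}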


Cloud Storage Service Provider (CSP): The Cloud Storage Service Provider provides the database. It allows the data owner to store any kind of information and allows users to create user-defined database schemas. The CSP allocates space for a user instance according to the user's requirements.

7.2 Relevant Levels Of DFD

7.2.1 Data Flow Diagrams


A data flow diagram (DFD) is a graphical representation of the flow of data through an information
system, modeling its process aspects. Often they are a preliminary step used to create an overview of
the system which can later be elaborated. DFDs can also be used for the visualization of data processing
(structured design).

Figure 7.2: DFD Level 0 Diagram

7.2.2 DFD level 1


A DFD shows what kinds of information will be input to and output from the system, where the data will
come from and go to, and where the data will be stored. It does not show information about the timing
of processes, or information about whether processes will operate in sequence or in parallel (which is
shown on a flowchart).


Figure 7.3: DFD level 1 Diagram

7.3 UML Diagram

7.3.1 Use-case Diagram


A use case diagram, at its simplest, is a representation of a user's interaction with the system, depicting the specifications of a use case. A use case diagram can portray the different types of users of a system and the various ways in which they interact with the system. This type of diagram is typically used in conjunction with a textual use case and is often accompanied by other types of diagrams as well.


Figure 7.4: Use-case Diagram

7.3.2 Activity Diagram


Activity diagrams are graphical representations of workflows of stepwise activities and actions with support for choice, iteration, and concurrency. In the Unified Modeling Language, activity diagrams are intended to model both computational and organizational processes (i.e. workflows). Activity diagrams show the overall flow of control and are constructed from a limited number of shapes connected with arrows.


Figure 7.5: Activity Diagram

7.3.3 Sequence Diagram


A sequence diagram is a kind of interaction diagram that shows how processes operate with one another
and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object
interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and
the sequence of messages exchanged between the objects needed to carry out the functionality of the
scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of
the system under development. The sequence diagram for the system is given in the figure below.


Figure 7.6: Sequence Diagram

7.3.4 Class Diagram


It is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among objects.


Figure 7.7: Class Diagram



CHAPTER 8

TESTING

8.1 Introduction
Testing is a very important phase of the software development life cycle. The purpose of this phase is to verify the system over its lifetime, and it is a compulsory phase. This section gives the details of the testing activities carried out for the proposed scheme. The tester has to estimate the test effort for each component and write test cases according to the user requirements and the system structure.

8.1.1 Principle of Testing:


• To know the system performance.

• To recognize the functionality of each and every individual module.

• To verify whether system is functioning as per the user requirements.

8.2 Testing scope:


Testing is important in software engineering to validate and verify the project. The main aim of testing is to verify the system by finding bugs or defects in the different modules. During the testing phase, the system is also analyzed for probable project risks.

8.2.1 Major Functionalities:


Following are some of the main functionalities of this project:

• User registration

• Data uploading

• Data storage in the database

• Access through the GUI

• Duplicate detection at the file level and the block level

• Data sharing, access control, and file download

8.3 Basics of Software Testing

8.3.1 White-box testing:


White-box testing is done by a tester who has knowledge of the programming language. It is performed on the algorithms or source code of the project: input is given to the project and the tester verifies how the system processes that input to produce the result. In white-box testing all internal details must be known to the tester, which is why it is also called transparent testing. Because this test requires the code to be examined, it is essential for the tester to have coding knowledge. Following are the techniques of white-box testing:

• Programming style

• Control method

• Source language

• Database design

This type of test is useful for catching defects at the structural level; it goes below the top (functional) layer to expose defects. Test case design methods include:

• Statement coverage

• Decision coverage

• Condition coverage

• Multiple Condition coverage

• Path coverage

8.3.2 Black box Testing:


This type of testing validates the requirements against the actual results. In black-box testing the tester does not need to know the internal logic of the project; he is concerned only with the actual result generated by the system. Functional testing is performed as black-box testing, so knowledge of how the program executes internally, or of the programming language, is not required. According to the test plan, the following functionality is covered under black-box testing:

• Server connection

• Data upload


• Duplicate check (file level and block level)

• Data sharing and access control

• File download

• Results

8.3.3 Unit testing:


Unit testing checks the smallest parts of a system, such as methods, functions, classes, and interfaces. The tester tests every small unit of the system to investigate whether the module is suitable for the system. Unit tests are written for all units and executed to verify that the code meets the requirements and the design and performs as per the user requirements. Unit testing has several advantages, chief among them that errors and bugs are found at an early stage; because issues are found early and resolved immediately, they do not disturb other parts of the code.
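As a hedged example of such a unit test, the JUnit 5 sketch below exercises the hypothetical DedupChecker class sketched in Chapter 7: the first upload of a file should be accepted and an identical second upload should be flagged as a duplicate. The class and method names are assumptions, not the project's actual test suite.

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.nio.file.Files;
import java.nio.file.Path;
import org.junit.jupiter.api.Test;

// Hypothetical unit test for the duplicate-check sketch; not part of the project's real test suite.
class DedupCheckerTest {

    @Test
    void sameFileUploadedTwiceIsFlaggedAsDuplicate() throws Exception {
        DedupChecker checker = new DedupChecker();
        Path file = Files.createTempFile("sample", ".txt");
        Files.writeString(file, "identical content");

        assertFalse(checker.isDuplicateFile(file), "first upload should be accepted");
        assertTrue(checker.isDuplicateFile(file), "second upload should be rejected as a duplicate");
    }
}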

8.3.4 Integration testing:


In integration testing, the different modules are combined and tested together to verify that information is exchanged correctly between the distinct modules of the system and that it performs as per the given requirements. When all testing-related work is completed, the software is deployed to the customer. Stress and load testing are also conducted during integration testing, checking how efficiently, and how many, requests are processed.

8.3.5 Validation testing:


In this testing, the tester verifies that the software covers all the requirements given in the system requirement specification. It makes sure that each software requirement is in the correct place and also verifies whether we have built the right system. It checks the following: it justifies the execution and behavior of the system, all probable input data are given as input and the projected output is captured, and the test log is used for deployment.

8.3.6 System Testing:

After performing integration testing, the next step is output testing of the proposed system. No system can be useful if it does not produce the required output in a specified format. The outputs generated are displayed to the user; here the output format considered is a PPT-format document.

8.4 Test Strategy


This section articulates the testing flow. In step (c), the design of the system is considered and, in keeping with it, the test cycles, test cases, entry and exit criteria, expected results, and so on are prepared. We need to identify the test cases and the required data. The test steps are obtained from the business design and the documents related to the transactions. The test methods require the design of procedures such as status reporting and the planning of data tables. Step (d) involves requesting/building the test database environment. Execute project integration test: run test cases for the integration of the application and test all test cases after deployment on the system. Sign-off is the final stage, reached when everything is completed.

8.4.1 Testing Process:


There are different procedures for testing the software. The various steps are described below:

• The requirements of the system are to be examined.

• The expected time for results after each testing module is identified.

• The equipment and reference documents required to execute the tests have to be listed.

• Deploy the setup for the test environment.

8.4.2 Functionality testing and non-functional testing:


Functionality Testing:
Functionality testing is performed to check the functionality of the software against the design specification of that software and whether it is working as per the requirements. In functionality testing the core capability of the application is tested by the tester, covering core-level functionality such as the inputs given, the methods, and the setup on the machine. It tests and confirms a specific method or characteristic of the program. Functional testing is straightforward, i.e. the user can do it easily.
Project Aspect:
In the proposed system, all the functions were tested by the tester as well as the developer. The system has major functionality such as user registration, file uploading and downloading, and selecting the expiry date of a file. All these main functionalities are implemented successfully and work as per the user's expectations.
Non-Functional Testing:
Non-functional testing is related to the quality attributes and features of the software modules. It does not concern a specific feature or user action but software characteristics such as load handling; examples are reliability testing, usability testing, etc.

8.5 Test Cases and Results


After the implementation phase, when the tester tests the code, some faults or defects may be detected in it. The faults are corrected by some method within a short time. Testing is performed by creating test cases: an individual test case is executed for every scenario and compared with the output expected from the system or software. The following table indicates the test cases that are essential for the project and shows the suite of test cases that were executed and passed.


8.5.1 Test cases of system


During this project the system solution is investigated and a new framework is presented for addressing the problem. The aim of this project was to improve the performance of the algorithm presented in the base system. The results demonstrated in this project show the current state of the work done on the practical implementation of this algorithm.



CHAPTER 9

Data Tables and Discussions

9.1 Result Analysis

9.1.1 First Register Page

Figure 9.1: First Register Page



9.1.2 First Login Page

Figure 9.2: First Login Page


9.1.3 First User File Upload Page

Figure 9.3: First User File Upload Page


9.1.4 First User Download File Page

Figure 9.4: First User Download File Page


9.1.5 Owner Select File Name Share Page

Figure 9.5: Owner Select File Name Share Page


9.1.6 Owner Select User Name Share Page

Figure 9.6: Owner Select User Name Share Page


9.1.7 Second User Register Page

Figure 9.7: Second User Register Page


9.1.8 Second User Login Page

Figure 9.8: Second User Login Page


9.1.9 File Request Page

Figure 9.9: File Request Page


9.1.10 File Request Accept Page

Figure 9.10: File Request Accept Page


9.1.11 File Access Page

Figure 9.11: File Access Page


9.1.12 File Enter Key Access Page

Figure 9.12: File Enter Key Access Page



CHAPTER 10

Conclusion And Future Work

10.1 Conclusion
We have studied various cryptographic techniques, encryption standards, and the de-duplication process, and we apply them to develop an organization-specific, independent, cloud-based secure data backup system with multifactor authentication. To protect data confidentiality along with secure de-duplication, the notion of authorized de-duplication is proposed. To carry out the duplicate check, the privileges assigned to the user are checked first; instead of the data itself, the duplicate check is based on the differential privileges of users. Here, the problem of privacy preservation in de-duplication in the cloud environment is considered, and an advanced scheme supporting differential authorization and authorized duplicate check is proposed. This project addresses the issue of authorized de-duplication to achieve better security. We showed that our authorized duplicate check scheme incurs minimal overhead compared to convergent encryption and network transfer.

10.2 Future Work


We plan to further study and improve the data-restore performance of storage systems based on deduplication and delta compression in our future work.
