Abstract:
Cloud computing has emerged as a promising paradigm for data outsourcing and high-quality data services, providing services to users on demand through outsourced operations. However, storing sensitive information in the cloud raises privacy concerns. Data encryption protects data security to some extent, but at the cost of efficiency. Searchable symmetric encryption (SSE) allows retrieval of encrypted data from the cloud. In this paper, we focus on addressing data privacy issues using searchable symmetric encryption (SSE). For the first time, we formulate the privacy issue in terms of similarity relevance and scheme robustness. We observe that server-side ranking based on order-preserving encryption (OPE) inevitably leaks data privacy. To eliminate this leakage, we propose a two-round searchable encryption (TRSE) scheme that supports top-k multi-keyword retrieval. TRSE employs a vector space model and homomorphic encryption. The vector space model provides sufficient search accuracy, and the homomorphic encryption enables users to participate in the ranking while the majority of the computing work is done on the server side, operating only on ciphertext. As a result, information leakage is eliminated and data security is ensured. Thorough security and performance analysis shows that the proposed scheme guarantees high security and practical efficiency.
LIST OF CONTENTS
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Motivation
1.3.1 Definitions
1.3.2 Abbreviations
1.4 Overview
2. Literature Survey
2.1 Introduction
2.2 History
2.3 Purpose
2.4 Requirements
2.5 Technology Used
3. System Analysis
3.1.1 Drawbacks
3.3 Proposed System
3.3.1 Advantages
3.5 Algorithm
4.1 Introduction
4.2 Purpose
5. System Design
5.1 Introduction
5.3 Scenarios
6. Implementation
6.1 Introduction
7. System Screens
8. System Testing
INTRODUCTION
Cloud computing provides services to users on demand, and on-demand service delivery relies on outsourcing operations. Before being exchanged, each data file must be encrypted and stored on the cloud server. These files gain privacy protection while resources are distributed effectively. Traditional applications use keyword search to find results among encrypted files, but such a search can return a huge number of files, and these results are not used efficiently on the user's side. In this project we address these problems by introducing a secure keyword search mechanism. Secure keyword search displays only a top list of files, with fewer results in the output. Returning fewer results increases file retrieval accuracy and reduces communication overhead. Only top-ranked files are listed: secure keyword search operates on ranked lists of files, where ranked files are identified by a relevance score, a statistical measure. These results demonstrate a strong and secure file-generation process. We present a performance comparison between the previous system and the present system, and show that the proposed system, called ranked keyword search, is the better solution.
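The core idea above is that the server combines encrypted relevance information while only the user can decrypt the results and pick the top files. A minimal sketch of this idea uses textbook Paillier additive homomorphic encryption with deliberately tiny, insecure parameters; the project itself relies on a fully homomorphic scheme, so everything below is purely illustrative:

```python
from math import gcd
import random

# Minimal textbook Paillier (toy parameters, illustrative only):
# multiplying ciphertexts adds the underlying plaintexts, so a
# server could aggregate encrypted scores without seeing them.
p, q = 17, 19                                  # toy primes; real keys are ~1024-bit
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)            # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic addition: multiply ciphertexts to add plaintext scores.
c = (encrypt(15) * encrypt(27)) % n2
assert decrypt(c) == 42
```

Because multiplying Paillier ciphertexts adds the underlying plaintexts, the server can aggregate per-keyword encrypted scores while only the key holder learns the totals.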
1.1 Purpose
Cloud computing provides services to users on demand, and on-demand service delivery relies on outsourcing operations. We focus on addressing data privacy issues using searchable symmetric encryption, so that outsourced files gain privacy protection while resources are distributed effectively.
1.2 Scope
Outsourced files should gain privacy protection while resources are distributed effectively. Traditional applications use keyword search to find results among encrypted files, but such a search can return a huge number of files, and these results are not used efficiently on the user's side. In this project we address these problems by introducing a secure keyword search mechanism.
1.3 Motivation
Ranked files gain privacy protection while resources are distributed effectively. Traditional applications use keyword search to find results among encrypted files, but such a search can return a huge number of files, and these results are not used efficiently on the user's side. In this project we address these problems by introducing a secure keyword search mechanism.
1.3.1 Definitions:
1. For the first time, we define the problem of secure ranked keyword search over encrypted
cloud data, and provide such an effective protocol, which fulfills the secure ranked search
functionality with little relevance score information leakage against keyword privacy.
2. Thorough security analysis shows that our ranked searchable symmetric encryption
scheme indeed enjoys “as-strong-as-possible” security guarantee compared to previous
SSE schemes.
3. We investigate the practical considerations and enhancements of our ranked search
mechanism, including the efficient support of relevance score dynamics, the
authentication of ranked search results, and the reversibility of our proposed one-to-many
order-preserving mapping technique.
4. Extensive experimental results demonstrate the effectiveness and efficiency of the
proposed solution.
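The one-to-many order-preserving mapping mentioned in item 3 can be sketched as follows. This is a simplified illustration with a made-up bucket parameter, not the paper's exact construction: each plaintext relevance score owns a disjoint range of ciphertext values, and a fresh random value is drawn from that range, so identical scores encrypt differently while order comparisons still work.

```python
import random

def ope_encrypt(score, bucket_size=1000):
    """One-to-many order-preserving mapping (illustrative only).
    Each plaintext score owns the disjoint ciphertext range
    [score * bucket_size, (score + 1) * bucket_size); a random
    value inside that range is returned, so equal scores map to
    different ciphertexts while plaintext order is preserved."""
    return score * bucket_size + random.randrange(bucket_size)

low, high = ope_encrypt(3), ope_encrypt(7)
assert low < high  # plaintext order survives encryption
```

Because each score's range is disjoint from every other score's range, the cloud can still sort ciphertexts, yet repeated scores no longer reveal their frequency directly.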
1.3.2 Abbreviations:
Searchable symmetric encryption (SSE)
Order-preserving encryption (OPE)
a two-round searchable encryption (TRSE)
Information retrieval (IR)
Term frequency (tf)
Inverse Document Frequency (idf)
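The last two abbreviations are the building blocks of relevance scoring. A common tf-idf formulation (one standard variant; the scheme's exact statistical weighting may differ) scores a document by summing, over the query terms, term frequency times inverse document frequency:

```python
import math
from collections import Counter

def relevance_scores(query_terms, docs):
    """Score each document by a standard tf-idf sum, as used in
    ranked keyword search (one common weighting; the scheme's
    exact statistical measure may differ)."""
    n = len(docs)
    tfs = [Counter(d.split()) for d in docs]
    scores = []
    for tf in tfs:
        score = 0.0
        for term in query_terms:
            df = sum(1 for other in tfs if term in other)  # document frequency
            if df:
                score += tf[term] * math.log(n / df)       # tf x idf
        scores.append(score)
    return scores

docs = ["secure cloud search", "cloud storage pricing", "keyword search index"]
scores = relevance_scores(["search"], docs)  # doc 1 scores 0.0: no query term
```

Returning only the k highest-scoring documents is what gives top-k retrieval its reduced communication overhead.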
1.3.3 Module Diagram:
1.4 Overview
Section 1 discusses the Introduction, Purpose, Scope and Motivation.
Section 2 discusses the Literature Survey and Research Methodologies.
Section 4 discusses System Analysis, Module Description and Feasibility Study.
Section 6 discusses System Design (UML, DFD, and ER diagrams).
Section 10 lists the References.
2. Literature Survey
Cloud computing has recently emerged as a new platform for deploying, managing, and provisioning large-scale services through an Internet-based infrastructure. Successful examples include Amazon EC2, Google App Engine, and Microsoft Azure. As a result, hosting databases in the cloud has become a promising solution for Database-as-a-Service (DaaS) and Web 2.0 applications. In the cloud computing model, the data owner outsources both the data and querying services to the cloud. The data are private assets of the data owner and should be protected against both the cloud and the querying client; on the other hand, the query might disclose sensitive information of the client and should be protected against the cloud and the data owner. Therefore, a vital concern in cloud computing is to protect both data privacy and query privacy among the data owner, the client, and the cloud. The social networking service is one of the sectors that witness such rising concerns. For example, in Fig. 1 user Cindy wants to search an online dating site for friends who share similar backgrounds with her (e.g., age, education, home address). While the site or the data cloud should not disclose to Cindy the personal details of any user, especially sensitive ones (e.g., home address), Cindy should not disclose the query that involves her own details to the site or the cloud, either. More critical examples exist in business sectors, where queries may reveal confidential business intelligence. For example, suppose a retail business plans to open a branch in a district. To calculate the target customer base, it needs to query the demographic data of that district, which the data owner has outsourced to a data cloud. While personal details in the demographic data should not be disclosed to the outsourcing cloud or the business, the district name in that query should not be disclosed to the cloud or the data owner, either. It is also noted that the cloud computing model worsens the consequence of privacy breaches in the above scenarios, as a single cloud may host querying services for many data owners. For example, two queries from the same user, one on a local clinic directory and another on anti-diabetic drugs, together give a higher confidence that the user is probably suffering from diabetes. All the above concerns call for a query-processing model that preserves both data privacy and query privacy among the data owner, the client, and the cloud. The data owner should protect its data privacy and not reveal any information beyond what the query result can imply. On the other hand, the client
should protect its query privacy so that the data owner and the cloud know nothing about the query and are therefore unable to infer any information about the client. Unfortunately, existing privacy-preserving query processing solutions are not sufficient to solve this new problem arising in the cloud model. Most research work in the literature addresses data privacy or query privacy separately. For example, generalization techniques have been proposed to protect data privacy by hiding quasi-identifier attributes and avoiding the disclosure of sensitive information. Similar techniques are proposed for query privacy on both relational data and spatial data. Only very few, such as the Casper framework, consider data and query privacy as a whole. Furthermore, generalization-based solutions like Casper still disclose the data or query in a coarser, imprecise form. Not much research work addresses the unconditional privacy required for this problem. Although some encryption schemes have been proposed to protect data hosted on an outsourcing server, they cannot be adopted for this problem for several reasons. First, accurate query processing on encrypted data is difficult, if not impossible. Most existing encryption schemes only support some specific queries. For example, the space transformation (e.g., space-filling curve) used in [20] only supports approximation queries, as it cannot preserve the accurate distances of the original space. Second, even if suitable encryptions are found for these queries, they become flawed when applied to our problem, as these encryptions are not designed for mutual privacy protection in the first place. In particular, to evaluate the query on the encrypted data, the client must encrypt the query with the same scheme and send it to the outsourcing server, which may then forward it to the data owner, where the query can be decrypted with her encryption parameters. Third, some encryptions or transformations are shown to be vulnerable to certain security attacks. For example, distance-preserving space transformations are vulnerable to principal component analysis. In spite of the insufficiency of these prior studies for our problem, they show us that a secure framework and an alternate encryption scheme are both indispensable. In this paper, we propose a holistic and efficient solution that is based on Privacy Homomorphism (PH). PHs are encryption transformations which map a set of operations on cleartext to another set of operations on ciphertext. In essence, PH enables complex computations (such as distances) based solely on ciphertext, without decryption. We integrate a provably secure PH seamlessly with a generic index structure to develop a novel query processing framework. It is efficient and can be applied to any multi-level tree index. We address several challenges in this framework. First, an index consists of multiple
nodes, and query processing on the index involves traversing these nodes. The cloud or data owner should not be able to trace the access pattern and hence get any clue about the query. We propose a client-led processing paradigm that eliminates the disclosure of the query to any other party. Second, to evaluate various types of complex queries, such as kNN and other distance-based queries, a comprehensive set of client-cloud protocols must be devised to work together with a PH that supports most arithmetic operations. Third, we prove the security and analyze the complexity of the proposed algorithms and protocols. In particular, we present several optimization techniques to improve protocol efficiency and also show their privacy implications. To summarize, our contributions in this paper are as follows:
• To the best of our knowledge, this is the first work that is dedicated to mutual privacy protection for complex query processing over large-scale, indexed data in a cloud environment.
• We present a general index traversal framework that accommodates any multi-level index. The framework can resist the index-tracing attempts of the cloud during query processing. Based on this framework, we present a set of protocols to process typical distance-based queries.
• We thoroughly analyze the security and complexity of the proposed framework and protocols. In particular, we present several optimization techniques to improve protocol efficiency.
• An extensive set of experiments is conducted to evaluate the actual performance of our basic and optimized techniques.
The rest of the paper is organized as follows. Section II reviews existing work on privacy-preserving query processing on outsourced data. Section III formulates the problem and Section IV introduces ASM-PH, the privacy homomorphism used in this paper. Section V overviews the secure processing framework, followed by detailed discussions of the protocols in Sections VI and VII, with a focus on distance-based queries. Section VIII presents three optimization techniques to improve protocol efficiency. Section IX analyzes the security and possible threats of our approach, followed by the performance evaluation in Section X. Section XI concludes the paper with some future research directions. Our work falls into this category but distinguishes itself from the others as being the first work dedicated to mutual privacy protection. We propose a secure, encryption-integrated framework that is suitable for processing complex queries over large-scale, indexed data. It is noteworthy that privacy-preserving search on tree-structured data has been studied in some existing studies [6], [27], [2]; however, these works either consider one-way privacy or cannot provide unconditional privacy
guarantee. The third category considers a distributed environment where the data are partitioned and outsourced to a set of independent and non-colluding outsourcing servers. Privacy-preserving query processing then requires a distributed and secure protocol to evaluate the result without disclosing the data held by each outsourcing server. The security foundation of such protocols originates from secure multiparty computation (SMC), a cryptographic problem of securely computing a function among multiple participants in a distributed network. Privacy-preserving nearest neighbor queries have been studied in this context for data mining. Shaneck et al. presented a solution for point data on two parties. Qi and Atallah improved this solution by applying a blind-and-permute protocol, together with a secure selection and a multi-step kNN protocol. For approximate aggregation queries, Li et al. proposed randomized protocols based on probabilistic computations to minimize the data disclosure [30]. For vertically partitioned data, privacy-preserving top-k, kNN and join queries have been studied. More recently, Ghinita et al. proposed a private-information-retrieval (PIR) framework to evaluate kNN queries in location-based services. Thanks to oblivious transfer, a common primitive in SMC, the user can retrieve the results without being pinpointed. However, solutions in the third category typically suffer from heavy CPU, communication and storage overhead, as most SMC-based protocols do. As such, they cannot scale well to large-scale databases. We now introduce privacy homomorphisms (PH), the internal encryption scheme. PHs are encryption transformations which map a set of operations on cleartext to another set of operations on ciphertext. Formally, they are encryption functions obtained by applying the extended Euclidean algorithm. This encryption is clearly a privacy homomorphism under the operations defined by F′ and F, because m = pq. However, it suffers from known-plaintext attacks [4], which means p and q could be found if a pair of cleartext and ciphertext is known to an adversary.

A Provably Secure Privacy Homomorphism. Domingo-Ferrer enhanced the above simple PH and proposed a provably secure privacy homomorphism under the same set of operations, i.e., modular addition, subtraction and multiplication. We name it ASM-PH after its supported operations. It works as follows. The public parameters are a positive integer t > 2 and a large integer m; t controls how many components a cleartext is split into (t = 2 in the above), and m should have many small divisors (compared to t). Further, many integers smaller than m should be invertible modulo m. Similar to the simple PH, the set F of ciphertext operations consists of the corresponding componentwise operations. Finally, the encryption and decryption of this PH can be
described as follows. Encryption: randomly split a cleartext a ∈ Zm′ into t components a1, ..., at whose sum equals a modulo m′, and multiply the j-th component by r^j modulo m. Decryption: multiply the j-th ciphertext component by r^−j modulo m, then add up all components modulo m′. While ASM-PH can perform addition, subtraction and multiplication directly on the ciphertexts, these operations still cost considerable computation. Let + denote the cost of a modular sum and ~ denote the cost of a modular multiplication. Ciphertext addition and subtraction are componentwise, costing t modular sums each, while ciphertext multiplication requires on the order of t² modular multiplications. It is also noteworthy that multiplication doubles the size of the ciphertext from t components to 2t components, with the first component being zero. As for the computation cost of encryption, it is t~, as each component requires a modular multiplication with r^j. The cost of decryption is similar, except that all components are summed up at the end. It is noteworthy that encryption increases the size of the cleartext from one component to t components; since each component is a positive integer in Zm, the size of the ciphertext is t · l(m), where l(m) denotes the number of bits in m. In practice, the cost of modular addition is dominated by that of modular multiplication [9] and thus can be omitted when the latter is present. Modular multiplication, especially for a large modulus, has also become extremely efficient (on the order of 10⁻⁵ second) since the introduction of Montgomery multiplication. ASM-PH is shown to be secure against known-plaintext attacks: analytically, the size of the subset of keys that are consistent with n known cleartext-ciphertext pairs grows exponentially, which means the genuine key could come from an arbitrarily large key set. Further security aspects of ASM-PH are analyzed in the original work.

Distance Folding. Both this and the next optimization aim to reduce unnecessary
distance computations during the distance access for a single node. The key observation for the distance folding optimization comes from Eqn. 3, where the local distance is added up from the encrypted minimum square distance in each dimension. Since distance is always positive, a partial local distance computed from a subset of all dimensions is a natural lower bound of the actual local distance. This lower bound is particularly useful because an R-tree node usually has tens or hundreds of entries, and some entries may be far away from the query point; the complete local distances of these entries are not necessary and can be replaced by a lower bound that serves the same query processing purpose. We call this process "distance folding". It is noteworthy that a folded distance can always be unfolded into a larger lower bound, or even into the actual local distance if necessary later on. The main challenge lies in deciding when to stop adding up the partial local distance: a premature stop leads to an aggressive lower bound that will probably be unfolded later on. For distance range queries, the adding up can stop when the lower bound reaches the distance threshold. For kNN queries, however, the decision to fold a distance can relate
to the processing status. Specifically, we keep only the topmost L items in the priority queue "unfolded", while the distances of all remaining items stay folded as they are. Before a folded distance is inserted into the queue, it must be unfolded by at least one dimension, or until it is no longer among the topmost L items. This strategy is called "L-unfolded", where L dictates how aggressive the strategy is. For the best adaptation to a specific dataset, L can be adjusted at runtime based on its performance, as follows. When a query is complete, the saving can be calculated by counting all entries in the queue whose distances are still folded, whereas the overhead can be calculated by counting the number of unfolding operations during processing. L should be increased when the overhead dominates the saving, and vice versa.

Entry Folding. While distance can be folded by ignoring some dimensions, the same rationale can be applied to the entries in an index node. The key observation is that a node i usually has a large number (typically over 100) of entries to fit into one disk page, and it is unnecessary to compute the local distance to each entry. As such, some remote entries can be "folded" and represented by a super entry. As there are then fewer entries in i, the computational cost of distance access for i can be significantly reduced. Fig. 5 illustrates the notion of super entry and entry folding. The node i contains 10 objects or child entries, which form two super entries a and b. When i is accessed, it is treated as if there are only entries a and b. Later, when entry a is accessed (as it is closer to q than b), the same node i will be used and a will be unfolded into entries 1-5; b, on the other hand, remains folded, and entries 6-10 can be waived from the distance access. The entry folding process is conducted offline at the data owner after the index is built. For each node, the set of entries is recursively partitioned into two subsets by dimensional axes, like the kd-tree index, until each subset contains only one entry. Upon service initialization, besides the shadow index, an auxiliary table that stores the super-entry information of all nodes is sent to the client (see Fig. 5). Each table record corresponds to a super entry and has three fields: the node affiliation, the minimum bounding box (encrypted by E), and the associated entries. The table size depends on how we regulate the size of a super entry. Based on its query demands, the client should initialize a desirable threshold W for the minimum number of entries in each super entry. For example, W should be set higher than the largest k for kNN queries. In Fig. 5, W = 5. Note that entry folding is not equivalent to reducing the fan-out of the index, as the latter is query independent. The tradeoff of entry folding lies between the wasted distance computation of super entries that are unfolded later and the saved distance computation of folded
entries. In this paper, we study the problem of processing private queries on indexed data for mutual privacy protection in a cloud environment. We present a secure index traversal framework, based on which secure protocols are devised for classic types of queries. Through theoretical proofs and performance evaluation, this approach is shown to be not only feasible, but also efficient and robust under various parameter settings. We believe this work steps towards practical applications of privacy homomorphism to secure query processing on large-scale, structured datasets. As for future work, we plan to extend this work to other query types, including top-k queries, skyline queries and multi-way joins. We also plan to investigate mutual privacy protection for queries on unstructured datasets.

In the current information era, efficient and effective search capabilities for digital collections have become essential for information management and knowledge discovery. Meanwhile, a growing number of collections are professionally maintained in data centers and stored in encrypted form to limit their access to only authorized users, in order to protect confidentiality and privacy. Examples include medical records, corporate proprietary communications, and sensitive government documents. An emerging critical issue is how to support effective search and retrieval over such encrypted collections.
This section presents several representative scenarios where secure search over a document collection may take place. As shown in Fig. 1, the content owner, Olivia, uses the services of a data center to store a large number of documents, as well as to perform search and retrieval. Olivia may also grant another user, Alice, the permission to search and retrieve her documents through the data center. In this case, we refer to Olivia as the supervisor. In addition, to prevent leakage of information to potential hackers who break in, the documents stored at the data center are encrypted. The supervisor manages the content decryption keys and may provide decryption services upon Alice's request. In the following, we examine a few application scenarios under this framework.
• Case 1: The content owner, Olivia, wants to search for some documents stored at the data center. She has a limited-bandwidth connection with the data center and needs to search through the encrypted content without downloading it. The pre-processing is executed once by Olivia, when she stores the documents, all in encrypted form, in the data center. The major
task of the pre-processing stage is to build a secure term frequency table and a secure inverse document frequency table, so as to facilitate efficient and accurate information retrieval. In an unprotected term frequency table, both the search term and its term frequency information are in plaintext. To protect the confidentiality of the search, we encrypt each of them in an appropriate way. As shown in Fig. 2, a word w in a document first undergoes stemming to retain the word stem wS and remove the word ending. The stemmed word wS is then encrypted using an encryption function E and the word-key KwS to obtain the encrypted word w(e)S. Here the word-key is unique to each stemmed word and is obtained with a key derivation function. w(e)S is further mapped to a particular row i in the term frequency table, where the index i is established via a hashing function such that i = H(w(e)S). The term frequency information is collected by counting the number of occurrences of the stemmed word in the j-th document and stored in the corresponding table entry. This process is repeated to obtain the term frequencies for all terms and documents, and the TF values are then further encrypted. In the baseline model, the data center is only trusted with storing data. There is a single layer of encryption to protect the term frequency information from both unauthorized users and the data center. We first encode each row of the term frequency table to minimize the required storage. The encoded term frequency table is then encrypted, where a key Ki is used to encrypt the i-th row of the table. To increase security, the value of Ki is unique for each row and is derived from the word-key KwS corresponding to the i-th row. Thus, compromising the key corresponding to one row does not compromise the other rows of the term frequency table. Since computing the relevance score requires the collection frequency weight of a word, this weight can be computed beforehand and encrypted using the same word-key as in the term frequency table.
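The pre-processing pipeline just described (stem, encrypt with a per-word key, hash to a row index, count term frequencies) can be sketched as follows. HMAC stands in here for both the key-derivation function and the word encryption E, and the three-suffix stemmer is a toy; all names are illustrative, not the paper's:

```python
import hmac, hashlib
from collections import Counter

# Sketch of the pre-processing stage (simplified: HMAC stands in for
# the word-key derivation and the word encryption E; the stemmer is a
# crude suffix strip rather than a real stemming algorithm).
MASTER_KEY = b"master-secret"
NUM_ROWS = 2 ** 16

def stem(word):
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def word_key(stemmed):
    # Per-word key K_wS derived from the master key.
    return hmac.new(MASTER_KEY, stemmed.encode(), hashlib.sha256).digest()

def encrypted_word(stemmed):
    # w(e)_S = E(K_wS, wS); HMAC used as a deterministic stand-in.
    return hmac.new(word_key(stemmed), stemmed.encode(), hashlib.sha256).digest()

def row_index(stemmed):
    # i = H(w(e)_S), mapped into the table.
    digest = hashlib.sha256(encrypted_word(stemmed)).digest()
    return int.from_bytes(digest[:4], "big") % NUM_ROWS

def build_tf_table(docs):
    """Row i holds {document j: term frequency} for the hidden word."""
    table = {}
    for j, doc in enumerate(docs):
        for w, count in Counter(stem(t) for t in doc.lower().split()).items():
            table.setdefault(row_index(w), {})[j] = count
    return table

table = build_tf_table(["searching encrypted clouds", "cloud search"])
```

The data center stores only hashed row indices and (in the real scheme, encrypted) counts, so it never sees the plaintext search terms.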
Even though the optimizations above keep evaluated ciphertexts at the same length as original ciphertexts, the size of these ciphertexts is still very large (θ̃(λ⁵) bits under our suggested parameters). We next show how to "compress", or post-process, the ciphertexts down to (asymptotically) the size of an RSA modulus, reducing the communication complexity of our scheme dramatically. The price of this optimization, however, is that we cannot evaluate anything on these compressed ciphertexts. Hence we can only use this compression technique on the final output ciphertexts, after all applications of the Evaluate algorithm have been completed. (This technique also introduces another hardness assumption, similar to the φ-hiding assumption of Cachin et al. [3].) Roughly, we supplement the public key with the description of a group G and an element g ∈ G whose order is a multiple of the secret key p. Given a ciphertext c from our scheme, the compressed ciphertext is then simply g^c; since the order of g is a multiple of p, decryption can still recover the plaintext from it.
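The compression idea can be illustrated with toy numbers. This is purely illustrative: the real scheme uses a cryptographically large group and recovers the plaintext algebraically, not by brute force, and the tiny parameters below offer no security.

```python
# Toy illustration of ciphertext compression: replace a ciphertext c
# by g^c in a group where g's order is a multiple of the secret key p,
# so c mod p (all that decryption needs) survives compression.
p = 5                      # secret key
group_mod = 11             # Z_11^* has order 10 = 2 * p
g = 2                      # 2 generates Z_11^*, so ord(g) = 10

def compress(c):
    return pow(g, c, group_mod)

def decrypt_compressed(c_star):
    # Recover c mod ord(g) by brute force (fine for a toy), then
    # reduce mod p and take parity, mirroring the decryption rule
    # m = (c mod p) mod 2 of the underlying scheme.
    for c_mod in range(10):
        if pow(g, c_mod, group_mod) == c_star:
            return (c_mod % p) % 2
    raise ValueError("not a group element")

c = 5 * 7 + 3              # ciphertext with noise 3, hiding plaintext bit 1
assert decrypt_compressed(compress(c)) == 1
```

Because p divides the order of g, reducing c modulo the group order loses nothing that decryption needs, which is why the exponentiated value suffices.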
The baseline model introduced in the previous section addresses the scenarios where the content owner makes a query himself/herself. In this section, we present an alternate scheme to enable search by a user other than the content owner. This scheme reduces the involvement of Olivia by shifting the task of computing the relevance score to the data center, while still maintaining the confidentiality of the term frequency information and the document content. To remove the need for communication between the data center and the content owner during content search, we must be able to perform computations and ranking directly on term-frequency data in its encrypted form. We refer to this searchable layer of encryption as the inner-layer encryption, denoted by TF(s). Inner-layer encryption can be done via cryptographic tools such as homomorphic encryption (HME) and order-preserving encryption (OPE); the computation of the relevance score should be adapted accordingly to support encrypted-domain computation. We use OPE in this paper to demonstrate the concept of secure ranking of relevance. After the inner-layer encryption, TF(s) is encoded and further encrypted to C in the same way as in the baseline scheme. We refer to this second round of encryption as the outer-layer encryption, which prevents unauthorized users from accessing TF information. The indexing and pre-processing stages of the proposed scheme are similar to the baseline model, with an additional inner-layer encryption. When searching the collection for a particular query consisting of multiple terms, Alice first performs stemming and sends the stemmed words to the content owner, Olivia, who checks whether Alice has the required permission to search for the query words. Upon verification, Olivia derives the word-keys from the master key and uses them to encrypt the stemmed words. The hash values of the encrypted words are then calculated and transmitted to Alice, who forwards them to the data center. Using the received hash values, the data center searches the protected term frequency table C and identifies the rows corresponding to the query words, without obtaining plaintext information about the query.
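The query flow above can be sketched in the same illustrative style (HMAC again standing in for the key derivation and the word encryption; all names are hypothetical, not the paper's):

```python
import hmac, hashlib

# Sketch of the search protocol: Olivia turns a stemmed query word
# into a hash, and the data center looks up encrypted TF rows by
# hash, learning nothing about the plaintext query.
MASTER_KEY = b"master-secret"

def owner_hash_for(stemmed_word):
    """Olivia: derive the word-key, encrypt the stem, hash it."""
    k = hmac.new(MASTER_KEY, stemmed_word.encode(), hashlib.sha256).digest()
    enc = hmac.new(k, stemmed_word.encode(), hashlib.sha256).digest()
    return hashlib.sha256(enc).hexdigest()

def data_center_lookup(protected_table, hashed_words):
    """Data center: fetch rows by hash; absent words yield None."""
    return {h: protected_table.get(h) for h in hashed_words}

# Alice stems her query, obtains the hashes via Olivia, and forwards
# them to the data center, which returns the matching encrypted rows.
protected_table = {owner_hash_for("search"): "<encrypted TF row>"}
rows = data_center_lookup(protected_table, [owner_hash_for("search")])
```

The data center only ever handles opaque hashes and encrypted rows; ranking is then performed on the inner-layer (OPE-encrypted) term frequencies.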
In this section, we compare the performance of the baseline model and OPE in terms of security and retrieval accuracy, and examine the tradeoffs involved in securing the term frequency using order-preserving encryption. We evaluate the retrieval accuracies of the secure search schemes on the W3C collection, with the 59 queries used for the discussion search in the enterprise track of the 2005 Text Retrieval Conference. Any document judged partially relevant or relevant is taken to be relevant in our test (i.e., conflating the top two judgement levels). In this work, we develop a framework for confidentiality-preserving rank-ordered search in large-scale document collections. We explore techniques to securely rank-order the documents and extract the most relevant document(s) from an encrypted collection based on encrypted search queries. We present several representative scenarios depending on the security requirement, and develop techniques to perform efficient search and retrieval in each case. The proposed method maintains the confidentiality of the query as well as the content of the retrieved documents. The techniques introduced in this work are first attempts to bring together advanced information retrieval capabilities and secure search capabilities. In addition to our focus on securing indices, other important security issues include protecting communication links and combating traffic analysis. These will need to be addressed in future work. Further investigations of complete cryptographic modeling, efficient algorithm design, and system evaluations can shed light on an improved balance between the security, efficiency, and accuracy of search, leading to a wide range of applications, such as searching information with hierarchical access control, and flexible "e-discovery" practices for digital records in legal proceedings.
3. SYSTEM ANALYSIS
The Systems Development Life Cycle (SDLC), or Software Development Life Cycle in
systems engineering, information systems and software engineering, is the process of creating or
altering systems, and the models and methodologies that people use to develop these systems.
In software engineering, the SDLC concept underpins many kinds of software development
methodologies. These methodologies form the framework for planning and controlling the
creation of an information system: the software development process.
WHAT IS SDLC?
A software life cycle deals with various parts and phases, from planning to testing
and deploying software. All these activities are carried out in different ways, as per the needs.
Each way is known as a Software Development Lifecycle (SDLC) model. A software life cycle
model is either a descriptive or prescriptive characterization of how software is or should be
developed. A descriptive model describes the history of how a particular software system was
developed. Descriptive models may be used as the basis for understanding and improving
software development processes or for building empirically grounded prescriptive models.
SDLC models:
* The Linear model (Waterfall) - Separate and distinct phases of specification and development;
all activities proceed in linear fashion, and the next phase starts only when the previous one is
complete.
* Evolutionary development - Specification and development are interleaved (Spiral,
incremental, prototype-based, Rapid Application Development).
* Incremental model - Waterfall in iteration.
* RAD (Rapid Application Development) - Focus is on developing a quality product in less time.
* Spiral model - We start from a smaller module and keep building on it like a spiral; it is also
called component-based development.
* Formal systems development - A mathematical system model is formally transformed to an
implementation.
* Agile methods - Inducing flexibility into development.
* Reuse-based development - The system is assembled from existing components.
The General Model
Software life cycle models describe phases of the software cycle and the order in which those
phases are executed. There are many models, and many companies adopt their own, but all
follow very similar patterns. Each phase produces deliverables required by the next phase in the
life cycle. Requirements are translated into design. Code is produced during implementation that
is driven by the design. Testing verifies the deliverable of the implementation phase against
requirements.
SDLC Methodology:
Spiral Model
The spiral model is similar to the incremental model, with more emphasis placed on risk
analysis. The spiral model has four phases: Planning, Risk Analysis, Engineering and
Evaluation. A software project repeatedly passes through these phases in iterations (called
spirals in this model). In the baseline spiral, starting in the planning phase, requirements are
gathered and risk is assessed. Each subsequent spiral builds on the baseline spiral.
Requirements are gathered during the planning phase. In the risk analysis phase, a process is
undertaken to identify risks and alternate solutions. A prototype is produced at the end of the
risk analysis phase. Software is produced in the engineering phase, along with testing at
the end of the phase. The evaluation phase allows the customer to evaluate the output of the
project to date before the project continues to the next spiral. In the spiral model, the angular
component represents progress, and the radius of the spiral represents cost.
Spiral Life Cycle Model
This document plays a vital role in the software development life cycle (SDLC), as it describes the
complete requirements of the system. It is meant for use by developers and will be the basis during
the testing phase. Any changes made to the requirements in the future will have to go through a
formal change approval process.
The SPIRAL MODEL was defined by Barry Boehm in his 1988 article, "A Spiral Model of
Software Development and Enhancement." This model was not the first to discuss
iterative development, but it was the first to explain why the iteration matters.
As originally envisioned, the iterations were typically 6 months to 2 years long. Each phase
starts with a design goal and ends with a client reviewing the progress thus far. Analysis and
engineering efforts are applied at each phase of the project, with an eye toward the end goal of
the project.
The new system requirements are defined in as much detail as possible. This usually
involves interviewing a number of users representing all the external and internal users
and other aspects of the existing system.
A first prototype of the new system is constructed from the preliminary design. This
is usually a scaled-down system, and represents an approximation of the
characteristics of the final product.
The first prototype is evaluated in terms of its strengths, weaknesses, and risks.
At the customer's option, the entire project can be aborted if the risk is deemed too
great. Risk factors might involve development cost overruns, operating-cost
miscalculation, or any other factor that could, in the customer's judgment, result in a
less-than-satisfactory final product.
The existing prototype is evaluated in the same manner as was the previous prototype,
and if necessary, another prototype is developed from it according to the fourfold
procedure outlined above.
The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.
The final system is thoroughly evaluated and tested. Routine maintenance is carried out
on a continuing basis to prevent large-scale failures and to minimize downtime.
Fig -Spiral Model
Advantages
3.1 Existing System:
Nowadays, cloud servers store large numbers of files, and selecting and processing those files
becomes a burden. When large numbers of files are kept on a cloud server under encryption,
several problems arise. Not all files are encrypted, so there is no sufficient privacy and security
in outsourcing, and unauthorized users can enter and corrupt the content of the information.
Previously, users selected the files of interest as plain-text files. This fails when access
to the files must be protected: there is no adequate decryption technique in the file-access
process, and users suffer with the present searching technique.
Drawbacks
1. For each search request, users without pre-knowledge of the encrypted cloud data
have to go through every retrieved file in order to find the ones most matching their
interest, which demands a possibly large amount of post-processing overhead.
2. Invariably sending back all files solely based on the presence/absence of the keyword
further incurs large, unnecessary network traffic, which is absolutely undesirable in
today's pay-as-you-use cloud paradigm.
3. Searching is ineffective.
3.3 Proposed System:
We introduce the concepts of similarity relevance and scheme robustness to formulate the privacy
issue in searchable encryption schemes, and then solve the insecurity problem by proposing a
two-round searchable encryption (TRSE) scheme. Novel technologies from the cryptography
and information retrieval communities are employed, including homomorphic encryption and
the vector space model. In the proposed scheme, the majority of computing work is done on the
cloud while the user takes part in ranking, which guarantees top-k multi-keyword retrieval over
encrypted cloud data with high security and practical efficiency.
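The division of labor in TRSE (the server combines encrypted per-keyword scores by operating only on ciphertext, while the user decrypts the aggregated scores and ranks the top-k locally) can be illustrated with an additively homomorphic cryptosystem. The sketch below uses a toy Paillier instance with tiny primes and made-up scores; it is purely illustrative and not the paper's actual construction or parameter choice.

```python
import math
import random

# Toy Paillier cryptosystem (tiny primes, illustration only; real use
# requires large primes and a vetted library).
p, q = 47, 59
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)  # valid because g = n + 1

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

def add(c1: int, c2: int) -> int:
    # Homomorphic addition: the product of ciphertexts decrypts
    # to the sum of the plaintexts.
    return (c1 * c2) % n2

def scale(c: int, k: int) -> int:
    # Multiply an encrypted value by a plaintext constant.
    return pow(c, k, n2)

# Server side: aggregate encrypted per-keyword scores for each file,
# touching only ciphertext (scores here are invented for the example).
enc_scores = {"doc1": [encrypt(3), encrypt(1)],
              "doc2": [encrypt(2), encrypt(4)]}
enc_totals = {}
for doc, cs in enc_scores.items():
    total = cs[0]
    for c in cs[1:]:
        total = add(total, c)
    enc_totals[doc] = total

# User side: decrypt the aggregated scores and rank locally (top-k).
totals = {doc: decrypt(c) for doc, c in enc_totals.items()}
top_k = sorted(totals, key=totals.get, reverse=True)[:1]
```

The point of the split is that the server never sees plaintext scores, yet the user only performs one decryption per candidate file plus a cheap local sort.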
Main Contribution:
We propose the concepts of similarity relevance and scheme robustness. We thus perform the
first attempt to formulate the privacy issue in searchable encryption, and we show that
server-side ranking based on order-preserving encryption (OPE) inevitably violates data privacy.
We propose a two-round searchable encryption (TRSE) scheme, which fulfills secure
multi-keyword top-k retrieval over encrypted cloud data. Specifically, for the first time we
employ relevance scores to support multi-keyword top-k retrieval.
Thorough analysis of security demonstrates that the proposed scheme guarantees high data privacy.
Furthermore, performance analysis and experimental results show that our scheme is efficient for
practical utilization.
3.3.1 Advantages
Modules Description:
1. Entities in the cloud computing system:
Create three different entities: data owner, user, and cloud server. The data owner starts the
collection of files. All encrypted files of information are stored in the cloud server. The user
enters a secure searchable keyword; the system automatically extracts the related files and
calculates the index value. All results are displayed as authenticated file content.
2. Scoring:
Some of the multi-keyword searchable symmetric encryption schemes support only Boolean
queries, i.e., a file either matches or does not match a query. Considering the large number of data
users and documents in the cloud, it is necessary to allow multiple keywords in the search query and
return documents in the order of their relevance to the queried keywords. Scoring is a natural
way to weight the relevance.
3. Vector space model:
Beyond the weight of a single keyword on a file, we employ the vector space model to score a file
on multiple keywords. The vector space model is an algebraic model for representing a file as a
vector. Each dimension of the vector corresponds to a separate term, i.e., if a term occurs in the
file, its value in the vector is non-zero; otherwise it is zero. The vector space model supports
multi-term and non-binary presentations. We denote the possible information leakage as statistic
leakage. There are two possible statistic leakages: term distribution and inter-distribution. The
term distribution of term t is its frequency distribution of scores on each file.
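The vector space model described above can be sketched in a few lines; the vocabulary, documents, and query here are illustrative assumptions, and scoring is a plain inner product of term-frequency vectors.

```python
from collections import Counter

# Vocabulary: each dimension of the vector corresponds to one term.
vocabulary = ["cloud", "privacy", "encryption", "retrieval"]

def tf_vector(text: str) -> list:
    # A file is represented as a vector of term frequencies:
    # non-zero if the term occurs in the file, zero otherwise.
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocabulary]

def score(file_vec, query_vec):
    # Multi-keyword relevance as the inner product of the two vectors.
    return sum(f * q for f, q in zip(file_vec, query_vec))

files = {
    "doc1": tf_vector("cloud privacy cloud encryption"),
    "doc2": tf_vector("retrieval retrieval privacy"),
}
query = tf_vector("cloud privacy")

# Rank files by their relevance score to the multi-keyword query.
ranked = sorted(files, key=lambda d: score(files[d], query), reverse=True)
```

In the full scheme these vectors would be encrypted before being scored; the plaintext version only shows the ranking logic.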
4. κ-similarity Relevance:
In order to avoid information leakage in server-side ranking schemes, a series of techniques has
been employed to flatten or transfer the distribution of relevance scores. These approaches,
however, only cover the distribution of an individual term or file, ignoring the relevance between
them and the violation of data privacy.
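The modules later describe κ-similarity relevance in terms of largest common subsequences between sequences (i.e., between ranked result lists). A standard dynamic-programming LCS, applied here to two hypothetical top-k result lists, sketches that comparison; the lists and the similarity ratio are illustrative assumptions, not the paper's exact metric.

```python
def lcs_length(a, b):
    # Dynamic-programming table: dp[i][j] is the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

# Compare two top-k rankings: the longer their common subsequence,
# the more similar they are, so a flattened scheme whose encrypted
# ranking still matches the plaintext ranking leaks information.
plain_rank  = ["doc3", "doc1", "doc5", "doc2"]
cipher_rank = ["doc3", "doc5", "doc1", "doc2"]
similarity = lcs_length(plain_rank, cipher_rank) / len(plain_rank)
```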
3.5 FEASIBILITY STUDY
The preliminary investigation examines project feasibility, the likelihood the system will
be useful to the organization. The main objective of the feasibility study is to test the technical,
operational and economical feasibility of adding new modules and debugging the old running
system. Any system is feasible given unlimited resources and infinite time. There are three aspects
in the feasibility study portion of the preliminary investigation:
Technical Feasibility
Operational Feasibility
Economical Feasibility
3.5.1 ECONOMIC FEASIBILITY
A system that can be developed technically, and that will be used if installed, must still be a
good investment for the organization. In economic feasibility, the development cost of
creating the system is evaluated against the ultimate benefit derived from the new system.
Financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any additional hardware or
software. Since the interface for this system is developed using the existing resources and
technologies available at NIC, there is only nominal expenditure, and economic feasibility is
certain.
3.5.2 OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems that will
meet the organization's operating requirements. Operational feasibility aspects of the
project are to be taken as an important part of the project implementation. Some of the important
issues raised to test the operational feasibility of a project include the following:
The well-planned design would ensure the optimal utilization of the computer resources and
would help in the improvement of performance status.
3.5.3 TECHNICAL FEASIBILITY
The technical issues usually raised during the feasibility stage of the investigation include
the following:
This paper introduces a new framework for confidentiality-preserving rank-ordered search and
retrieval over large document collections. The proposed framework not only protects
document/query confidentiality against an outside intruder, but also prevents an untrusted
data center from learning information about the query and the document collection. We present
practical techniques for proper integration of relevance scoring methods and cryptographic
techniques, such as order-preserving encryption, to protect data collections and indices and
provide efficient and accurate search capabilities to securely rank-order documents in response
to a query. Experimental results on the W3C collection show that these techniques have
comparable performance to conventional search systems designed for non-encrypted data in
terms of search accuracy. The proposed methods thus form the first steps to bring together
advanced information retrieval and secure search capabilities for a wide range of applications,
including managing data in government and business operations, enabling scholarly study of
sensitive data, and facilitating the document discovery process in litigation. The understandings
obtained from this exploration will pave the way to bring together researchers from information
retrieval [1] and applied cryptography to establish a bridge between these areas. To accomplish
our goals, we collect term frequency information for each document in the collection to build
indices, as in traditional retrieval systems for plaintext. We further secure these indices, which
would otherwise reveal important statistical information about the collection, to protect against
statistical attacks. During the search process, the query terms are encrypted to prevent the
exposure of information to the data center and other intruders, and to confine the searching
entity to only make queries within an authorized scope. Utilizing term frequencies and other
document information, we apply cryptographic techniques such as order-preserving encryption
to develop schemes that can securely compute relevance scores for each document, identify the
most relevant documents, and reserve the right to screen and release the full content of relevant
documents. The proposed framework has comparable performance to conventional searching
systems designed for non-encrypted data in terms of search accuracy. The rest of this paper is
organized as follows. Related background and prior work are reviewed in .
There has been a considerable amount of prior work on algorithms and data structures to support
information retrieval for plaintext documents, focussing on various issues including efficient
representation [1] and effective ranking [3]. In contrast, protection of sensitive information in the
document collection, the indices, and/or the queries has received much less attention until
recently. Some exploration of search in encrypted data and private information retrieval systems
has been reported in . These techniques generally involve high computational complexity in
search, or incur a considerable increase in storage to store specially encrypted documents.
Approaches to reduce search complexity were introduced in at the expense of limited search
capabilities confined to a keyword list identified beforehand. The documents containing some of
the pre-identified keywords are first found, and the keywords or the keyword indices are
encrypted in a way that facilitates search and retrieval. These existing techniques target simple
Boolean searches to identify the presence or absence of a term in an encrypted text. Much of the
existing work has not been applied to large collections, and it is not clear whether it can be
easily extended to more sophisticated relevance-ranked searches.
This section presents several representative scenarios where the secure search over a document
collection may take place. As shown in Fig. 1, the content owner, Olivia, uses the services of a
data center to store a large number of documents, as well as perform search and retrieval. Olivia
may also grant another user, Alice, the permission to search and retrieve her documents through
the data center. In this case, we refer to Olivia as the supervisor. In addition, to prevent leakage
of information against potential hackers' break-in, the documents stored at the data center are
encrypted. The supervisor manages the content decryption keys and may provide decryption
services upon Alice's request. In the following, we examine a few application scenarios under
this framework.
4 System Requirements Specification
4.1 Introduction
A Software Requirements Specification (SRS) – a requirements specification for
a software system – is a complete description of the behavior of a system to be developed. It
includes a set of use cases that describe all the interactions the users will have with the software.
In addition to use cases, the SRS also contains non-functional requirements. Non-functional
requirements are requirements which impose constraints on the design or implementation (such
as performance engineering requirements, quality standards, or design constraints).
System requirements specification: a structured collection of information that embodies the
requirements of a system. A business analyst, sometimes titled system analyst, is responsible for
analysing the business needs of their clients and stakeholders to help identify business problems
and propose solutions. Within the systems development life cycle domain, the business analyst
typically performs a liaison function between the business side of an enterprise and the
information technology department or external service providers. Projects are subject to three
sorts of requirements:
Business requirements describe in business terms what must be delivered or
accomplished to provide value.
Product requirements describe properties of a system or product (which could be one of
several ways to accomplish a set of business requirements.)
Process requirements describe activities performed by the developing organization. For
instance, process requirements could specify specific methodologies that must be
followed, and constraints that the organization must obey.
Product and process requirements are closely linked. Process requirements often specify the
activities that will be performed to satisfy a product requirement. For example, a maximum
development cost requirement (a process requirement) may be imposed to help achieve a
maximum sales price requirement (a product requirement); a requirement that the product be
maintainable (a product requirement) is often addressed by imposing requirements to follow
particular development styles.
4.2 PURPOSE
In systems engineering, a requirement can be a description of what a system must do, referred
to as a Functional Requirement. This type of requirement specifies something that the delivered
system must be able to do. Another type of requirement specifies something about the system
itself, and how well it performs its functions. Such requirements are often called Non-functional
requirements, or ‘performance requirements’ or ‘quality of service requirements.’ Examples of
such requirements include usability, availability, reliability, supportability, testability and
maintainability.
In software engineering, the same meanings of requirements apply, except that the focus of
interest is the software itself.
In the functional requirements, each entity (the data owner, the cloud server, and the user) has
its own functionality. The data owner sends encrypted files to the cloud server and monitors the
information on the cloud server side. The user retrieves data from the cloud server with a
security guarantee.
*Data Owner Functional Requirements:
*Cloud Server Functional Requirements:
5. k-Similarity Relevance means identifying the sequences, calculating the largest common
subsequences, and drawing the graph.
6. Change password
7. Logout
4. Change password is mandatory to provide security
5.1 Introduction
The purpose of the design phase is to plan a solution to the
problem specified by the requirements document. This phase is the first step in moving from the
problem domain to the solution domain. In other words, starting with what is needed, design
takes us toward how to satisfy the needs. The design of a system is perhaps the most critical
factor affecting the quality of the software; it has a major impact on the later phases, particularly
testing and maintenance. The output of this phase is the design document. This document is similar
to a blueprint for the solution and is used later during implementation, testing and maintenance.
The design activity is often divided into two separate phases: System Design and Detailed
Design.
System Design, also called top-level design, aims to identify the modules that should be in the
system, the specifications of these modules, and how they interact with each other to produce the
desired results. At the end of the system design, all the major data structures, file formats, output
formats, and the major modules in the system and their specifications are decided.
During Detailed Design, the internal logic of each of the modules specified in system
design is decided. During this phase, the details of the data of a module are usually specified in a
high-level design description language, which is independent of the target language in which the
software will eventually be implemented.
In system design the focus is on identifying the modules, whereas during detailed design
the focus is on designing the logic for each of the modules. In other words, in system design the
attention is on what components are needed, while in detailed design the issue is how the
components can be implemented in software.
Design is concerned with identifying software components, specifying relationships
among components, specifying software structure, and providing a blueprint for the
implementation phase. Modularity is one of the desirable properties of large systems. It implies
that the system is divided into several parts in such a manner that the interaction between parts
is minimal and clearly specified.
During the system design activities, developers bridge the gap between the requirements
specification, produced during requirements elicitation and analysis, and the system that is
delivered to the user.
Design is the place where quality is fostered in development. Software design is a
process through which requirements are translated into a representation of software.
The object model in UML is represented with class diagrams, describing the structure of the
system in terms of objects, attributes, associations and operations.
The dynamic model in UML is represented with sequence diagrams, statechart diagrams and
activity diagrams describing the internal behaviour of the system.
5.3 Scenarios
A use case is an abstraction that describes all possible scenarios involving the described
functionality. A scenario is an instance of a use case describing a concrete set of actions.
The name of the scenario enables us to refer to it unambiguously. The name of a
scenario is underlined to indicate that it is an instance.
The participating actor instances field indicates which actor instances are
involved in this scenario. Actor instances also have underlined names.
The flow of events of a scenario describes the sequence of events step by step.
Actors
Actors represent external entities that interact with the system. An actor can be a human or an
external system.
Actors are not part of the system. They represent anyone or anything that interacts with the system.
An actor may:
Only input information to the system.
Only receive information from the system.
Both input information to and receive information from the system.
During this activity, developers identify the actors involved in the system:
User:
A user is an actor who uses the system and performs the operations, like data classification and
performance execution, that are required of him.
Use Cases:
Use case diagrams model the functionality of a system using actors and use cases. Use cases are
services or functions provided by the system to its users.
System
Draw your system's boundaries using a rectangle that contains use cases. Place actors outside the
system's boundaries.
Use Case
Draw use cases using ovals. Label the ovals with verbs that represent the system's functions.
Actors
Actors are the users of a system. When one system is the actor of another system, label the actor
system with the actor stereotype.
Relationships
Illustrate relationships between an actor and a use case with a simple line. For relationships
among use cases, use arrows labeled either "uses" or "extends." A "uses" relationship indicates
that one use case is needed by another in order to perform a task. An "extends" relationship
indicates alternative options under a certain use case.
Fig: Use case diagram for Data Owner. Use cases: Login, Upload File, Index Words, Keyword
Index, Security / Change Password, Register Users, Logout.
Fig: Use case diagram for Cloud Server. Use cases: Login, Outsource of Files, Cloud Files
Information, Index of Files, Vector Space Model, Term Frequency, Term Distribution,
S-Leakage or Search Pattern, Identification of Attackers, Sequences, K-Similarity Relevance,
Largest Common Subsequences, Open Graph, Security, Change Password, Logout.
Fig: Use case diagram for User. Use cases: Login, Search, Profile, Change Password, Logout.
5.3.2 Object model
Class Diagram
Class diagrams are the backbone of almost every object-oriented method including UML. They
describe the static structure of a system.
Illustrate classes with rectangles divided into compartments. Place the name of the class in the
first partition (centered, bolded, and capitalized), list the attributes in the second partition, and
write operations into the third.
Active Class
Active classes initiate and control the flow of activity, while passive classes store data and serve
other classes. Illustrate active classes with a thicker border.
Visibility
Use visibility markers to signify who can access the information contained within a class. Private
visibility hides information from anything outside the class partition. Public visibility allows all
other classes to view the marked information. Protected visibility allows child classes to access
information they inherited from a parent class.
Associations
Associations represent static relationships between classes. Place association names above, on, or
below the association line. Use a filled arrow to indicate the direction of the relationship. Place
roles near the end of an association. Roles represent the way the two classes see each other.
Note: It's uncommon to name both the association and the class roles.
Multiplicity (Cardinality)
Place multiplicity notations near the ends of an association. These symbols indicate the number
of instances of one class linked to one instance of the other class. For example, one company will
have one or more employees, but each employee works for one company only.
Composition and Aggregation
Composition is a special type of aggregation that denotes a strong ownership between Class A,
the whole, and Class B, its part. Illustrate composition with a filled diamond. Use a hollow
diamond to represent a simple aggregation relationship, in which the "whole" class plays a more
important role than the "part" class, but the two classes are not dependent on each other. The
diamond end in both a composition and aggregation relationship points toward the "whole" class
or the aggregate
Generalization
Generalization is another name for inheritance or an "is a" relationship. It refers to a relationship
between two classes where one class is a specialized version of another. For example, Honda is a
type of car. So the class Honda would have a generalization relationship with the class car.
In real life coding examples, the difference between inheritance and aggregation can be
confusing. If you have an aggregation relationship, the aggregate (the whole) can access only the
PUBLIC functions of the part class. On the other hand, inheritance allows the inheriting class to
access both the PUBLIC and PROTECTED functions of the superclass.
Fig: Class diagram. Classes: LoginAction, UserRegistration, UploadFile, CountAction,
StasticsAction, ViewCloudInformation, SearchAction, ViewResults. Attributes include loginid,
password, personal info, file : File, countid : int, SingleKeywordcount : int, filename : String,
fileid : int, keyword : String. Operations include LoginisSuccessful(), Registration is
successful(), File uploaded successfully(), Weight of the Single Keyword(), Weight of the
files(), ViewInformation(), Keyword Entered Successfully(), View Searched Files().
Sequence Diagram
Class roles
Class roles describe the way an object will behave in context. Use the UML object symbol to
illustrate class roles, but don't list object attributes.
Activation
Activation boxes represent the time an object needs to complete a task.
Messages
Messages are arrows that represent communication between objects. Use half-arrowed lines to
represent asynchronous messages. Asynchronous messages are sent from an object that will not
wait for a response from the receiver before continuing its tasks.
Various message types for Sequence and Collaboration diagrams
Lifelines
Lifelines are vertical dashed lines that indicate the object's presence over time.
Destroying Objects
Objects can be terminated early using an arrow labeled "<< destroy >>" that points to an X.
Loops
A repetition or loop within a sequence diagram is depicted as a rectangle. Place the condition for
exiting the loop at the bottom left corner in square brackets [ ].
Fig: Sequence diagram for User. Objects: Login, Search, Profile, Change password, Logout.
Messages: Search, Login fail, View Profile, Change password, Logout.
State chart Diagram
A statechart diagram shows the behavior of classes in response to external stimuli. This diagram
models the dynamic flow of control from state to state within a system.
States
States represent situations during the life of an object. You can easily illustrate a state in
SmartDraw by using a rectangle with rounded corners.
Transition
A solid arrow represents the path between different states of an object. Label the transition with
the event that triggered it and the action that results from it.
Initial State
A filled circle followed by an arrow represents the object's initial state.
Final State
An arrow pointing to a filled circle nested inside another circle represents the object's final state.
A short heavy bar with two transitions entering it represents a synchronization of control. A short
heavy bar with two transitions leaving it represents a splitting of control that creates multiple
states.
Fig: State chart diagram for User. States: Registration, Login, Searching the cloud files, Profile,
Change password, Logout.
Activity Diagram
An activity diagram illustrates the dynamic nature of a system by modeling the flow of control
from activity to activity. An activity represents an operation on some class in the system that
results in a change in the state of the system. Typically, activity diagrams are used to model
workflow or business processes and internal operation. Because an activity diagram is a special
kind of statechart diagram, it uses some of the same modeling conventions.
Action states
Action states represent the noninterruptible actions of objects. You can draw an action state in
SmartDraw using a rectangle with rounded corners.
Action Flow
Action flow arrows illustrate the relationships among action states.
Object Flow
Object flow refers to the creation and modification of objects by activities. An object flow arrow
from an action to an object means that the action creates or influences the object. An object flow
arrow from an object to an action indicates that the action state uses the object.
Initial State
A filled circle followed by an arrow represents the initial action state.
Final State
An arrow pointing to a filled circle nested inside another circle represents the final action state.
Branching
A diamond represents a decision with alternate paths. The outgoing alternates should be labeled
with a condition or guard expression. You can also label one of the paths "else."
Synchronization
A synchronization bar helps illustrate parallel transitions. Synchronization is also called forking
and joining.
Swimlanes
Swimlanes group related activities into one column.
(Activity diagram with swimlanes: user login, with login-fail and login-success branches; on success, the user home offers registration, profile, searching, and security activities.)
Data Flow Diagram
A data flow diagram (DFD) is a graphical tool used to describe and analyze the movement of data through a system, manual or automated, including the processes, stores of data, and delays in the system. Data flow diagrams are the central tool and the basis from which other components are developed. The transformation of data from input to output, through processes, may be described logically and independently of the physical components associated with the system. The DFD is also known as a bubble chart.
DFDs are the model of the proposed system. They should clearly show the requirements on which the new system is to be built. Later, during design activity, they are taken as the basis for drawing the system's structure charts. The basic notation used to create DFDs is as follows:
1. Data Flow: Data move in a specific direction from an origin to a destination.
2. Process: People, procedures, or devices that use or produce (transform) data; the physical component is not identified.
3. Source/Sink: External sources or destinations of data, such as people or organizations outside the system.
4. Data Store: Here data are stored or referenced by a process in the system.
CONTEXT LEVEL DIAGRAM
AUTHENTICATION DFD:
DataOwner DFD:
Cloud Server DFD:
1.5 Data Dictionaries and ER Diagram:
ER Diagram:
Data Dictionary:
Addresses:
User details:
Count:
Data1
DownloadUserdetails:
File Data:
Logindetails:
Mainindex:
Phones:
SearchData:
StasaticsLeakage:
6. Implementation
6.1 Introduction
Implementation is the stage where the theoretical design is turned into a working system. It is the most crucial stage in achieving a successful new system and in giving the users confidence that the new system will work efficiently and effectively.
The system can be implemented only after thorough testing is done and it is found to work according to the specification. Implementation involves careful planning, investigation of the current system and its constraints, design of methods to achieve the changeover, and an evaluation of changeover methods, apart from planning. Two major tasks in preparing for implementation are education and training of the users and testing of the system.
The more complex the system being implemented, the more involved will be the systems analysis and design effort required just for implementation. The implementation phase comprises several activities. The required hardware and software acquisition is carried out. The system may also require some software to be developed; for this, programs are written and tested. The user then changes over to the new, fully tested system, and the old system is discontinued.
Implementation is the process of having systems personnel check out and put new equipment into use, train users, install the new application, and construct any files of data needed by it.
Depending on the size of the organization that will be involved in using the application and the risk associated with its use, system developers may choose to test the operation in only one area of the firm, say in one department or with only one or two persons. Sometimes they will run the old and new systems together to compare the results. In still other situations, developers will stop using the old system one day and begin using the new one the next. As we will see, each implementation strategy has its merits, depending on the business situation in which it is considered. Regardless of the implementation strategy used, developers strive to ensure that the system's initial use is trouble-free.
Once installed, applications are often used for many years. However, both the
organization and the users will change, and the environment will be different over the weeks and
months. Therefore, the application will undoubtedly have to be maintained. Modifications and
changes will be made to the software, files, or procedures to meet the emerging requirements.
The Java platform consists of the Java application programming interfaces (APIs)
and the Java virtual machine (JVM).
Java technology lets developers, designers, and business partners develop and deliver a consistent user experience, with one environment for applications on mobile and embedded devices. Java meshes the power of a rich stack with the ability to deliver customized experiences across such devices.
Java APIs are libraries of compiled code that you can use in your programs. They let you add
ready-made and customizable functionality to save you programming time.
Java programs are run (or interpreted) by another program called the Java Virtual Machine.
Rather than running directly on the native operating system, the program is interpreted by the
Java VM for the native operating system. This means that any computer system with the Java
VM installed can run Java programs regardless of the computer system on which the applications
were originally developed.
In the Java programming language, all source code is first written in plain text files ending with
the .java extension. Those source files are then compiled into .class files by the javac compiler. A
.class file does not contain code that is native to your processor; it instead contains bytecodes —
the machine language of the Java Virtual Machine (Java VM). The java launcher tool then runs
your application with an instance of the Java Virtual Machine.
Because the Java VM is available on many different operating systems, the same .class files are
capable of running on Microsoft Windows, the Solaris™ Operating System (Solaris OS),
Linux, or Mac OS.
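The compile-and-run cycle above can be sketched with a minimal source file (the class name Hello and its greeting method are our own choices for illustration; the comments show the javac and java commands the text describes):

```java
// A minimal source file illustrating the compile-and-run cycle:
//   javac Hello.java  ->  produces Hello.class (JVM bytecode, not native code)
//   java Hello        ->  the java launcher runs the bytecode on the Java VM
public class Hello {
    // A small method so the behavior can be checked independently of main.
    static String greeting() {
        return "Hello from the Java VM";
    }

    public static void main(String[] args) {
        // The same Hello.class runs unchanged wherever a Java VM is installed.
        System.out.println(greeting());
    }
}
```

The point of the sketch is that Hello.class contains only bytecode, so the identical file runs on Windows, Solaris, Linux, or Mac OS.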
The Java programming language is a high-level language that can be characterized by all of the following buzzwords: simple, object oriented, distributed, multithreaded, dynamic, architecture neutral, portable, high performance, robust, and secure.
Each of these buzzwords is explained in The Java Language Environment, a white paper written by James Gosling and Henry McGilton.
Some virtual machines, such as the Java HotSpot virtual machine, perform additional steps at runtime to give your application a performance boost. These include tasks such as finding performance bottlenecks and recompiling frequently used sections of code to native code.
Through the Java VM, the same application is capable of running on multiple platforms.
1. Read the explicit data sent by the client.
The end user normally enters this data in an HTML form on a Web page. However, the data
could also come from an applet or a custom HTTP client program. Chapter 4 discusses how
servlets read this data.
2. Read the implicit HTTP request data sent by the browser.
Figure 1–1 shows a single arrow going from the client to the Web server (the layer where
servlets and JSP execute), but there are really two varieties of data: the explicit data that the end
user enters in a form and the behind-the-scenes HTTP information. Both varieties are critical.
The HTTP information includes cookies, information about media types and compression schemes the browser understands, and so on.
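To make the explicit-data half concrete, here is a small self-contained sketch of parsing form data such as a servlet receives from an HTML query string. The QueryDemo class and its parseQuery helper are our own illustration, not the servlet API: a real servlet would call request.getParameter() for explicit form data and request.getHeader() for the implicit HTTP data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of reading the "explicit" data: a query string like the one an
// HTML form submits, e.g. keyword=cloud&topk=10.
public class QueryDemo {
    // Split a query string into name/value pairs, preserving order.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parseQuery("keyword=cloud&topk=10");
        System.out.println(p); // {keyword=cloud, topk=10}
    }
}
```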
The Advantages of Servlets Over “Traditional” CGI
Java servlets are more efficient, easier to use, more powerful, more portable, safer, and cheaper than traditional CGI and many alternative CGI-like technologies. With traditional CGI, a new process is started for each HTTP request. If the CGI program itself is relatively short, the overhead of starting the process can dominate the execution time. With servlets, the Java virtual machine stays running and handles each request with a lightweight Java thread, not a heavyweight operating system process. Similarly, in traditional CGI, if there are N requests to the same CGI program, the code for the CGI program is loaded into memory N times. With servlets, however, there would be N threads, but only a single copy of the servlet class would be loaded. This approach reduces server memory requirements and saves time by instantiating fewer objects. Finally, when a CGI program finishes handling a request, the program terminates. This approach makes it difficult to cache computations, keep database connections open, and perform other optimizations that rely on persistent data. Servlets, however, remain in memory even after they complete a response, so it is straightforward to store arbitrarily complex data between client requests.
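The threading model described above can be sketched in plain Java without the servlet API (the SharedHandler class and its method names are our own): one long-lived object, like a single loaded servlet, services many requests on lightweight threads while its state persists between them.

```java
import java.util.concurrent.atomic.AtomicInteger;

// One handler instance serves N requests on N threads; its counter (standing
// in for a cache or connection pool) persists across requests, unlike a CGI
// process that dies after each one.
public class SharedHandler {
    private final AtomicInteger requestCount = new AtomicInteger();

    // Called once per request, concurrently, on a lightweight thread.
    void handleRequest() {
        requestCount.incrementAndGet();
    }

    int count() { return requestCount.get(); }

    public static void main(String[] args) throws InterruptedException {
        SharedHandler handler = new SharedHandler(); // single instance, like a servlet
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(handler::handleRequest);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("requests handled: " + handler.count()); // requests handled: 8
    }
}
```

AtomicInteger is used because several request threads update the shared state at once, exactly the situation a servlet instance faces.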
Convenient
Servlets have an extensive infrastructure for automatically parsing and decoding HTML
form data, reading and setting HTTP headers, handling cookies, tracking sessions, and many
other such high-level utilities. In CGI, you have to do much of this yourself. Besides, if you
already know the Java programming language, why learn Perl too? You’re already convinced
that Java technology makes for more reliable and reusable code than does Visual Basic,
VBScript, or C++. Why go back to those languages for server-side programming?
Powerful
Servlets support several capabilities that are difficult or impossible to accomplish with
regular CGI. Servlets can talk directly to the Web server, whereas regular CGI programs cannot,
at least not without using a server-specific API. Communicating with the Web server makes it
easier to translate relative URLs into concrete path names, for instance. Multiple servlets can
also share data, making it easy to implement database connection pooling and similar resource-
sharing optimizations. Servlets can also maintain information from request to request,
simplifying techniques like session tracking and caching of previous computations.
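The session-tracking idea mentioned above can be sketched as server-side state keyed by a session ID that would travel in a cookie. SessionStore is our own simplified stand-in for illustration, not the real HttpSession API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Server-side session state: a map of session IDs to per-user attribute maps.
public class SessionStore {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Create a session and return its ID (in a servlet, sent back as a cookie).
    String createSession() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    void put(String sessionId, String key, Object value) {
        sessions.get(sessionId).put(key, value);
    }

    Object get(String sessionId, String key) {
        return sessions.get(sessionId).get(key);
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        String id = store.createSession();          // first request: new session
        store.put(id, "lastQuery", "top-k search"); // later request: state survives
        System.out.println(store.get(id, "lastQuery"));
    }
}
```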
Portable
Servlets are written in the Java programming language and follow a standard API. Servlets are supported directly or by a plug-in on virtually every major Web server. Consequently, servlets written for, say, Macromedia JRun can run virtually unchanged on Apache Tomcat, Microsoft Internet Information Server (with a separate plug-in), IBM WebSphere, iPlanet Enterprise Server, Oracle9i AS, or StarNine WebSTAR. They are part of the Java 2 Platform, Enterprise Edition (J2EE; see http://java.sun.com/j2ee/), so industry support for servlets is becoming even more pervasive.
Inexpensive
A number of free or very inexpensive Web servers are good for development use or deployment
of low- or medium-volume Web sites. Thus, with servlets and JSP you can start with a free or
inexpensive server and migrate to more expensive servers with high-performance capabilities or
advanced administration utilities only after your project meets initial success. This is in contrast
to many of the other CGI alternatives, which require a significant initial investment for the
purchase of a proprietary package. Price and portability are somewhat connected. For example,
Marty tries to keep track of the countries of readers that send him questions by email. India was
near the top of the list, probably #2 behind the U.S. Marty also taught one of his JSP and servlet
training courses (see http://courses.coreservlets.com/) in Manila, and there was great interest in
servlet and JSP technology there. Now, why are India and the Philippines both so interested? We
surmise that the answer is twofold. First, both countries have large pools of well-educated
software developers.
Second both countries have (or had, at that time) highly unfavorable currency exchange
rates against the U.S. dollar. So, buying a special-purpose Web server from a U.S. company
consumed a large part of early project funds. But, with servlets and JSP, they could start with a
free server: Apache Tomcat (either standalone, embedded in the regular Apache Web server, or
embedded in Microsoft IIS). Once the project starts to become successful, they could move to a
server like Caucho Resin that had higher performance and easier administration but that is not
free. But none of their servlets or JSP pages have to be rewritten. If their project becomes even
larger, they might want to move to a distributed (clustered) environment. No problem: they could
move to Macromedia JRun Professional, which supports distributed applications (Web farms).
Again, none of their servlets or JSP pages have to be rewritten. If the project becomes quite large
and complex, they might want to use Enterprise JavaBeans (EJB) to encapsulate their business logic. So, they might switch to BEA WebLogic or Oracle9i AS. Again, none of their servlets or JSP pages have to be rewritten. Finally, if their project becomes even bigger, they might move it off of their Linux box and onto an IBM mainframe running IBM WebSphere. But once again, none of their servlets or JSP pages have to be rewritten.
Secure
One of the main sources of vulnerabilities in traditional CGI stems from the fact that the
programs are often executed by general-purpose operating system shells. So, the CGI
programmer must be careful to filter out characters such as backquotes and semicolons that are
treated specially by the shell. Implementing this precaution is harder than one might think, and
weaknesses stemming from this problem are constantly being uncovered in widely used CGI
libraries. A second source of problems is the fact that some CGI programs are processed by
languages that do not automatically check array or string bounds. For example, in C and C++ it
is perfectly legal to allocate a 100-element array and then write into the 999th “element,” which
is really some random part of program memory. So, programmers who forget to perform this
check open up their system to deliberate or accidental buffer overflow attacks. Servlets suffer
from neither of these problems. Even if a servlet executes a system call (e.g., with Runtime.exec or JNI) to invoke a program on the local operating system, it does not use a shell to do so. And,
of course, array bounds checking and other memory protection features are a central part of the
Java programming language.
Mainstream
There are a lot of good technologies out there. But if vendors don’t support them and developers
don’t know how to use them, what good are they? Servlet and JSP technology is supported by
servers from Apache, Oracle, IBM, Sybase, BEA, Macromedia, Caucho, Sun/iPlanet, New Atlanta, ATG, Fujitsu, Lutris, Silverstream, the World Wide Web Consortium (W3C), and many others. Several low-cost plugins add support to Microsoft IIS and Zeus as well. They run on Windows, Unix/Linux, MacOS, VMS, and IBM mainframe operating systems. They are the
single most popular application of the Java programming language. They are arguably the most
popular choice for developing medium to large Web applications. They are used by the airline
industry (most United Airlines and Delta Airlines Web sites), e-commerce (ofoto.com), online
banking (First USA Bank, Banco Popular de Puerto Rico), Web search engines/portals
(excite.com), large financial sites (American Century Investments), and hundreds of other sites
that you visit every day. Of course, popularity alone is no proof of good technology. Numerous
counter-examples abound. But our point is that you are not experimenting with a
new and unproven technology when you work with server-side Java.
7. System Screens
LOGIN TO DATA OWNER
UPLOAD FILE TO THE CLOUD SERVER
User:-
8. System Testing
8.1 Testing Methodologies
Testing is the process of finding differences between the expected behavior specified by the system models and the observed behavior of the implemented system. From a modeling point of view, testing is the attempt to falsify the system with respect to the system models. The goal of testing is to design tests that exercise defects in the system and reveal problems.
The process of executing a program with the intent of finding errors is called testing. During testing, the program to be tested is executed with a set of test cases, and the output of the program for the test cases is evaluated to determine whether the program is performing as expected. Testing forms the first step in determining the errors in the program. The success of testing in revealing errors in a program depends critically on the test cases.
(Figure: levels of testing, from unit testing of modules and components, through integration testing of sub-systems and system testing, up to acceptance testing by users.)
Unit Testing
Unit testing focuses on the building blocks of the software system, that is, objects and subsystems. There are three motivations behind focusing on components. First, unit testing reduces the complexity of the overall test activities, allowing us to focus on smaller units of the system. Second, unit testing makes it easier to pinpoint and correct faults, given that few components are involved in the test. Third, unit testing allows parallelism in the testing activities; that is, each component can be tested independently of the others. Hence the goal is to test the internal logic of the module.
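As a tiny illustration of unit testing a building block in isolation, consider a relevance-scoring function in the spirit of this project's vector space model. The Score class and its dotProduct method are our own sketch, not code from the implemented system; the assertions exercise the unit's internal logic with known inputs and expected outputs.

```java
// Unit test sketch: Score.dotProduct is the "building block" under test.
// Run with: java -ea Score   (the -ea flag enables assertions)
public class Score {
    // Relevance score as the dot product of a query vector and a document vector.
    static int dotProduct(int[] query, int[] doc) {
        int sum = 0;
        for (int i = 0; i < query.length; i++) {
            sum += query[i] * doc[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Each assertion tests the unit independently of the rest of the system.
        assert dotProduct(new int[] {1, 0, 1}, new int[] {2, 3, 4}) == 6;
        assert dotProduct(new int[] {0, 0, 0}, new int[] {5, 5, 5}) == 0;
        System.out.println("all unit tests passed");
    }
}
```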
Integration Testing
In integration testing, tested modules are combined into subsystems, which are then tested. The goal here is to see whether the modules can be integrated properly, the emphasis being on testing module interaction.
After structural testing and functional testing we get error-free modules. These modules are then integrated to obtain the required results of the system. After checking a module, another module is tested and integrated with the previous module. After the integration, the test cases are generated and the results are tested.
System Testing
In system testing the entire software is tested. The reference document for this process is the requirements document, and the goal is to see whether the software meets its requirements. The system was tested for various test cases with various inputs.
Acceptance Testing
Acceptance testing is sometimes performed with realistic data of the client to demonstrate that the software is working satisfactorily. Testing here focuses on the external behavior of the system; the internal logic of the program is not emphasized. In acceptance testing the system is tested for various inputs.
8.3 Types of Testing
1. Black box or functional testing
2. White box testing or structural testing
In black box (functional) testing we are not concerned with the internal workings of the system; we only ask what the output of our system is for a given input.
(Figure: Input -> black box -> Output.)
The black box is an imaginary box that hides the system's internal workings.
Test Data:
Here all test cases that are used for the system testing are specified. The goal is to test the
different functional requirements specified in Software Requirements Specifications (SRS)
document.
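A black-box test case is just an input/expected-output pair; the internal logic stays hidden from the tester. As a minimal sketch (the LoginValidator class and its rules are hypothetical, chosen only to mirror the login test cases in this chapter):

```java
// Black-box view: we only check outputs for given inputs.
public class LoginValidator {
    // Hypothetical rule: both fields must be non-empty and the password
    // must be at least 6 characters long.
    static boolean isValid(String user, String password) {
        return user != null && !user.isEmpty()
                && password != null && password.length() >= 6;
    }

    public static void main(String[] args) {
        // input -> expected output, exactly as a black-box test case records it
        System.out.println(isValid("alice", "secret1")); // true
        System.out.println(isValid("alice", "abc"));     // false
        System.out.println(isValid("", "secret1"));      // false
    }
}
```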
Unit Testing:
Each individual module has been tested against the requirement with some test data.
Test Report:
The modules are working properly provided the user enters the required information. All data entry forms have been tested with the specified test cases and all data entry forms are working properly.
Error Report:
If the user does not enter data in specified order then the user will be prompted with error
messages. Error handling was done to handle the expected and unexpected errors.
A test case is a set of input data and expected results that exercises a component with the purpose of causing failures and detecting faults. A test case is an explicit set of instructions designed to detect a particular class of defect in a software system by bringing about a failure. A test case can give rise to many tests.
Test Case 1:
Test Case ID: Pwd02
Objective: To verify that the Password field on the login page allows special characters.
Test Steps: Enter a Password containing special characters (say !@hi&*P) and a Login Name, and click the Submit button.
Expected Result: Login succeeds, or the error message "Invalid Login or Password" is displayed.
Registration Page Test Case
9. Conclusion and Future Enhancements
9.1 Conclusion:
In this project we motivate and solve the problem of secure multi-keyword top-k retrieval over encrypted cloud data. We define similarity relevance and scheme robustness. Based on the observation that server-side ranking based on order-preserving encryption (OPE) inevitably leaks sensitive information, we devise a two-round searchable encryption (TRSE) scheme, which fulfills the security requirements of multi-keyword top-k retrieval over the encrypted cloud data. By security analysis, we show that the proposed scheme guarantees data privacy.