Towards Secure Multi-Keyword Top-K Retrieval Documentation

Towards Secure Multi-Keyword Top-k Retrieval
Over Encrypted Cloud Data
1
Abstract:
Cloud computing has emerging as a promising pattern for data outsourcing and highqualitydata
services. Cloud computing provide the services for on demand users. On demand services
distribution perform with outsourcing operations. However, concerns of sensitive information on
cloud potentiallycause privacy problems. Data encryption protects data security to some extent,
butat the cost of compromised efficiency. Searchable symmetric encryption (SSE)
allowsretrieval of encrypted data over cloud. In this paper, we focus on addressing data
privacyissues using searchable symmetric encryption (SSE). For the first time, we formulate
theprivacy issue from the aspect of similarity relevance and scheme robustness. We observethat
server-side ranking based on order-preserving encryption (OPE) inevitably leaks dataprivacy. To
eliminate the leakage, we propose a two-round searchable encryption (TRSE)scheme that
supports top-k multi-keyword retrieval. In TRSE, we employ a vector spacemodel and
homomorphic encryption. The vector space model helps to provide sufficientsearch accuracy,
and the encryption enables users to involve in the rankingwhile the majority of computing work
is done on the server side by operations only oncipher text. As a result, information leakage can
be eliminated and data security is ensured.Thorough security and performance analysis show that
the proposed scheme guarantees high security and practical efficiency.
2
LIST OF CONTENTS Page No
1. Introduction 6
1.1 Purpose 6
1.2 Scope 6
1.3 Motivation 7
1.3.1 Definitions 7
1.3.2 Abbreviations 7
1.3.3 Model Diagrams 8
1.4 Overview 8
2. Literature Survey 9
2.1 Introduction
2.2 History
2.3 Purpose
2.4 Requirements
2.5 TechnologyUsed
3. System Analysis 19
3.1 Existing System 24
3.1.1 Drawbacks 24
3.2 Problem statement 24
3
3.3 Proposed System 24
3.3.1 Advantages 25
3.4 Feasibility Study 27
3.4.1 Economic Feasibility 27
3.4.2 Operational Feasibility 27
3.4.3 Technical Feasibility 28
3.5 Algorithm
4. System Requirements Specification 31
4.1 Introduction 31
4.2 Purpose 32
4.3 Functional Requirements 32
4.4 Non Functional Requirements 34
4.5 Hardware Requirements 34
4.6 Software Requirements 34
5. System Design 35
5.1 Introducton 29
5.2 System Model 36
5.3 Scenarios 37
5.4 DFD Diagrams 58
6. Implementation 70
6.1 Introduction 70
4
7. System Screens 80
8. SystemTesting 129
8.1 Testing Methodologies 129
8.2 Testing activities 130
8.3 Types of Testing 132
8.4 Test Plan 133
8.5 Test cases 134
9. Conclusion and Future Enhancements 139
9.1 Conclusion 139
9.2 Scope for Future Enhancement 139
10. References 140
5
INTRODUCTION
Cloud computing provide the services for on demand users. On demand services
distribution perform with outsourcing operations. Before exchanges the data file must it encrypt
and store in cloud server. Those files are gets the privacy and distributes the resources with
effective solutions. Present traditional applications using keyword search find out the results with
encrypted files. It can show the result like huge amount of files. These search results are not
utilized efficiently in user’s side. In this project we are ready to solve all the problems and
introduce secure keyword search mechanism. Secure keyword display the top list of files with
fewer results in output. Using fewer results increases the file retrieval accuracy and reduces the
communication overhead. All list of files here only top and rank list. Secure Keyword Search
starts in ranked list files only. Ranked list files are identify with relevance score and statistical
measure. These kinds of results are shows as a good and strong security files generation process.
Here we are show the good performance comparison in between of previous system to present
system. We are proved proposed system is the best solution in implementation. That is called as
a ranked keyword search.
1.1 Purpose
Cloud computing provide the services for on demand users. On demand services distribution
perform with outsourcing operations. We focus on addressing data privacyissues using
searchable symmetric encryption. Those files are gets the privacy and distributes the resources
with effective solutions.
1.2 Scope
Those files are gets the privacy and distributes the resources with effective solutions. Present
traditional applications using keyword search find out the results with encrypted files. It can
show the result like huge amount of files.These search results are not utilized efficiently in user’s
side. In this project we are ready to solve all the problems and introduce secure keyword search
mechanism.
6
1.3 Motivation
Ranked files are gets the privacy and distributes the resources with effective solutions. Present
traditional applications using keyword search find out the results with encrypted files. It can
show the result like huge amount of files. These search results are not utilized efficiently in
user’s side. In this project we are ready to solve all the problems and introduce secure keyword
search mechanism.
1.3.1 Definition;
1. For the first time, we define the problem of secure ranked keyword search over encrypted
cloud data, and provide such an effective protocol, which fulfills the secure ranked search
functionality with little relevance score information leakage against keyword privacy.
2. Thorough security analysis shows that our ranked searchable symmetric encryption
scheme indeed enjoys “as-strong-as-possible” security guarantee compared to previous
SSE schemes.
3. We investigate the practical considerations and enhancements of our ranked search
mechanism, including the efficient support of relevance score dynamics, the
authentication of ranked search results, and the reversibility of our proposed one-to-many
order-preserving mapping technique.
4. Extensive experimental results demonstrate the effectiveness and efficiency of the
proposed solution.
1.3.2 Abbreviations:
Searchable symmetric encryption (SSE)
Order-preserving encryption (OPE)
a two-round searchable encryption (TRSE)
Information retrieval (IR)
Term frequency (tf)
Inverse Document Frequency (idf)
7
1.3.3 Module Diagram:
1.4 Overview
Section1. In this section we have discussion Introduction, Purpose, Scope and Motivation
Section2. In this section we will discussion Literature survey and Research Methodologies.
Section3. Inthis section we will discussion,Fundamental Concepts on Data Mining, Existing

Techniques, and Proposed Techniques and Algorithm
Section4. Inthis section we will discussion System Analysis, Module Description and Feasibility
Study.
Section5. In this section we will discussion System Requirements Specification.
Section6. In this section we will discussion System Design (UML, DFD, and ER Diagram).
Section7. In this section we will discussion Implementation and Technology Description.
Section8. In this section we will discussionSystem Testing and Result Analysis.
Section9. In this section we will discussionConclusion and Future Enhancement.
8
Section10. In this section we will discussionReferences.
2. Literature Survey
Cloud computing has recently emerged as a new platform for deploying, managing, and
provisioning large-scale services through an Internet-based infrastructure. Successful examples
include Amazon EC2, Google App Engine, and Microsoft Azure . As a result, hosting databases
in the cloud has become a promising solution for Database-as-a-Service (DaaS) and Web 2.0
applications. In the cloud computing model, the data owner outsources both the data and
querying services to the cloud. The data are private assets of the data owner and should be
protected against bthe cloud and querying client; on the other hand, the query might disclose
sensitive information of the client and should be protected against the cloud and data owner.
Therefore, a vital concern in cloud computing is to protect both data privacy and query privacy
among the data owner, the client, and the cloud. The social networking service is one of the
sectors that witnesssuch rising concerns. For example, in Fig. 1 user Cindy wants to search an
online dating site for friends who share with her similar backgrounds (e.g., age, education, home
address). While the site or the data cloud should not disclose to Cindy personal details of any
user, especially those sensitive ones (e.g. home address), Cindy should not disclose the query
that involves her own details to the site or the cloud, either. More critical examples exist in
business sectors, where queries may reveal confidential business intelligence. For example, a
retail business plans to open a branch in a district. To calculate the target customer base, it needs
to query the demographic data of that district, which the data owner has outsourced to a data
cloud. While personal details in the Mutual Privacy Protection in Online Friend Matching
demographic data should not be disclosed to the outsourcing cloud or the business, the district
name in that query should not be disclosed to the cloud or data owner, either. It is also noted that
the cloud computing model worsens the consequence of privacy breaches in the above scenarios
as a single cloud may host querying services for many data owners. For example, two queries
from the same user, one on local clinic directory and another on anti-diabetic drugs, together
give a higher confidence that the user is probably suffering from diabetes. All the above concerns
call for a queryprocessing model that preserves both data privacy and query privacy among the
data owner, the client, and the cloud. The data owner should protect its data privacy, and does
not reveal any information beyond what the query result can imply. On the other hand, the client
9
should protect its query privacy so that the data owner and the cloud know nothing about the
query, and is therefore unable to infer any information about the client. Unfortunately, existing
privacy-preserving query processing solutions are not sufficient to solve this new problem
arising in the cloud model. Most research work in the literature addresses data privacy or query
privacy separately. For example, generalization techniques have been proposed to protect data
privacy by hiding quasi-identifier attributes and avoiding the disclosure of sensitive information
Similar techniques are proposed for query privacy on both relational data and spatial data. Only
very few, such as the Casper framework consider data and query privacy as a whole.
Furthermore, generalization-based solutions like the Casper still disclose the data or query in a
coarser and imprecise form. Not much research work addresses the unconditional privacy
required for this problem. Although some encryption schemes are proposed to protect the data
hosted on the outsourcing server they cannot be adopted in this problem for several reasons.
First, accurate query processing on encrypted data is difficult, if not impossible at all. Most
existing encryption schemes only support some specific queries. For example, space
transformation (e.g., space filling curve) used in [20] only supports approximation queries as it
cannot preserve the accurate distances in the original space. Second, even though suitable
encryptions are found for these queries, they become flawed when applied tour problem, as these
encryptions are not designed for mutual privacy protection in the first place. In particular, to
evaluate the query on the encrypted data, the client must encrypt the query by the same scheme
and send it to the outsourcing server, who may then forward it to the data owner,wherethe query
can be decrypted by her encryption parameters. Third, some encryptions or transformations are
shown to beVulnerable to certain security attacks. For example, distance preservingspace
transformations are vulnerable to principalComponentanalysis.In spite of the insufficiency of
these prior studies for our problem, they show us that a secure framework and an alternate
encryption scheme are both indispensable. In this paper, we propose a holistic and efficient
solution that is based on Privacy Homomorphism (PH). PHs are encryption transformations
which map a set of operations on clear text to another set of operations on cipher text. In essence,
PH enables complex computations (such as distances) based solely on ciphertext, without
decryption. We integrate a provably secure PH seamlessly with a generic index structure to
develop a novel query processing framework. It is efficient and can be applied to any multi-level
tree index. We address several challenges in this framework. First, an index consists of= multiple
10
nodes, and query processing on the index involves traversing these nodes. The cloud or data
owner should not be able to trace the access pattern and hence get any clue of the query. We
propose a client-lead processing paradigm that eliminates the disclosure of the query to any other
party. Second, to evaluate various types of complex queries, such ask NN and other distance-
based queries, a comprehensive setof client-cloud protocols must be devised to work together
with a PH that supports most arithmetic operations. Third, we prove the security and analyze the
complexity of the proposed algorithms and protocols. In particular, we present several
optimization techniques to improve the protocol efficiency and also show their privacy
implications. To summarize, our contributions in this paper are as follows:
• To the best of our knowledge, this is the first work that is dedicated to mutual privacy
protection for complex query processing over large-scale, indexed data in a cloud environment.
• We present a general index traversal framework that\ accommodates any multi-level index. The
framework can resist the index trace attempt of the cloud during query processing. Based on this
framework, we present a set of protocols to process typical distance-based queries.• We
thoroughly analyze the security and complexity of the proposed framework and protocols. In
particular, we present several optimization techniques to improve the protocol efficiency.
• An extensive set of experiments are conducted to evaluate the actual performance of our basic
and optimizedtechniques. The rest of the paper is organized as follows. Section II reviews
existing work on privacy-preserving query processing on outsourced data. Section III formulates
the problem and Section IV introduces ASM-PH, the privacy homomorphism used in this paper.
Section V overviews the secure processingframework, followed by detailed discussions on the
protocols in Sections VI and VII, with a focus on distance-based queries. Section VIII presents
three optimization techniques to improve the protocol efficiency. Section IX analyzes the
security and possible threats of our approach, followed by the performanceEvaluation in Section
X. Section XI concludes this paper with some future research directions. Our work falls into this
category but distinguishes itself from the others as being the first work that is dedicated to
mutual privacy protection. We propose a secure, encryption integrated framework that is suitable
for processing complexQueries over large-scale, indexed data. It is noteworthy that privacy-
preserving search on tree-structured data has been studied in some existing studies [6], [27], [2];
however, these works either consider one-way privacy or cannot provideunconditional privacy
11
guarantee. The third category considers a distributed environment where the data are partitioned
and outsourced to a set of independent and non-colluding outsourcing servers. The privacy-
preserving query processing requires a distributed and secure protocol to evaluate the result
without disclosing the data in each outsourcing server. The security foundation of such protocols
originates from secure multiparty computation (SMC), a cryptography problem that computes a
secure function from multiple participants in a distributed network .Privacy-preserving nearest
neighbor queries have been studied in this context for data mining. Shaneck et al. presented a
solution for point data on two parties. Qi and At allah improved this solution by applying a blind-
and-permute protocol, together with a secure selection and a multi-step kNNprotocol . For
approximate aggregation queries, Li et al. proposed randomized protocols based on probabilistic
computations to minimize the data disclosure [30]. Forvertically-partitioned data, privacy-
preserving top-k, kNNandjoin queries are studied . More recently, Gahnite et al. proposed a
private-information-retrieval (PIR) framework to evaluate ken queries in location-based
services . Thanks to oblivious transfer, a common primitive in SMC, the user can retrieve the
results without being pinpointed. However, solutions in the third category typically suffer from
heavy CPU, communication and storage overhead, as most SMC-based protocols do. As such,
they cannot scale well to large-scale databases. Introduce privacy homeomorphisms (PH), the
internal encryption scheme. PH are encryption transformations which map a set of operations on
clear text to another set of operations on cipher text. Formally, they are encryption functions by
applying the extended Euclidean algorithm. Obviously, this encryption is privacy
homomorphism under the operations defined by FŒ and F, because m = pq. However, this
encryption suffers from known-plaintext attacks [4],which means p and q could be found if a pair
of clear textandciphertext is known to an adversary. A. A Provably Secure Privacy
Homomorphism In , Domingo-Ferrer enhanced the above simple PH and proposed a provably
secure privacy homomorphism under thesame set of operations, i.e., modular addition,
subtractionand multiplication. We name it ASM-PH after its supported operations. It works as
follows. The public parameters are appositive integer t > 2 and a large integer m. t controls
howmany components a clear text is split into (t = 2 in the above). m should have many small
divisors (compared to t).Further, many integers smaller than m should be invertible
modulom..Similar to the simple PH, the set F of cipher textoperations are the corresponding
component wise operations in T .Finally, the encryption and decryption of this PH can be
12
described as follows..Encryption. Randomly split a clear text a ¸ ZmŒ intodegree are added up.
While ASM-PH can perform addition, subtraction and multiplication directly on the cipher texts,
these operations still cost considerable computations. Let + denote the costof a modular sum and
~ denote the cost of a modularmultiplication.1 Then the costs of these three operations are , and
respectively. It is alsonoteworthy that the multiplication will double the size of theciphertext
from t components to 2t components, with the firstcomponent being zero.As for the computation
cost of encryption, it is t~ as eachcomponent requires a modular multiplication with rj .2 Thecost
of decryption is similar, except that all components aresummed up in the end. As such, the cost is
It is noteworthy that the encryption will increases the sizeof the cleartext from 1 component to t
components. Sinceeach component is a positive integer in Zm, the size of theciphertext is thus t
E l(m), where l(m) denotes the number ofbits in m.In practice, the cost of modular addition is
dominated bythat of modular multiplication [9] and thus can be omittedwhen the latter presents.
Modular multiplication, especially forlarge modulus, also becomes extremely efficient (in the
magnitudeof 10.5 second) since the introduction of MontgomeryASM-PH is shown to be secure
against known-plain textattacks. Analytically, the size of the subset of keys thatare consistent
with n known cleartext- ciphertext pairs grows exponentially with s.n, where s. This means the
genuine key could be from an arbitrarily large key set. More security aspect of ASM-PH is
analyzed Distance Folding Both this and the next optimizations aim to reduce unnecessary
distance computations during the distance access fora single node. The key observation for the
distance foldingoptimization is from Eqn. 3, where the local distance is addedup from the
encrypted minimum square distance in eachdimension. Since distance is always positive, a
partial localdistance from a subset of all dimensions becomes a naturallower bound of the actual
local distance. This lower bound isparticularly useful because an R-tree node usually has tens
orhundreds of entries and some entries could be faraway fromthe query point, the complete local
distances of these entriesare not necessary and can be replaced by a lower bound whichserves the
same query processing purpose.We call this process“distance folding”. It is noteworthy that a
folded distance canalways be unfolded into a larger lower bound or even into theactual local
distance if necessary later on.The main challenge lies in when to stop adding up forthe partial
location distance — an immature stop leads toan aggressive lower bound that will probably be
unfoldedlater on. For distance range queries, the adding up can bestwhen the lower bound
reaches the distance threshold.ForkNN queries, however, the decision to fold a distance canrelate
13
to the processing status. Specifically, we keep only thetopmost L items in the priority queue as
“unfolded” whilethe distances of all rest items are folded as they are. Beforea folded distance is
to be inserted into the queue, it must beunfolded by at least one dimension or until it is no
longeramong the topmost L items. This strategy is called is “L-unfolded”, where L dictates how
aggressive the strategy is.Obviously queries; moreover, for the bestadaption to a specific dataset,
L can be runtime-adjustable byits performance as follows. When a query is complete, thesaving
can be calculated by counting all entries in the queuewhose distances are still folded, whereas the
overhead canbe calculated by counting the number of unfolding operationsduring processing. L
should be increased when the overheaddominates the saving, and vice versa.C. Entry
FoldingWhile distance can be folded by ignoring some dimensions,the same rationale can be
applied to the entries in an indexnode. The key observation is that, a node i usually has a
largenumber (typically over 100) of entries to fit into one diskpage, and it is unnecessary to
compute the local distance from to each entry. As such, some remote entries can be “folded”and
represented by a super entry. As there are fewer entriesin i, the computational cost of distance
access for i can besignificantly reduced. Fig. 5 illustrates the notion of super entry and entry
folding.The node i contains 10 objects or child entries which form twosuper entries a and b.
When i is accessed, it will be treatedas if there are only entries a and b. Later when entry a is
accessed (as it is closer to q than b), the same node i will beused and a will be unfolded into
entries 1-5. b, on the other hand, remains folded and entries 6-10 can be waived from the
distance access.The entry folding process is conducted offline at the data owner after the index is
built. For each node, the set of entriesare recursively partitioned into two subsets by
dimensionalaxes like the kd-tree index until each subset contains onlyone entry. Upon service
initialization, besides the shadow index, an auxiliary table is sent to the client that stores thesuper
entry information of all nodes (see Fig. 5). Each table record corresponds to a super entry and
has three fields: the node affiliation, minimum bounding box (encrypted by E), and the
associated entries. The table size depends on how we regulate the size of a super entry. Based on
its query demands, the client should initialize a desirable threshold W for the minimum number
of entries in each super entry. For example, W should be set higher than the largest k for k NN
queries. In Fig. 5, W = 5. Note that entry folding is not equivalent to reducing the fan out of the
index, as the latter is query independent. The tradeoff of entry folding lies in the wasted distance
computation of super entries that are unfolded later and the saved distance computation of folded
14
entries .In this paper, we study the problem of processing private queries on indexed data for
mutual privacy protection in a cloud environment .We present a secure index traversal
framework, based on which secure protocols are devised for classic types of queries. Through
theoretical proofs and performance evaluation, this approach is shown to be not only feasible, but
also efficient and robust under various parameter settings .We believe this work steps towards
practical applications of privacy homomorphism to secure query processing on large-scale,
structured datasets. As for future work, we plan to extend this work to other query types,
including top-k queries, skyline queries and multi-way joins. We also plan to investigate mutual
privacy protection for queries on senior unstructured datasets. In the current information era,
efficient and effective search capabilities for digital collections have become essential for
information management and knowledge discovery. Meanwhile, a growing number of
collections are professionally maintained in data centers and stored in encrypted form to limit
their access to only authorized users in order to protect\ confidentiality and privacy. Examples
include medical records, corporate proprietary communications, and sensitive government
documents. An emerging critical issue that Permission to make digital or hard copies of all or
part of this work for personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
This section presents several representative scenarios where the secure search over a document
collection may take place. As shown in Fig. 1, the content owner, Olivia, uses the services of a
data center to store a large number of documents, as well as perform search and retrieval. Olivia
may also grant another user Alice the permission to search and retrieve her documents through
the data center. In this case, we refer to Olivia as the supervisor. In addition, to prevent leakage
of information against potential hackers’ break-in, the documents stored at the data center are
encrypted. The supervisor manages the content decryption keys and ma provide decryption
services upon Alice’s request. In the following, we examine a few application scenarios under
this framework. • Case 1: The content owner, Olivia, wants to search for some documents stored
at the data center. She has a limited bandwidth connection with the data center, and needs to
search through the encrypted content without downloading. The pre-processing is executed once
by Olivia, when she stores the documents, all in encrypted form, in the data center. The major
15
task of the pre-processing stage is to build a secure term frequency table and a secure inverse
document frequency table, so as to facilitate efficient and
Accurateinformation retrieval. For an unprotected term frequency table, both the search term and
its term frequency information are in plaintext. To
Protect the confidentiality of the search, we encrypt each of them in an appropriate way. As
shown in Fig. 2, a word w in a document first undergoes stemming to retain the word stem and to
remove the word ending. The stemmed word
S is then encrypted using an encryption function E and the word-key KwS - to obtain the
encrypted word w(e) Here the word-key is unique to each stemmed word and is obtained with a
key derivation function. w(e) S is further mapped to a particular row i in the term frequency
table, where the index i is established via a hashing function such that i = H(w(e)S ). The term
frequency information is collected by counting the number of occurrences of the stemmed word
in the jth document, and stored in the table entry This process is repeated to obtain the term
frequencies for all terms and documents, and the TF values are then further encrypted. In the
baseline model, the data center is only trusted with storing data. There is a single layer of
encryption To protect the term frequency information from both unauthorized users and from the
data center. We first encode each row of the term frequency table to minimize the required
storage. The encoded term frequency table denoted by is then encrypted to create, as TFwhere a
key i is used to encrypt the I th row of the term frequency table To increase security, the value of
I is unique for each row and is derived from the word-key KwS corresponding to the ith row.
Thus, compromising the key corresponding to one row does not compromising other rows of the
term frequency table. Since computing the relevance score requires the use of collection
frequency weight of a word as in the can be computed before-hand and encrypted using the same
word key as in the term frequency table.
Cipher text compression.
Even though the optimizations from above keep evaluated cipher texts at the same length as
originalciphertexts, the size of these cipher texts is still very large – ˜θ(λ5) bits under our
suggested parameters.We next show how to “compress”, or post-process the cipher texts, down
to (asymptotically)the size of an RSA modulus, reducing the communication complexity of our
16
scheme dramatically.The price of this optimization, however, is that we cannot evaluate anything
on these compressedciphertexts. Hence we can only use this compression technique on the final
output cipher texts, after all applications of the Evaluate algorithm have been completed. (This
technique also introduces another hardness assumption, similar to the φ-hiding assumption of
Caching et al. [3].)Roughly, we supplement the public key with the description of a group G and
an element g ∈Whose order is a multiple of the secret key p. Then, given the cipher text c from
our scheme, the compressed cipher text is simply.Note that so decrypting is done
The baseline model introduced in the previous section addressesthe scenarios where the content
owner makes a queryhimself/herself. In this section, we present an alternatescheme to enable a
search capability from a user other thanthe content owner. This scheme reduces the involvement
ofOlivia by shifting the task of computing the relevance scoreto the data center, while still
maintaining the confidentialityof the term-frequency information and the document content.To
remove the need for communications between thedata center and content owner during content
search, wemust be able to perform computations and ranking directlyon term-frequency data in
its encrypted form. We refer tothis searchable layer of encryption as the inner-layer
encryption,which is denoted by TF(s). Inner-layer encryptioncan be done via cryptographic tools
such as homomorphicencryption (HME) and order preserving encryption (OPE);the computation
of relevance score should be adapted accordinglyto support encrypted domain computation.
Weuse OPE in this paper to demonstrate the concept for secureranking of relevance. After the
inner-layer encryption,TF(s) is encoded to obtain, and further encrypted toC in the same way as
in the baseline scheme. Werefer to this second round of encryption as outer-layer
encryption,which prevents unauthorized users from accessingTF information.The indexing and
pre-processing stages of the proposedschemes are similar to the baseline model with an
additionalinner-layer encryption. When searching for a particularquery consisting of multiple
terms, in the collection, Alice first performs stemming andsends the stemmed words to the
content owner, Olivia, whochecks whether Alice has the required permission to searchfor the
query words. Upon verification, Olivia derives theword-keys from the master key and uses it to
encrypt thestemmed-words to obtain Thehash value of is then calculated and transmittedto Alice
who forwards it to the data center. Using the receivedhash values the data center searchesthe
protected term frequency table C and identifies therows corresponding to the query words,
without obtainingplaintext information about the query
17
In this section, we compare the performance of the baselinemodel and OPE in terms of security,
retrieval accuracy,and examine the tradeoffs involved in securing the term frequencyusing order
preserving encryption. We evaluate theretrieval accuracies of the secure search schemes on
theW3Ccollection, with the 59 queries used for the discussion searchin the enterprise track in the
2005 Text Retrieval Conference. Any document that is judged partiallyrelevant or relevant is
taken to be relevant in our test (i.e.conflating the top two judgementlevels. We study the In this
work, we develop a framework for confidentialitypreservingrank-ordered search in large scale
document collections.We explore techniques to securely rank-order thedocuments and extract the
most relevant document(s) froman encrypted collection based on the encrypted search
queries.We present several representative scenarios depending on thesecurity requirement; and
develop techniques to perform efficientsearch and retrieval in each case. The proposed
methodmaintains the confidentiality of the query as well as the contentof retrieved
documents.The techniques introduced in this work are first attemptsto bring together advanced
information retrieval capabilitiesand secure search capabilities. In addition to our focus
onsecuring indices, other important security issues include protectingcommunication links and
combating traffic analysis.These will need to be addressed in future work. Furtherinvestigations
of complete cryptographic modeling, efficientalgorithm design, and system evaluations can shine
light on
an improved balance between the security, efficiency, andaccuracy of search, leading to a wide
range of applications,such as searching information with hierarchical access control,and flexible
“e-discovery” practices for digital recordsin legal proceedings.
18
3. SYSTEM ANALYSIS
The Systems Development Life Cycle (SDLC), or Software Development Life Cycle in
systems engineering, information systems and software engineering, is the process of creating or
altering systems, and the models and methodologies that people use to develop these systems.
In software engineering the SDLC concept underpins many kinds of software development
methodologies. These methodologies form the framework for planning and controlling the
creation of an information system the software development process.
SOFTWARE MODEL OR ARCHITECTURE ANALYSIS:
Structured project management techniques (such as an SDLC) enhance

management’s control over projects by dividing complex tasks into manageable sections. A
software life cycle model is either a descriptive or prescriptive characterization of how software
is or should be developed. But none of the SDLC models discuss the key issues like Change
management, Incident management and Release management processes within the SDLC
process, but, it is addressed in the overall project management. In the proposed hypothetical
model, the concept of user-developer interaction in the conventional SDLC model has been
converted into a three dimensional model which comprises of the user, owner and the developer.
In the proposed hypothetical model, the concept of user-developer interaction in the conventional
SDLC model has been converted into a three dimensional model which comprises of the user,
owner and the developer. The ―one size fits all‖ approach to applying SDLC methodologies is
no longer appropriate. We have made an attempt to address the above mentioned defects by
using a new hypothetical model for SDLC described elsewhere. The drawback of addressing
these management processes under the overall project management is missing of key technical
issues pertaining to software development process that is, these issues are talked in the project
management at the surface level but not at the ground level.
19
WHAT IS SDLC?
A software cycle deals with various parts and phases from planning to testing
and deploying software. All these activities are carried out in different ways, as per the needs.
Each way is known as a Software Development Lifecycle Model (SDLC).A software life cycle
model is either a descriptive or prescriptive characterization of how software is or should be
developed. A descriptive model describes the history of how a particular software system was
developed. Descriptive models may be used as the basis for understanding and improving
software development processes or for building empirically grounded prescriptive models.
SDLC models * The Linear model (Waterfall) - Separate and distinct phases of specification
and development. - All activities in linear fashion. - Next phase starts only when first one is
complete. * Evolutionary development - Specification and development are interleaved (Spiral,
incremental, prototype based, Rapid Application development). - Incremental Model (Waterfall
in iteration), - RAD(Rapid Application Development) - Focus is on developing quality product in
less time, - Spiral Model - We start from smaller module and keeps on building it like a spiral. It
is also called Component based development. * Formal systems development - A mathematical
system model is formally transformed to an implementation. * Agile Methods. - Inducing
flexibility into development. * Reuse-based development - The system is assembled from
existing components.
The General Model
Software life cycle models describe phases of the software cycle and the order in which those
phases are executed. There are tons of models, and many companies adopt their own, but all
have very similar patterns. Each phase produces deliverables required by the next phase in the
life cycle. Requirements are translated into design. Code is produced during implementation that
is driven by the design. Testing verifies the deliverable of the implementation phase against
requirements.
SDLC Methodology:
Spiral Model
The spiral model is similar to the incremental model, with more emphases placed on risk
analysis. The spiral model has four phases: Planning, Risk Analysis, Engineering and
20
Evaluation. A\ software project repeatedly passes through these phases in iterations (called
Spirals in this model). The baseline spiral, starting in the planning phase, requirements is
gathered and risk is assessed. Each subsequent spirals builds on the baseline spiral.
Requirements are gathered during the planning phase. In the risk analysis phase, a process is
undertaken to identify risk and alternate solutions. A prototype is produced at the end of the
risk analysis phase. Software is produced in the engineering phase, along with testing at
the end of the phase. The evaluation phase allows the customer to evaluate the output of the
project to date before the project continues to the next spiral. In the spiral model, the angular
component represents progress, and the radius of the spiral represents cost. Spiral Life Cycle
Model.
This document play a vital role in the development of life cycle (SDLC) as it describes the
complete requirement of the system. It means for use by developers and will be the basic during
testing phase. Any changes made to the requirements in the future will have to go through
formal change approval process.
SPIRAL MODEL was defined by Barry Boehm in his 1988 article, “A spiral Model of
Software Development and Enhancement. This model was not the first model to discuss
iterative development, but it was the first model to explain why the iteration models.
As originally envisioned, the iterations were typically 6 months to 2 years long. Each phase
starts with a design goal and ends with a client reviewing the progress thus far. Analysis and
engineering efforts are applied at each phase of the project, with an eye toward the end goal of
the project.
The steps for Spiral Model can be generalized as follows:
 The new system requirements are defined in as much details as possible. This usually
involves interviewing a number of users representing all the external or internal users
and other aspects of the existing system.
 A preliminary design is created for the new system.
21
 A first prototype of the new system is constructed from the preliminary design. This
is usually a scaled-down system, and represents an approximation of the
characteristics of the final product.
 A second prototype is evolved by a fourfold procedure:
1. Evaluating the first prototype in terms of its strengths, weakness, and risks.
2. Defining the requirements of the second prototype.
3. Planning an designing the second prototype.
4. Constructing and testing the second prototype.
 At the customer option, the entire project can be aborted if the risk is deemed too
great. Risk factors might involved development cost overruns, operating-cost
miscalculation, or any other factor that could, in the customer’s judgment, result in a
less-than-satisfactory final product.
 The existing prototype is evaluated in the same manner as was the previous prototype,
and if necessary, another prototype is developed from it according to the fourfold
procedure outlined above.
 The preceding steps are iterated until the customer is satisfied that the refined
prototype represents the final product desired.
 The final system is constructed, based on the refined prototype.
 The final system is thoroughly evaluated and tested. Routine maintenance is carried
on a continuing basis to prevent large scale failures and to minimize down time.
22
Fig -Spiral Model
Advantages
 High amount of risk analysis

 Good for large and mission-critical projects.
 Software is produced early in the software life cycle.
23
3.1 Existing System:
Now-a-days cloud servers are gets the high storage files. Here select and processing the files gets
the burden problems. Whenever large numbers of files are available in cloud server under
encryption some problems are generated. Total files are not encrypted. That’s here there is no
sufficient privacy and security in outsourcing. Some unauthorized users are entering and corrupt
the content of information.
Previously user are selects the files in interesting manner as a plain text files. This is
failing under access the files. There is no perfect decryption technique under access the files of
representation process. User are suffers with present searching technique.
Drawbacks
1. For each search request, users without pre-knowledge of the encrypted cloud data
have to go through every retrieved file in order to find ones most matching their
interest, which demands possibly large amount of postprocessing over head.
2. Invariably sending back all files solely based on presence/absence of the keyword
further incurs large unnecessary network traffic, which is absolutely undesirable in
today’s pay-as-you-use cloud paradigm.
3. It is ineffective under searching
3.1.2 Problem statement

1. It is ineffective under searching
2. The leakage problem on the cloud data.
3.2 Proposed System:
We introduce the concepts of similarity relevance and scheme robustness toformulate the privacy
issue in searchable encryption schemes, and then solve the insecurityproblem by proposing a
two-round searchable encryption (TRSE) scheme. Novel technologiesin the cryptography
community and information retrieval community are employed, includinghomomorphism
encryption and vector space model. In the proposed scheme, the majority ofcomputing work is
done on the cloud while the user takes part in ranking, which guarantees top-kmulti-keyword
retrieval over encrypted cloud data with high security and practical efficiency
24
Main Contribution:
We propose the concepts of similarity relevance and scheme robustness. We thus performthe
first attempt to formulate the privacy issue in searchable encryption, and we show
serversideranking based on order-preserving encryption (OPE) inevitably violates data privacy.
We propose a two-round searchable encryption (TRSE) scheme, which fulfills the securemulti-
keyword top-k retrieval over encrypted cloud data. Specifically, for the first time weemploy
relevance score to support multi-keyword top-k retrieval.
Thorough analysis on security demonstrates the proposed scheme guarantees high data privacy.
Furthermore, performance analysis and experimental results show that our scheme isefficient for
practical utilization.
3.3.Advantages
1. It can retrieve the results with less communication overhead.

2. It can provide the results with effective retrieval accuracy.
3. It can provide effective privacy and security application
4. To eliminate the leakage
Modules Description:
1. Entities in the cloud computing system:
2. RelevanceScore for multi-keyword
3. Spastically leakage and Vector specification Model
4. κ-similarity Relevance:
25
1 Entities in the cloud computing system:
Create three different entities. Those entities are data owner, user and cloud server. Data
owner starts the collection of files. All encrypted files of information store in cloud server.
User enters the secure searchable keyword. Automatically extract the related files, calculate
the index value. All results are displayed as a authentication files of content
2 RelevanceScore for multi-keyword:
Some of the multi-keyword searchable symmetric encryption schemes support only Boolean
Queries, i.e., a file either match or do not match a query. Considering the large numberof data
users and documents in the cloud, it is necessary to allow multi-keyword in the searchquery and
return documents in the order of their relevancy with the queried keywords.Scoring is a natural
way to weight the relevance.
3 Spastically leakage and Vector specification Model:
The weight of a single keyword on a file, we employ vector space model toscore a file on multi-
keyword. The vector space model is an algebraic model for representinga file as a vector. Each
dimension of the vector corresponds to a separate term, i.e., if a termoccurs in the file, its value
in the vector is non-zero, otherwise is zero. The vector spacemodel supports multi-term and non-
binary presentations Wedenote the possible information leakage with statistic leakage. There are
two possible statisticleakages, including term distribution and inter distribution. The term
distribution of term tis ts frequency distribution of scores on each file.
4.κ-similarity Relevance:
In order to avoid information leakage in server-side ranking schemes, a series of techniques have
been employed to flatten or transfer the distribution of relevance scores. These
approaches,however, only cover the distribution of individual term or file, ignoring the relevance
betweenthem and the violation of data privac
26
3.5 FEASIBILITY STUDY
Preliminary investigation examine project feasibility, the likelihood the system will
be useful to the organization. The main objective of the feasibility study is to test the Technical,
Operational and Economical feasibility for adding new modules and debugging old running
system. All system is feasible if they are unlimited resources and infinite time. There are aspects
in the feasibility study portion of the preliminary investigation:
 Technical Feasibility
 Operational Feasibility
 Economical Feasibility
4.5.1 ECONOMIC FEASIBILITY
A system can be developed technically and that will be used if installed must still be a
good investment for the organization. In the economical feasibility, the development cost in
creating the system is evaluated against the ultimate benefit derived from the new systems.
Financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any addition hardware or
software. Since the interface for this system is developed using the existing resources and
technologies available at NIC, There is nominal expenditure and economical feasibility for
certain.
4.5.2 OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned out into information system.
That will meet the organization’s operating requirements. Operational feasibility aspects of the
project are to be taken as an important part of the project implementation. Some of the important
issues raised are to test the operational feasibility of a project includes the following: -
 Is there sufficient support for the management from the users?

 Will the system be used and work properly if it is being developed and implemented?
 Will there be any resistance from the user that will undermine the possible application
benefits?
27
This system is targeted to be in accordance with the above-mentioned issues. Beforehand,
the management issues and user requirements have been taken into consideration. So there is no
question of resistance from the users that can undermine the possible application benefits.
The well-planned design would ensure the optimal utilization of the computer resources and
would help in the improvement of performance status.
4.5.3 TECHNICAL FEASIBILITY
The technical issue usually raised during the feasibility stage of the investigation includes
the following:
 Does the necessary technology exist to do what is suggested?

 Do the proposed equipments have the technical capacity to hold the data required to use the
new system?
 Will the proposed system provide adequate response to inquiries, regardless of the number or
location of users?
 Can the system be upgraded if developed?
 Are there technical guarantees of accuracy, reliability, ease of access and data security?
Earlier no system existed to cater to the needs of ‘Secure Infrastructure Implementation
System’. The current system developed is technically feasible. It is a web based user interface for
audit workflow at NIC-CSD. Thus it provides an easy access to the users. The database’s
purpose is to create, establish and maintain a workflow among various entities in order to
facilitate all concerned users in their various capacities or roles. Permission to the users would be
granted based on the roles specified. Therefore, it provides the technical guarantee of accuracy,
reliability and security. The software and hard requirements for the development of this project
are not many and are already available in-house at NIC or are available as free as open source.
The work for the project is done with the current equipment and existing software technology.
Necessary bandwidth exists for providing a fast feedback to the users irrespective of the number
of users using the system.
This paper introduces a new framework for confidentialitypreserving rank-ordered search and
retrieval over large documentcollections. The proposed framework not only
protectsdocument/query confidentiality against an outside intruder,but also prevents an untrusted
28
data center from learninginformation about the query and the document collection.We present
practical techniques for proper integration of relevancescoring methods and cryptographic
techniques, suchas order preserving encryption, to protect data collectionsand indices and
provide efficient and accurate search capabilitiesto securely rank-order documents in response to
aquery. Experimental results on the W3C collection showthat these techniques have comparable
performance to conventionalsearch systems designed for non-encrypted data interms of search
accuracy. The proposed methods thus formthe first steps to bring together advanced information
retrievaland secure search capabilities for a wide range of applicationsincluding managing data
in government and businessoperations, enabling scholarly study of sensitive data,and facilitating
the document discovery process in litigation.established cryptographic primitives. The
understandingsobtained from this exploration will pave ways to bring togetherresearchers from
information retrieval [1] and appliedcryptography to establish a bridge between these areas.To
accomplish our goals, we collect term frequency informationfor each document in the collection
to build indices,as in traditional retrieval systems for plaintext. We furthersecure these indices
that would otherwise reveal importantstatistical information about the collection to protectagainst
statistical attacks. During the search process, thequery terms are encrypted to prevent the
exposure of informationto the data center and other intruders, and toconfine the searching entity
to only make queries within anauthorized scope. Utilizing term frequencies and other
documentinformation, we apply cryptographic techniques suchas order-preserving encryption to
develop schemes that cansecurely compute relevance scores for each document, identifythe most
relevant documents, and reserve the right toscreen and release the full content of relevant
documents.The proposed framework has comparable performance toconventional searching
systems designed for non-encrypteddata in terms of search accuracy.The rest of this paper is
organized as follows. Relatedbackground and prior work are reviewed in .
There has been a considerable amount of prior work onalgorithms and data structures to support
information retrievalfor plaintext documents focussing on various issues,including efficient
representation [1] and effective ranking [3].In contrast, protection of sensitive information in the
documentcollection, the indices, and/or the queries has receivedmuch less attention until
recently. Some explorationof search in encrypted data and private information retrievalsystems
has been reported in . These techniques generallyinvolve high computational complexity in
search, orincur a considerable increase in storage to store specially encrypteddocuments.
29
Approaches to reduce search complexitywere introduced in at an expense of limited
searchcapabilities confined by a keyword list identified beforehand.The documents containing
some of the pre-identified keywordsare first found, and the keywords or the keywordindices are
encrypted in a way that facilitates search andretrieval. These existing techniques target simple
Booleansearches to identify the presence or absence of a term in anencrypted text. Much of the
existing work has not been appliedto large collections, and it is not clear whether it canbe easily
extended to more sophisticated relevance-rankedsearches.This section presents several
representative scenarios wherethe secure search over a document collection may take place.
As shown in Fig. 1, the content owner, Olivia, uses the servicesof a data center to store a large
number of documents,as well as perform search and retrieval. Olivia may alsogrant another user
Alice the permission to search and retrieveher documents through the data center. In this case,
we refer to Olivia as the supervisor. In addition, to prevent leakage of information against
potential hackers’ break-in,the documents stored at the data center are encrypted. Thesupervisor
manages the content decryption keys and mayprovide decryption services upon Alice’s request.
In the following,we examine a few application scenarios under thisframework.
30
4 System Requirements Specification
5.1 Introduction
A Software Requirements Specification (SRS) – a requirements specification for
a software system – is a complete description of the 31nalysing of a system to be developed. It
includes a set of use cases that describe all the interactions the users will have with the software.
In addition to use cases, the SRS also contains non-functional requirements. Non-functional
requirements are requirements which impose constraints on the design or implementation (such
as performance engineering requirements, quality standards, or design constraints).
System requirements specification: A structured collection of information that embodies the
requirements of a system. A business analyst, sometimes titled system analyst, is responsible for
31nalysing the business needs of their clients and stakeholders to help identify business problems
and propose solutions. Within the systems development life cycle domain, typically performs a
liaison function between the business side of an enterprise and the information technology
department or external service providers. Projects are subject to three sorts of requirements:
 Business requirements describe in business terms what must be delivered or
accomplished to provide value.
 Product requirements describe properties of a system or product (which could be one of
several ways to accomplish a set of business requirements.)
 Process requirements describe activities performed by the developing organization. For
instance, process requirements could specify specific methodologies that must be
followed, and constraints that the organization must obey.
Product and process requirements are closely linked. Process requirements often specify the
activities that will be performed to satisfy a product requirement. For example, a maximum
development cost requirement (a process requirement) may be imposed to help achieve a
maximum sales price requirement (a product requirement); a requirement that the product be
maintainable (a Product requirement) often is addressed by imposing requirements to follow
particular development styles
31
5.2 PURPOSE
An systems engineering, a requirement can be a description of what a system must do, referred
to as a Functional Requirement. This type of requirement specifies something that the delivered
system must be able to do. Another type of requirement specifies something about the system
itself, and how well it performs its functions. Such requirements are often called Non-functional
requirements, or ‘performance requirements’ or ‘quality of service requirements.’ Examples of
such requirements include usability, availability, reliability, supportability, testability and
maintainability.
A collection of requirements define the characteristics or features of the desired system. A

‘good’ list of requirements as far as possible avoids saying how the system should implement the
requirements, leaving such decisions to the system designer. Specifying how the system should
be implemented is called “implementation bias” or “solution engineering”. However,
implementation constraints on the solution may validly be expressed by the future owner, for
example for required interfaces to external systems; for interoperability with other systems; and
for commonality (e.g. of user interfaces) with other owned products.
In software engineering, the same meanings of requirements apply, except that the focus of
interest is the software itself.
5.3 Functional Requirements:
In Functional requirements owner has its own functionalities likecloud server and user.Actually
owner sends encrypted files to cloud server and monitor the information in the side of cloud
server by the data owner. User retrieved datafrom cloud server with security guarantee.
32
*Data Owner Functional Requirements:
1.Dataowner First Uploading Files into cloud server
2. View the indexwords in the dataowner it means index of the files
3. view all the Keyword based index or identifiy the weight.
4. we have to provide the security to system by use of changepassword
5. View all users in the project
6. close the module or logout module
*CloudServerfunctional requirements:
1. what are all files exist in the cloudserver environment
2. what are the all containg files outsourced files .
3.VectorSpacemodel(TermFrequency of the files)..
4 .Stastical Leakage means(intermtermfrequency) and identification of attackers.
5. k-Similarity Relavance means identify the sequences and calculate the largest common
subsequences and draw the graph.
6.Change password
7. Logout
User functional requirements:
1. Login to user module
2. user will perform the searching mechanisism
3. he can verify the information about the individual information
33
4. Changepassword is mandatory to provide security
5. close the session or logout htre module.
5.4 Non Functional Requirements

The major non-functional Requirements of the system are as followsUsability
The system is designed with completely automated process hence there is no or less user
intervention.
Reliability
The system is more reliable because of the qualities that are inherited from the chosen platform
java. The code built by using java is more reliable.
Performance
This system is developing in the high level languages and using the advanced front-end and
back-end technologies it will give response to the end user on client system with in very less
time.
Supportability
The system is designed to be the cross platform supportable. The system is supported on a wide
range of hardware and any software platform, which is having JVM, built into the system.
5.5 Software Requirements:

Language : JDK (1.7.0)
Frontend : JSP and Servlet
Backend : Oracle10g
IDE : my eclipse 8.6
Operating System : windows XP
5.6 Hardware Requirements
Processor : Pentium IV
Hard Disk : 80GB
RAM : 2GB
34
5. System Design
5.1 Introduction
The purpose of the design phase is to plan a solution of the
problem specified by the requirement document. This phase is the first step in moving from the
problem domain to the solution domain. In other words, starting with what is needed, design
takes us toward how to satisfy the needs. The design of a system is perhaps the most critical
factor affection the quality of the software; it has a major impact on the later phase, particularly
testing, maintenance. The output of this phase is the design document. This document is similar
to a blueprint for the solution and is used later during implementation, testing and maintenance.
The design activity is often divided into two separate phases System Design and Detailed
Design.
System Design also called top-level design aims to identify the modules that should be in the
system, the specifications of these modules, and how they interact with each other to produce the
desired results. At the end of the system design all the major data structures, file formats, output
formats, and the major modules in the system and their specifications are decided.
During, Detailed Design, the internal logic of each of the modules specified in system
design is decided. During this phase, the details of the data of a module is usually specified in a
high-level design description language, which is independent of the target language in which the
software will eventually be implemented.
In system design the focus is on identifying the modules, where as during detailed design
the focus is on designing the logic for each of the modules. In other works, in system design the
attention is on what components are needed, while in detailed design how the components can be
implemented in software is the issue.
Design is concerned with identifying software components specifying relationships
among components. Specifying software structure and providing blue print for the document
phase. Modularity is one of the desirable properties of large systems. It implies that the system is
divided into several parts. In such a manner , the interaction between parts is minimal clearly
specified.
35
During the system design activities , Developers bridge the gap between the requirements
specification , produced during requirements elicitation and analysis , and the system that is
delivered to the user.
Design is the place where the quality is fostered in development . Software design is a
process through which requirements are translated into a representation of software.
5.2 System Model

Introduction to UML
The unified ModelingLanguage (UML) is a standard language for writing software blueprints.
The UML may be used to visualize,specify , construct and document the artifacts of software-
intensive system.
The goal of UML is to provide a standard notation that can be used by all object - oriented
methods and to select and integrate the best elements .UML is itself does not prescribe or advice
on how to use that notation in a software development process or as part of an object - design
methodology. The UML is more than just bunch of graphical symbols. Rather , behind each
symbol in the UML notation is well-defined semantics.
The system development focuses on three different models of the system.
 Functional model
 Object model
 Dynamic model
Functional model in UML is represented with use case diagrams , describing the functionality
of the system from user point of view.
Object model in UML is represented with class diagrams , describing the structure of the system
in terms of objects , attributes , associations and operations.
Dynamic model in UML is represented with sequence diagrams , start chart diagrams and
activity diagrams describing the internal behaviour of the system.
36
5.3 Scenarios
A Use Case is an abstraction that all describes all possible scenarios involving the described
functionality . A scenario is an instance of a use case describing a concrete set of actions.
 The name of the scenario enables us to refer it ambiguously. The name of
scenario is underlined to indicate it is an instance.
 The Participating actor instance field indicates which actor instance are
involved in this scenario. Actor instance also have underlined names.
 The Flow of Events of scenario describe the sequence of events step by step.
5.3.1 Use Case Model

Use case diagrams represent the functionality of the system from a user point of view. A Use
case describes a function provided by the system that yields a visible result for an actor. an actor
describe any entity that interacts with the system. The identification of actors and use cases
results in the definition of the boundary of the system, which is , in differentiating the tasks
accomplished by the system and the tasks accomplished by its environment. The actors outside
the boundary of the system, where as the use cases are inside the boundary of the system
A Use case contains all the events that can occur between an actor and a set of scenarios that
explains the interactions as sequence of happenings.
Actors
Actors represent external entities that interact with the system. An actor can be human or
external system.
Actor are not part of the system. They represent anyone or anything that interact with the system.
An Actor may
 Only input information to the system.
 Only receive information from the system.
 Input and receive information from to and from the system.
During this activity , developers indentify the actors involved in this system are:
User:
37
User is an actor who uses the system and who performs the operations like data classifications
and execution performance that are required for him.
Use Cases:
What is a UML Use Case Diagram?
Use case diagrams model the functionality of a system using actors and use cases. Use cases are
services or functions provided by the system to its users.
Basic Use Case Diagram Symbols and Notations
System
Draw your system's boundaries using a rectangle that contains use cases. Place actors outside the
system's boundaries.
Use Case
Draw use cases using ovals. Label with ovals with verbs that represent the system's functions.
38
Actors
Actors are the users of a system. When one system is the actor of another system, label the actor
system with the actor stereotype.
Relationships
Illustrate relationships between an actor and a use case with a simple line. For relationships
among use cases, use arrows labeled either "uses" or "extends." A "uses" relationship indicates
that one use case is needed by another in order to perform a task. An "extends" relationship
indicates alternative options under a certain use case.
39
Dataowner:
login
UploadFile
Indexwords
Dataowner
KeywordIndex
Security ChangePassword
RegisterUsers
Logout
40
Cloudserver:
login
OutsourceofFiles
CloudFilesInformation
IndexofFiles(i)
VectorSpaceModel
TermFrequency
termdistribution
S-Leakage or SearchPattern
CloudServer Identificationofattackers
Sequences
K-Similarity Relavance
LargestCommonSequences
OpenGraph
Security
ChangePassword
Logout
41
User:
login
Search
User Profile
ChangePassword
logout
42
5.3.2 Object model
Class Diagram
What is a UML Class Diagram?
Class diagrams are the backbone of almost every object-oriented method including UML. They
describe the static structure of a system.
Basic Class Diagram Symbols and Notations
Classes represent an abstraction of entities with common characteristics. Associations represent

the relationships between classes.
Illustrate classes with rectangles divided into compartments. Place the name of the class in the
first partition (centered, bolded, and capitalized), list the attributes in the second partition, and
write operations into the third.
Active Class
Active classes initiate and control the flow of activity, while passive classes store data and serve
other classes. Illustrate active classes with a thicker border.
Visibility
43
Use visibility markers to signify who can access the information contained within a class. Private
visibility hides information from anything outside the class partition. Public visibility allows all
other classes to view the marked information. Protected visibility allows child classes to access
information they inherited from a parent class. .
Associations
Associations represent static relationships between classes. Place association names above, on, or
below the association line. Use a filled arrow to indicate the direction of the relationship. Place
roles near the end of an association. Roles represent the way the two classes see each other.
Note: It's uncommon to name both the association and the class roles.
Multiplicity (Cardinality)
Place multiplicity notations near the ends of an association. These symbols indicate the number
of instances of one class linked to one instance of the other class. For example, one company will
have one or more employees, but each employee works for one company only.
44
Constraint
Place constraints inside curly braces {}.
Simple Constraint
Composition and Aggregation
Composition is a special type of aggregation that denotes a strong ownership between Class A,
the whole, and Class B, its part. Illustrate composition with a filled diamond. Use a hollow
diamond to represent a simple aggregation relationship, in which the "whole" class plays a more
important role than the "part" class, but the two classes are not dependent on each other. The
diamond end in both a composition and aggregation relationship points toward the "whole" class
or the aggregate
45
Generalization
Generalization is another name for inheritance or an "is a" relationship. It refers to a relationship
between two classes where one class is a specialized version of another. For example, Honda is a
type of car. So the class Honda would have a generalization relationship with the class car.
In real life coding examples, the difference between inheritance and aggregation can be
confusing. If you have an aggregation relationship, the aggregate (the whole) can access only the
PUBLIC functions of the part class. On the other hand, inheritance allows the inheriting class to
access both the PUBLIC and PROTECTED functions of the superclass.
46
LoginAction
UserRegistration UploadFile
EnterLoginid
EnterPersonalInfo File : file
Enter Password
Registration is successful() File uploaded successfully()
LoginisSuccessful()
CountAction ViewCloudInformation
StasticsAction
int : countid File : file
int : SingleKeywordcount
String : filename int : fileid
Weight of the Single Keyword()
Weight of the files() ViewInformation()
SearchAction
ViewResults
String : keyword
File : files
Enter the Keyword
View Searched Files()
Keyword Entered Successfully()
5.3.3 Dynamic model:
Sequence Diagram
Sequence diagrams describe interactions among classes in terms of an exchange of messages

over time.
Basic Sequence Diagram Symbols and Notations
Class roles
Class roles describe the way an object will behave in context. Use the UML object symbol to
illustrate class roles, but don't list object attributes.
47
Activation
Activation boxes represent the time an object needs to complete a task.
Messages
Messages are arrows that represent communication between objects. Use half-arrowed lines to
represent asynchronous messages. Asynchronous messages are sent from an object that will not
wait for a response from the receiver before continuing its tasks.
48
Various message types for Sequence and Collaboration diagrams
Lifelines
Lifelines are vertical dashed lines that indicate the object's presence over time.
Destroying Objects
Objects can be terminated early using an arrow labeled "<< destroy >>" that points to an X.
49
Loops
A repetition or loop within a sequence diagram is depicted as a rectangle. Place the condition for
exiting the loop at the bottom left corner in square brackets [ ].
50
login Search Profile changepasswor logout
d
Search
Loginfail
ViewProfile
Changepassword
Logout
51
State chart Diagram
A statechart diagram shows the behavior of classes in response to external stimuli. This diagram
models the dynamic flow of control from state to state within a system.
Basic Statechart Diagram Symbols and Notations
States
States represent situations during the life of an object. You can easily illustrate a state in
SmartDraw by using a rectangle with rounded corners.
Transition
A solid arrow represents the path between different states of an object. Label the transition with
the event that triggered it and the action that results from it.
Initial State
A filled circle followed by an arrow represents the object's initial state.
52
Final State
An arrow pointing to a filled circle nested inside another circle represents the object's final state.
Synchronization and Splitting of Control
A short heavy bar with two transitions entering it represents a synchronization of control. A short
heavy bar with two transitions leaving it represents a splitting of control that creates multiple
states.
53
user
registration
login
Searchingthecloudfiles
Profile
Changepassword
logout
54
Activity Diagram
An activity diagram illustrates the dynamic nature of a system by modeling the flow of control
from activity to activity. An activity represents an operation on some class in the system that
results in a change in the state of the system. Typically, activity diagrams are used to model
workflow or business processes and internal operation. Because an activity diagram is a special
kind of statechart diagram, it uses some of the same modeling conventions.
Basic Activity Diagram Symbols and Notations
Action states
Action states represent the noninterruptible actions of objects. You can draw an action state in
SmartDraw using a rectangle with rounded corners.
Action Flow
Action flow arrows illustrate the relationships among action states.
55
Object Flow
Object flow refers to the creation and modification of objects by activities. An object flow arrow
from an action to an object means that the action creates or influences the object. An object flow
arrow from an object to an action indicates that the action state uses the object.
Initial State
A filled circle followed by an arrow represents the initial action state.
Final State
An arrow pointing to a filled circle nested inside another circle represents the final action state.
Branching
A diamond represents a decision with alternate paths. The outgoing alternates should be labeled
with a condition or guard expression. You can also label one of the paths "else."
56
Synchronization
A synchronization bar helps illustrate parallel transitions. Synchronization is also called forking
and joining.
Swimlanes
Swimlanes group related activities into one column.
57
User
login
loginfail
loginsuccess
UserHome
Registrasion Profile
Searching Security
5.4 Data Flow Diagrams:

Data Flow Diagrams:
A graphical tool used to describe and analyze the moment of data through a system manual or
automated including the process, stores of data, and delays in the system. Data Flow Diagrams
58
are the central tool and the basis from which other components are developed. The
transformation of data from input to output, through processes, may be described logically and
independently of the physical components associated with the system. The DFD is also know as
a data flow graph or a bubble chart.
DFDs are the model of the proposed system. They clearly should show the requirements on
which the new system should be built. Later during design activity this is taken as the basis for
drawing the system’s structure charts. The Basic Notation used to create a DFD’s are as follows:
1. Dataflow: Data move in a specific direction from an origin to a destination.
2. Process: People, procedures, or devices that use or produce (Transform) Data. The physical
component is not identified.
3. Source: External sources or destination of data, which may be People, programs,
organizations or other entities.
4. Data Store: Here data are stored or referenced by a process in the System.
59
CONTEXT LEVEL DIAGRAM
AUTHENTICATION DFD:
60
DataOwner DFD:
ed
61
CloudServer:
62
63
1.5 Data Dictionaries and ER Diagram:
ER Diagram:
In software engineering, an entity-relationship model (ERM) is an abstract and

conceptual representation of data. Entity-relationship modeling is a database modeling method,
used to produce a type of conceptual schema or semantic data model of a system, often a
relational database, and its requirements in a top-down fashion. Diagrams created by this process
are called entity-relationship diagrams, ER diagrams, or ERDs. The definitive reference for
entity-relationship modeling is Peter Chen's 1976 paper. However, variants of the idea existed
previously, and have been devised subsequently. An entity may be defined as a thing which is
recognized as being capable of an independent existence and which can be uniquely identified.
An entity is an abstraction from the complexities of some domain. When we speak of an entity
we normally speak of some aspect of the real world which can be distinguished from other
aspects of the real world. An entity may be a physical object such as a house or a car, an event
such as a house sale or a car service, or a concept such as a customer transaction or order.
Although the term entity is the one most commonly used, following Chen we should really
distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly
speaking, is an instance of a given entity-type. There are usually many instances of an entity-
type. Because the term entity-type is somewhat cumbersome, most people tend to use the term
entity as a synonym for this term. Entities can be thought of as nouns. Examples: a computer, an
employee, a song, a mathematical theorem. A relationship captures how two or more entities are
related to one another. Relationships can be thought of as verbs, linking two or more nouns.
Examples: an owns relationship between a company and a computer, a supervises relationship
between an employee and a department, a performs relationship between an artist and a song, a
proved relationship between a mathematician and a theorem. The model's linguistic aspect
described above is utilized in the declarative database query languageERROL, which mimics
natural language constructs. Entities and relationships can both have attributes. Examples: an
employee entity might have a Social Security Number (SSN) attribute; the proved relationship
may have a date attribute. Every entity (unless it is a weak entity) must have a minimal set of
uniquely identifying attributes, which is called the entity's primary key. Entity-relationship
64
diagrams don't show single entities or single instances of relations. Rather, they show entity sets
and relationship sets. Example: a particular song is an entity. The collection of all songs in a
database is an entity set. The eaten relationship between a child and her lunch is a single
relationship. The set of all such child-lunch relationships in a database is a relationship set. In
other words, a relationship set corresponds to a relation in mathematics, while a relationship
corresponds to a member of the relation. Certain cardinality constraints on relationship sets may
be indicated as well.
ER Diagram:
Data Dictionary:
Addresses:
65
User details:
Count:
Data1
DownloadUserdetails:
66
File Data:
Logindetails:
Mainindex:
Phones:
SearchData:
67
StasaticsLeakage:
68
6. Implementation
6.1 Introduction
Implementation is the stage where the theoretical design is turned in to working system.
The most crucial stage is achieving a new successful system and in giving confidence on the new
system for the users that it will work efficiently and effectively.
The system can be implemented only after through testing is done and if it found to work
according to the specification. It involves careful planning, investigation of the current system
and its constraints on implementation, design of methods to achieve the change over and an
evaluation of change over methods a part from planning. Two major tasks of preparing the
implementation are education and training of the users and testing of the system.
The more complex the system being implemented, the more involved will be the systems
analysis and design effort required just for implementation. The implementation phase comprises
of several activities. The required hardware and software acquisition is carried out. The System
may require some hardware and software acquisition is carried out. The system may require
some software to be developed. For this, programs are written and tested. The user then changes
over to his new fully tested system and the old system is discontinued.
Implementation is the process of having systems personnel check out and put new
equipment in to use, train users, install the new application, and construct any files of data
needed to it.
Depending on the size of the organization that will be involved in using the application
and the risk associated with its use, system developers may choose to test the operation in only
one area of the firm, say in one department or with only one or two persons. Sometimes they
will run the old and new systems together to compare the results. In still other situations,
developers will stop using the old system one-day and begin using the new one the next. As we
will see, each implementation strategy has its merits, depending on the business situation in
which it is considered. Regardless of the implementation strategy used, developers strive to
ensure that the system’s initial use in trouble-free.
69
Once installed, applications are often used for many years. However, both the
organization and the users will change, and the environment will be different over the weeks and
months. Therefore, the application will undoubtedly have to be maintained. Modifications and
changes will be made to the software, files, or procedures to meet the emerging requirements.
7.1 Technology Description
About the Java Technology
The Java platform consists of the Java application programming interfaces (APIs)
and the Java virtual machine (JVM).
The following Java technology lets developers, designers, and business partners develop and
deliver a consistent user experience, with one environment for applications on mobile and
embedded devices. Java meshes the power of a rich stack with the ability to deliver customized
experiences across such devices.
Java APIs are libraries of compiled code that you can use in your programs. They let you add
ready-made and customizable functionality to save you programming time.
Java programs are run (or interpreted) by another program called the Java Virtual Machine.
Rather than running directly on the native operating system, the program is interpreted by the
Java VM for the native operating system. This means that any computer system with the Java
VM installed can run Java programs regardless of the computer system on which the applications
were originally developed.
70
In the Java programming language, all source code is first written in plain text files ending with
the .java extension. Those source files are then compiled into .class files by the javac compiler. A
.class file does not contain code that is native to your processor; it instead contains bytecodes —
the machine language of the Java Virtual Machine (Java VM). The java launcher tool then runs
your application with an instance of the Java Virtual Machine.
Because the Java VM is available on many different operating systems, the same .class files are
capable of running on Microsoft Windows, the Solaris TM Operating System (Solaris OS),
Linux, or Mac OS.
Java technology is both a programming language and a platform.
The Java Programming Language
The Java programming language is a high-level language that can be characterized by all of the
following buzzwords:
 Simple  Architecture neutral
 Object oriented  Portable
 Distributed  High performance
 Multithreaded  Robust
 Dynamic  Secure
Each of the preceding buzzwords is explained in The Java Language Environment , a white
paper written by James Gosling and Henry McGilton.
In the Java programming language, all source code is first written in plain text files ending with
the .java extension. Those source files are then compiled into .class files by the javac compiler. A
71
.class file does not contain code that is native to your processor; it instead contains bytecodes —
the machine language of the Java Virtual Machine 1 (Java VM). The java launcher tool then runs
your application with an instance of the Java Virtual Machine.
An overview of the software development process.
Because the Java VM is available on many different operating systems, the same .class files are
capable of running on Microsoft Windows, the Solaris™ Operating System (Solaris OS), Linux,
or Mac OS. Some virtual machines, such as the Java HotSpot virtual machine, perform
additional steps at runtime to give your application a performance boost. This include various
tasks such as finding performance bottlenecks and recompiling (to native code) frequently used
sections of code
72
Through the Java VM, the same application is capable of running on multiple platforms.
Servlet and JSP technology

Servlet and JSP technology has become the technology of choice for developing online stores,
interactive
A Servlet’s Job
Servlets are Java programs that run on Web or application servers, acting as a middle layer
between requests coming from Web browsers or other HTTP clients and databases or
applications on the HTTP server. Their job is to perform the following tasks,
as illustrated in Figure 1–1.
73
1. Read the explicit data sent by the client.
The end user normally enters this data in an HTML form on a Web page. However, the data
could also come from an applet or a custom HTTP client program. Chapter 4 discusses how
servlets read this data.
2. Read the implicit HTTP request data sent by the browser.
Figure 1–1 shows a single arrow going from the client to the Web server (the layer where
servlets and JSP execute), but there are really two varieties of data: the explicit data that the end
user enters in a form and the behind-the-scenes HTTP information. Both varieties are critical.
The HTTP information includes cookies, information about media types and compression
schemes the browser understands,
3. Generate the results.

This process may require talking to a database, executing an RMI or EJB call, invoking a Web
service, or computing the response directly. Your real data may be in a relational database. Fine.
But your database probably doesn’t speak HTTP or return results in HTML, so the Web browser
can’t talk directly to the database. Even if it could, for security reasons, you probably would not
want it to. The same argument applies to most other applications. You need the Web middle
layer to extract the incoming data from the HTTP stream, talk to the application, and embed the
results inside a document.
4. Send the explicit data (i.e., the document) to the client.
This document can be sent in a variety of formats, including text (HTML or XML), binary (GIF
images), or even a compressed format like gzip that is layered on top of some other underlying
format. But, HTML is by far the most common format, so an important servlet/JSPtask is to
wrap the results inside of HTML.
5. Send the implicit HTTP response data.
Figure 1–1 shows a single arrow going from the Web middle layer (the servlet or JSP page) to
the client. But, there are really two varieties of data sent: the document itself and the behind-the-
scenes HTTP information. Again, both varieties are critical to effective development. Sending
HTTP response data involves telling the browser or other client what type of document is being
returned (e.g., HTML), setting cookies and caching parameters
74
The Advantages of Servlets Over “Traditional” CGI
Java servlets are more efficient, easier to use, more powerful, more portable, safer, and cheaper
than traditional CGI and many alternative CGI-like technologies. With traditional CGI, a new
process is started for each HTTP request. If the CGIprogram itself is relatively short, the
overhead of starting the process can dominatethe execution time. With servlets, the Java virtual
machine stays running and handleseach request with a lightweight Java thread, not a
heavyweight operating system process.Similarly, in traditional CGI, if there are N requests to the
same CGI program,the code for the CGI program is loaded into memory N times. With servlets,
however,there would be N threads, but only a single copy of the servlet class would be
loaded. This approach reduces server memory requirements and saves time by instantiating
fewer objects. Finally, when a CGI program finishes handling a request, the program terminates.
This approach makes it difficult to cache computations, keep database connections open, and
perform other optimizations that rely on persistent data. Servlets, however, remain in memory
even after they complete a response, so it is straightforward to store arbitrarily complex data
between client requests.
Convenient
Servlets have an extensive infrastructure for automatically parsing and decoding HTML
form data, reading and setting HTTP headers, handling cookies, tracking sessions, and many
other such high-level utilities. In CGI, you have to do much of this yourself. Besides, if you
already know the Java programming language, why learn Perl too? You’re already convinced
that Java technology makes for more reliable and reusable code than does Visual Basic,
VBScript, or C++. Why go back to those languages for server-side programming?
Powerful
Servlets support several capabilities that are difficult or impossible to accomplish with
regular CGI. Servlets can talk directly to the Web server, whereas regular CGI programs cannot,
at least not without using a server-specific API. Communicating with the Web server makes it
easier to translate relative URLs into concrete path names, for instance. Multiple servlets can
also share data, making it easy to implement database connection pooling and similar resource-
sharing optimizations. Servlets can also maintain information from request to request,
simplifying techniques like session tracking and caching of previous computations.
75
Portable
Servlets are written in the Java programming language and follow a standard API. Servlets are
supported directly or by a plug-in on virtually every major Web server. Consequently, servlets
written for, say, Macromedia Run can run virtually unchanged on Apache Tomcat, Microsoft
Internet Information Server (with a separate plug-in), IBM Web Sphere, planet Enterprise
Server, Oracle9i AS, or Star Nine Webster. They are part of the Java 2 Platform, Enterprise
Edition (J2EE; see http://java.sun.com/j2ee/), so industry support for servlets is becoming even
more pervasive.
Inexpensive
A number of free or very inexpensive Web servers are good for development use or deployment
of low- or medium-volume Web sites. Thus, with servlets and JSP you can start with a free or
inexpensive server and migrate to more expensive servers with high-performance capabilities or
advanced administration utilities only after your project meets initial success. This is in contrast
to many of the other CGI alternatives, which require a significant initial investment for the
purchase of a proprietary package. Price and portability are somewhat connected. For example,
Marty tries to keep track of the countries of readers that send him questions by email. India was
near the top of the list, probably #2 behind the U.S. Marty also taught one of his JSP and servlet
training courses (see http://courses.coreservlets.com/) in Manila, and there was great interest in
servlet and JSP technology there. Now, why are India and the Philippines both so interested? We
surmise that the answer is twofold. First, both countries have large pools of well-educated
software developers.
Second both countries have (or had, at that time) highly unfavorable currency exchange
rates against the U.S. dollar. So, buying a special-purpose Web server from a U.S. company
consumed a large part of early project funds. But, with servlets and JSP, they could start with a
free server: Apache Tomcat (either standalone, embedded in the regular Apache Web server, or
embedded in Microsoft IIS). Once the project starts to become successful, they could move to a
server like Caucho Resin that had higher performance and easier administration but that is not
free. But none of their servlets or JSP pages have to be rewritten. If their project becomes even
larger, they might want to move to a distributed (clustered) environment. No problem: they could
move to Macromedia Run Professional, which supports distributed applications (Web farms).
Again, none of their servlets or JSP pages have to be rewritten. If the project becomes quite large
76
and complex, they might want to use Enterprise JavaBeans (EJB) to encapsulate their business
logic. So, they might switch to BEA Web Logic or Oracle9i AS. Again, none of their servlets
or JSP pages have to be rewritten. Finally, if their project becomes even bigger, they might move
it off of their Linux box and onto an IBM mainframe running IBM Web- Sphere. But once again,
none of their servlets or JSP pages have to be rewritten
Secure
One of the main sources of vulnerabilities in traditional CGI stems from the fact that the
programs are often executed by general-purpose operating system shells. So, the CGI
programmer must be careful to filter out characters such as backquotes and semicolons that are
treated specially by the shell. Implementing this precaution is harder than one might think, and
weaknesses stemming from this problem are constantly being uncovered in widely used CGI
libraries. A second source of problems is the fact that some CGI programs are processed by
languages that do not automatically check array or string bounds. For example, in C and C++ it
is perfectly legal to allocate a 100-element array and then write into the 999th “element,” which
is really some random part of program memory. So, programmers who forget to perform this
check open up their system to deliberate or accidental buffer overflow attacks. Servlets suffer
from neither of these problems. Even if a servlet executes a system call (e.g., with Runtime. Exec
or JNI) to invoke a program on the local operating system, it does not use a shell to do so. And,
of course, array bounds checking and other memory protection features are a central part of the
Java programming language.
Mainstream
There are a lot of good technologies out there. But if vendors don’t support them and developers
don’t know how to use them, what good are they? Servlet and JSP technology is supported by
servers from Apache, Oracle, IBM, Sybase, BEA, Macromedia, Caucho, Sun/planet, New
Atlanta, ATG, Fujitsu, Ultras, Silver stream, the World Wide Web Consortium (W3C), and
many others. Several low-cost plugins add support to Microsoft IIS and Zeus as well. They run
on Windows, Unix/Linux, Maces, VMS, and IBM mainframe operating systems. They are the
single most popular application of the Java programming language. They are arguably the most
popular choice for developing medium to large Web applications. They are used by the airline
77
industry (most United Airlines and Delta Airlines Web sites), e-commerce (ofoto.com), online
banking (First USA Bank, Blanco Popular de Puerto Rico), Web search engines/portals
(excite.com), large financial sites (American Century Investments), and hundreds of other sites
that you visit every day. Of course, popularity alone is no proof of good technology. Numerous
counter-examples abound. But our point is that you are not experimenting with a
new and unproven technology when you work with server-side Java.
The Role of JSP

A somewhat oversimplified view of servlets is that they are Java programs with HTML
embedded inside of them. A somewhat oversimplified view of JSP documents is that they are
HTML pages with Java code embedded inside of them. For example, compare the sample servlet
shown earlier (Listing 1.1) with the JSP page shown below (Listing 1.2). They look totally
different; the first looks mostly like a regular Java class, whereas the second looks mostly like a
normal HTML page. The interesting thing is that, despite the huge apparent difference, behind
the scenes they are the same. In fact, a JSP document is just another way of writing a servlet. JSP
pages get translated into servlets, the servlets get compiled, and it is the servlets that run at
request time. So, the question is, If JSP technology and servlet technology are essentially
equivalent in power, does it matter which you use? The answer is, Yes, yes, yes! The issue is not
power, but convenience, ease of use, and maintainability. For example, anything you can do in
the Java programming language you could do in assembly language. Does this mean that it does
not matter which you use? Hardly. JSP is discussed in great detail starting in Chapter 10. But, it
is worthwhile mentioning now how servlets and JSP fit together. JSP is focused on simplifying
the creation and maintenance of the HTML. Servlets are best at invoking the business logic and
performing complicated operations. A quick rule of thumb is that servlets are best for tasks
oriented toward processing, whereas JSP is best for tasks oriented toward presentation. For some
requests, servlets are the right choice. For other requests, JSP is a better option. For still others,
neither servlets alone nor JSP alone is best, and a combination of the two (see Chapter 15,
“Integrating Servlets and JSP: The Model View Controller (MVC) Architecture”) is best. But the
point is that you need both servlets and JSP in your overall project: almost no project will consist
entirely of servlets or entirely of JSP. You want both.
78
7. System Screens
79
80
81
82
LOGIN TO DATA OWNER
83
84
UPLOAD FILE TO THE CLOUD SERVER
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
User:-
118
119
120
121
122
123
124
125
126
127
128
8. System Testing
8.1 Testing Methodologies
Testing is the process of finding differences between the expected behavior
specified by system models and the observed behavior implemented system. From modeling
point of view , testing is the attempt of falsification of the system with respect to the system
models. The goal of testing is to design tests that exercise defects in the system and to reveal
problems.
The process of executing a program with intent of finding errors is called testing. During testing ,
the program to be tested is executed with a set of test cases , and the output of the program for
the test cases is evaluated to determine if the program is performing as expected . Testing forms
the first step in determining the errors in the program. The success of testing in revealing errors
in program depends critically on test cases.
Strategic Approach to Software Testing:

The software engineering process can be viewed as a spiral. Initially system engineering defines
the role of software and leads to software requirements analysis where the information domain ,
functions , behavior , performance , constraints and validation criteria for software are
established. moving inward along the spiral , we come to design and finally to coding . To
develop computer software we spiral in along streamlines that decreases the level of abstraction
on each item.
A Strategy for software testing may also be viewed in the context of the spiral. Unit testing
begins at the vertex of the spiral and concentrates on each unit of the software as implemented in
source code. Testing will progress by moving outward along the spiral to integration testing ,
where the focus on the design and the concentration of the software architecture. Talking another
turn on outward on the spiral we encounter validation testing where requirements established as
part of software requirements analysis are validated against the software that has been
constructed . Finally we arrive at system testing , where the software and other system elements
are tested as a whole .
129
UNIT TESTING
UNUNI
MODULE
Component SUB-SYSTEM
SYSTEM TESTING
Integration Testing
ACCEPTANCE
User Testing
Different Levels of Testing
Client Needs Acceptance Testing

Requirements System Testing
Design Integration Testing
Code Unit Testing
Testing is the process of finding difference between the expected behavior specified by system
models and the observed behavior of the implemented system.
8.2 Testing Activities

Different levels of testing are used in the testing process , each level of testing aims to test
different aspects of the system. the basic levels are:
130
Unit testing
Integration testing
System testing
Acceptance testing
Unit Testing
Unit testing focuses on the building blocks of the software system, that is, objects and sub system
. There are three motivations behind focusing on components. First, unit testing reduces the
complexity of the overall tests activities, allowing us to focus on smaller units of the system.
Second , unit testing makes it easier to pinpoint and correct faults given that few components are
involved in this test . Third , Unit testing allows parallelism in the testing activities , that is each
component can be tested independently of one another . Hence the goal is to test the internal
logic of the module.
Integration Testing
In the integration testing, many test modules are combined into sub systems , which are then
tested . The goal here is to see if the modules can be integrated properly, the emphasis being on
testing module interaction.
After structural testing and functional testing we get error free modules. These modules are to be
integrated to get the required results of the system. After checking a module, another module is
tested and is integrated with the previous module. After the integration, the test cases are
generated and the results are tested.
System Testing
In system testing the entire software is tested . The reference document for this process is the
requirement document and the goal is to see whether the software meets its requirements. The
system was tested for various test cases with various inputs.
Acceptance Testing
Acceptance testing is sometimes performed with realistic data of the client to demonstrate that
131
the software is working satisfactory. Testing here focus on the external behavior of the system ,
the internal logic of the program is not emphasized . In acceptance testing the system is tested for
various inputs.
8.3 Types of Testing
1. Black box or functional testing
2. White box testing or structural testing
Black box testing

This method is used when knowledge of the specified function that a product has been designed
to perform is known . The concept of black box is used to represent a system whose inside
workings are not available to inspection . In a black box the test item is a "Black" , since its logic
is unknown , all that is known is what goes in and what comes out , or the input and output.
Black box testing attempts to find errors in the following categories:
Incorrect or missing functions
Interface errors
Errors in data structure
Performance errors
Initialization and termination errors
As shown in the following figure of Black box testing , we are not thinking of the internal
workings , just we think about
What is the output to our system?
What is the output for given input to our system?
Input
? Output
132
The Black box is an imaginary box that hides its internal workings
White box testing

White box testing is concerned with testing the implementation of the program. the intent of
structural is not to exercise all the inputs or outputs but to exercise the different programming
and data structure used in the program. Thus structural testing aims to achieve test cases that will
force the desire coverage of different structures . Two types of path testing are statement testing
coverage and branch testing coverage.
INTERNAL
Input Output
WORKING
The White Box testing strategy , the internal workings
8.4 Test Plan

Testing process starts with a test plan. This plan identifies all the testing related activities that
must be performed and specifies the schedules , allocates the resources , and specified guidelines
for testing . During the testing of the unit the specified test cases are executed and the actual
result compared with expected output. The final output of the testing phase is the test report and
the error report.
Test Data:
Here all test cases that are used for the system testing are specified. The goal is to test the
different functional requirements specified in Software Requirements Specifications (SRS)
document.
133
Unit Testing:
Each individual module has been tested against the requirement with some test data.
Test Report:
The module is working properly provided the user has to enter information. All data entry forms
have tested with specified test cases and all data entry forms are working properly.
Error Report:
If the user does not enter data in specified order then the user will be prompted with error
messages. Error handling was done to handle the expected and unexpected errors.
8.5 Test cases
A Test case is a set of input data and expected results that exercises a component
with the purpose of causing failure and detecting faults .test case is an explicit set of instructions
designed to detect a particular class of defect in a software system , by bringing about a failure .
A Test case can give rise to many tests.
134
Test Case 1:
Login Page Test Case
Test Steps
Test Test Case

Case Description
Name Step Expected Actual
Login Validate Login To verify that enter login name an error

Login name less than 1 chars message “Login
on login page (say a) and not less than 1
must be password and characters”
greater than 1 click Submit must be
characters button displayed
enter login name Login success

1 chars (say a) full or an error
and password and message
click Submit “Invalid Login
button or Password”
must be
displayed
Pwd Validate To verify that enter Password an error

Password Password on less than 1 chars message
login page (say nothing) and “Password not
must be Login Name and less than 1
greater than 1 click Submit characters”
characters button must be
displayed
135
Pwd02 Validate enter Password Login success
To verify that
Password with special full or an error
Password on
characters(say ! message
login page
@hi&*P) Login “Invalid Login
must be allow
Name and click or Password”
special
Submit button must be
characters
displayed
Llnk Verify To Verify the Click Sign Up Home Page

Hyperlinks Hyper Links Link must be
available at displayed
left side on
login page
working or
Click Sign Up Sign Up page
not
Link must be
displayed
Click New Users New Users

Link Registration
Form must be
displayed
136
Registration Page Test Case
Test Steps
Test Case Test Case

Name Description
Step Expected Actual
Registration Validate User To verify that enter User name an error

Name User name on click Submit message User
Registration button Name Must be
page must be Declared
Declared
Validate To verify that enter Password an error

Password Password on click Submit message
Registration button Password
page must be Must be
Declared Declared
Validate First To verify that enter First Name an error

Name First Name on click Submit message First
Declared
Validate Last To verify that enter Last Name an error

Name Last Name on click Submit message Last
Declared
Validate To verify that enter Address an error

137
Address Address on click Submit message
Registration button Address Must
page must be be Declared
Declared
Validate To verify that enter Phone an error

Phone Phone number number click message
number on Submit button Phone number
Registration Must be
Declared

Phone Phone number number is only message
number is (say abc) numeric values Phone number
giving Registration click Submit Must be
characters page must be button numeric
Declared Declared

Phone Phone number number is Valid message
number valid (say 1234) values click Phone number
number Registration Submit button Must be Valid
page must be value Declared
Declared
138
9. Conclusion and Future Enhancements
9.1 Conclusion:
we motivate and solve the problem of securemultikeyword top-k retrieval over encrypted cloud
data.We define similarity relevance and scheme robustness.Based on OPE invisibly leaking
sensitive information, wedevise a server-side ranking SSE scheme. whichfulfills the security
requirements ofmultikeyword top-k retrieval over the encrypted clouddata. By security analysis,
we show that the proposedscheme guarantees data privacy
9.2 Scope for Future Enhancements
We also investigatesome further enhancements of our top k retrieval searchmechanism, including

the efficient support of relevancescore dynamics, the authentication of ranked search results,and
the reversibility of our proposed trse mapping technique. Through thoroughsecurity analysis, we
show that our proposed solution issecure and privacy-preserving, while correctly realizingthe
goal of multi keyword search. Extensive experimentalresults demonstrate the efficiency of our
solution.
139
REFERENCES:
[1] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A.Konwinski, G. Lee, D. Patterson, A.
Rabkin, and M. Zaharia, “AView of Cloud Computing,” Comm. ACM, vol. 53, no. 4, pp. 50-
[2]M.Arrington,“GmailDisaster:ReportsofMassEmailDeletions,”http://www.techcrunch.com/
2006/12/28/gmail-disasterreportsof-mass-email-deletions/, Dec. 2006.
[3] Amazon.com, “Amazon s3 AvailabilityEvent:July20,2008,”http://status.aws.amazon.com/s3-
20080720.html, 2008.
[4] RAWA News, “Massive Information Leak Shakes Washingtonover Afghan
War,”http://www.rawa.org/temp/runews/2010/
08/20/massive-information-leak-shakes-washington-overafghan-war.html, 2010.
[5] AHN, “Romney Hits Obama for Security Information
Leakage,”http://gantdaily.com/2012/07/25/romney-hits-obama-forsecurity-information-leakage/,
2012.
[6] Cloud Security Alliance, “Top Threats to Cloud Computing,”http://www.cloudsecurity
alliance.org, 2010.
[7] C. Leslie, “NSA Has Massive Database of Americans’
PhoneCalls,”http://usatoday30.usatoday.com/news/washington/2006-05-10/, 2013.
[8] R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky, “SearchableSymmetric Encryption:
Improved Definitions and Efficient Constructions,”Proc. ACM 13th Conf. Computer and Comm.
Security(CCS), 2006.
[9] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked Keyword Search over
Encrypted Cloud Data,” Proc. IEEE 30th Int’lConf. Distributed Computing Systems (ICDCS),
2010.
[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+r: Top-kRetrieval from a
Confidential Index,” Proc. 12th Int’l Conf.Extending Database Technology: Advances in
Database Technology(EDBT), 2009.
[11] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, “Fully
Homomorphic Encryption over the Integers,” Proc. 29th Ann. Int’l
Conf. Theory and Applications of Cryptographic Techniques, H. Gilbert,
pp. 24-43, 2010.
140
[12] M. Perc, “Evolution of the Most Common English Words and
Phrases over the Centuries,” J. Royal Soc. Interface, 2012.
[13] O. Regev, “New Lattice-Based Cryptographic Constructions,”
J. ACM, vol. 51, no. 6, pp. 899-942, 2004.
[14] N. Howgrave-Graham, “Approximate Integer Common Divisors,”
Proc. Revised Papers from Int’l Conf. Cryptography and Lattices
(CaLC’ 01), pp. 51-66, 2001.
[15] "NSF Research Awards Abstracts 1990-2003," http://kdd.ics.uci.
edu/databases/nsfabs/nsfawards.html, 2013.
141

Towards Secure Multi-Keyword Top-K Retrieval Documentation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Towards Secure Multi-Keyword Top-K Retrieval Documentation

Uploaded by

Copyright:

Available Formats

Towards Secure Multi-Keyword Top-k Retrieval

Over Encrypted Cloud Data

1.3.3 Model Diagrams 8

3.1 Existing System 24

3.2 Problem statement 24

3.4 Feasibility Study 27

3.4.1 Economic Feasibility 27

3.4.2 Operational Feasibility 27

3.4.3 Technical Feasibility 28

4. System Requirements Specification 31

4.3 Functional Requirements 32

4.4 Non Functional Requirements 34

4.5 Hardware Requirements 34

4.6 Software Requirements 34

5.2 System Model 36

5.4 DFD Diagrams 58

8.1 Testing Methodologies 129

8.2 Testing activities 130

8.3 Types of Testing 132

8.4 Test Plan 133

8.5 Test cases 134

9. Conclusion and Future Enhancements 139

9.1 Conclusion 139

9.2 Scope for Future Enhancement 139

10. References 140

Section3. Inthis section we will discussion,Fundamental Concepts on Data Mining, Existing

Section5. In this section we will discussion System Requirements Specification.

Section7. In this section we will discussion Implementation and Technology Description.

Section8. In this section we will discussionSystem Testing and Result Analysis.

Section9. In this section we will discussionConclusion and Future Enhancement.

Cipher text compression.

SOFTWARE MODEL OR ARCHITECTURE ANALYSIS:

Structured project management techniques (such as an SDLC) enhance

The steps for Spiral Model can be generalized as follows:

 A preliminary design is created for the new system.

 A second prototype is evolved by a fourfold procedure:

2. Defining the requirements of the second prototype.

3. Planning an designing the second prototype.

4. Constructing and testing the second prototype.

 The final system is constructed, based on the refined prototype.

 High amount of risk analysis

 Software is produced early in the software life cycle.

3.1.2 Problem statement

1. It can retrieve the results with less communication overhead.

1. Entities in the cloud computing system:

2. RelevanceScore for multi-keyword

3. Spastically leakage and Vector specification Model

2 RelevanceScore for multi-keyword:

3 Spastically leakage and Vector specification Model:

4.5.2 OPERATIONAL FEASIBILITY

 Is there sufficient support for the management from the users?

4.5.3 TECHNICAL FEASIBILITY

 Does the necessary technology exist to do what is suggested?

A collection of requirements define the characteristics or features of the desired system. A

5.3 Functional Requirements:

1.Dataowner First Uploading Files into cloud server

2. View the indexwords in the dataowner it means index of the files

3. view all the Keyword based index or identifiy the weight.

4. we have to provide the security to system by use of changepassword

5. View all users in the project

6. close the module or logout module

1. what are all files exist in the cloudserver environment

2. what are the all containg files outsourced files .