CHAPTER 1

INTRODUCTION

Cloud computing is the delivery of computing as a service rather than a product,
whereby shared resources, software, and information are provided to computers and
other devices as a metered service over a network (typically the Internet). Cloud
computing provides computation, software, data access, and storage resources without
requiring cloud users to know the location and other details of the computing
infrastructure. End users access cloud-based applications through a web browser or a
lightweight desktop or mobile app, while the business software and data are stored on
servers at a remote location. Cloud application providers strive to give the same or
better service and performance as if the software programs were installed locally on
end-user computers.

Figure 1.1 Cloud Computing



1.1 BACKGROUND CONCEPTS

1.1.1 Cloud Computing


Cloud computing exhibits the following key characteristics:

• Empowerment of end-users of computing resources by putting the provisioning
of those resources in their own control, as opposed to the control of a centralized
IT service (for example).

• Agility improves with users' ability to re-provision technological infrastructure
resources.

• Application programming interface (API) accessibility to software that enables
machines to interact with cloud software in the same way the user interface
facilitates interaction between humans and computers. Cloud computing systems
typically use REST-based APIs.

• Cost is claimed to be reduced, and in a public cloud delivery model capital
expenditure is converted to operational expenditure. This is purported to lower
barriers to entry, as infrastructure is typically provided by a third party and does
not need to be purchased for one-time or infrequent intensive computing tasks.
Pricing on a utility computing basis is fine-grained with usage-based options, and
fewer in-house IT skills are required for implementation.

• Device and location independence enable users to access systems using a web
browser regardless of their location or what device they are using (e.g., PC,
mobile phone). As infrastructure is off-site (typically provided by a third party)
and accessed via the Internet, users can connect from anywhere.

• Virtualization technology allows servers and storage devices to be shared and
utilization to be increased. Applications can be easily migrated from one physical
server to another.

• Multi-tenancy enables sharing of resources and costs across a large pool of users,
thus allowing for:

  o Centralization of infrastructure in locations with lower costs (such as real
    estate, electricity, etc.)

  o Peak-load capacity increases (users need not engineer for the highest
    possible load levels)

  o Utilisation and efficiency improvements for systems that are often only
    10-20% utilised

• Reliability is improved if multiple redundant sites are used, which makes well-
designed cloud computing suitable for business continuity and disaster recovery.

• Scalability and elasticity via dynamic ("on-demand") provisioning of resources
on a fine-grained, self-service basis in near real time, without users having to
engineer for peak loads.

• Performance is monitored, and consistent, loosely coupled architectures are
constructed using web services as the system interface.

• Security could improve due to centralization of data, increased security-focused
resources, etc., but concerns can persist about loss of control over certain
sensitive data, and the lack of security for stored kernels. Security is often as
good as or better than in traditional systems, in part because providers are
able to devote resources to solving security issues that many customers cannot
afford. However, the complexity of security is greatly increased when data is
distributed over a wider area or a greater number of devices, and in multi-tenant
systems shared by unrelated users. In addition, user access to
security audit logs may be difficult or impossible. Private cloud installations are
in part motivated by users' desire to retain control over the infrastructure and
avoid losing control of information security.

• Maintenance of cloud computing applications is easier, because they do not need
to be installed on each user's computer and can be accessed from different
places.

1.1.1.1 Service Models

Cloud computing providers offer their services according to three fundamental
models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as
a service (SaaS), where IaaS is the most basic and each higher model abstracts away the
details of the models below it.

Figure 1.2 Service Models

1.1.1.2 Cloud clients

Users access cloud computing using networked client devices, such as desktop
computers, laptops, tablets and smartphones. Some of these devices (cloud clients)
rely on cloud computing for all or a majority of their applications, so they are essentially
useless without it. Examples are thin clients and the browser-based Chromebook. Many
cloud applications do not require specific software on the client and instead use a web
browser to interact with the cloud application. With AJAX and HTML5, these Web user
interfaces can achieve a similar or even better look and feel than native applications. Some
cloud applications, however, support specific client software dedicated to these
applications (e.g., virtual desktop clients and most email clients). Some legacy
applications (line-of-business applications that until now have been prevalent in thin-
client Windows computing) are delivered via a screen-sharing technology.

1.1.1.3 Deployment models

Figure 1.3 Deployment Models

Cloud computing types

Public cloud

A public cloud is one based on the standard cloud computing model, in which a
service provider makes resources, such as applications and storage, available to the
general public over the Internet. Public cloud services may be free or offered on a pay-
per-usage model.

Community cloud

Community cloud shares infrastructure between several organizations from a


specific community with common concerns (security, compliance, jurisdiction, etc.),
whether managed internally or by a third-party and hosted internally or externally. The
costs are spread over fewer users than a public cloud (but more than a private cloud), so
only some of the cost-savings potential of cloud computing is realized.

Hybrid cloud

Hybrid cloud is a composition of two or more clouds (private, community, or


public) that remain unique entities but are bound together, offering the benefits of
multiple deployment models. It can also be defined as multiple cloud systems that are
connected in a way that allows programs and data to be moved easily from one
deployment system to another.

Private cloud

Private cloud is infrastructure operated solely for a single organization, whether


managed internally or by a third-party and hosted internally or externally. They have
attracted criticism because users "still have to buy, build, and manage them" and thus do
not benefit from less hands-on management, essentially "[lacking] the economic model
that makes cloud computing such an intriguing concept".

1.1.1.4 Architecture

Figure 1.4 Cloud Computing - Sample Architecture

1.1.2 Distributed Computing

A computer program that runs in a distributed system is called a distributed


program, and distributed programming is the process of writing such programs. There
are many alternatives for the message passing mechanism, including RPC-like
connectors and message queues. An important goal and challenge of distributed systems
is location transparency.
Distributed computing also refers to the use of distributed systems to solve
computational problems. In distributed computing, a problem is divided into many
tasks, each of which is solved by one or more computers, which communicate with each
other by message passing.
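As an illustrative sketch (invented for this report, with made-up names), two Java
threads can stand in for such communicating computers, dividing a summation into
tasks and exchanging messages through queues:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy message-passing example: a "coordinator" splits a problem into tasks
// and a "worker" solves each one; they communicate only through queues.
public class MessagePassingDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<int[]> tasks = new ArrayBlockingQueue<>(10);
        BlockingQueue<Integer> results = new ArrayBlockingQueue<>(10);

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    int[] task = tasks.take();       // receive a task message
                    if (task.length == 0) break;     // empty task = shutdown signal
                    int sum = 0;
                    for (int v : task) sum += v;
                    results.put(sum);                // send the result back
                }
            } catch (InterruptedException ignored) { }
        });
        worker.start();

        tasks.put(new int[]{1, 2, 3});
        tasks.put(new int[]{4, 5, 6});
        tasks.put(new int[]{});                      // shutdown message
        System.out.println(results.take() + results.take()); // prints 21
    }
}

Replacing the queues with sockets turns the same pattern into communication
between physically separate machines.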
The word distributed in terms such as "distributed system", "distributed
programming", and "distributed algorithm" originally referred to computer networks
where individual computers were physically distributed within some geographical area.
The terms are nowadays used in a much wider sense, even referring to autonomous
processes that run on the same physical computer and interact with each other by
message passing. While there is no single definition of a distributed system, the
following defining properties are commonly used:

There are several autonomous computational entities, each of which has its own
local memory.

The entities communicate with each other by message passing.

Here, the computational entities are called computers or nodes.


A distributed system may have a common goal, such as solving a large
computational problem. Alternatively, each computer may have its own user with
individual needs, and the purpose of the distributed system is to coordinate the use of
shared resources or provide communication services to the users.

Architecture

Client/Server System : The Client-server architecture is a way to provide a


service from a central source. There is a single server that provides a service, and many
clients that communicate with the server to consume its products. In this architecture,
clients and servers have different jobs. The server's job is to respond to service requests
from clients, while a client's job is to use the data provided in response in order to
perform some tasks.
Peer-to-Peer System : The term peer-to-peer is used to describe distributed
systems in which labour is divided among all the components of the system. All the
computers send and receive data, and they all contribute some processing power and
memory. As a distributed system increases in size, its capacity of computational
resources increases. In a peer-to-peer system, all components of the system contribute
some processing power and memory to a distributed computation.
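A minimal client/server sketch in Java sockets (the port number and one-line echo
protocol are invented for illustration; they are not part of this project's code):

import java.io.*;
import java.net.*;

// Minimal client/server sketch: the server responds to service requests and
// the client consumes the response.
public class EchoServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(7001)) {     // hypothetical port
            while (true) {
                try (Socket client = ss.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println("SERVED: " + in.readLine()); // answer the request
                }
            }
        }
    }
}

A client would connect with new Socket("localhost", 7001), send one line, and read
the reply; in a peer-to-peer system, every node would run both roles.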

Fig. 1.5 (a), (b) Distributed computing, (c) Parallel computing

1.1.3 Parallel and distributed computing

Distributed systems are groups of networked computers, which have the same
goal for their work. The terms "concurrent computing", "parallel computing", and
"distributed computing" have a lot of overlap, and no clear distinction exists between
them. The same system may be characterized both as "parallel" and "distributed"; the
processors in a typical distributed system run concurrently in parallel. Parallel
computing may be seen as a particular tightly coupled form of distributed computing,
and distributed computing may be seen as a loosely coupled form of parallel computing.
Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or
"distributed" using the following criteria:

• In parallel computing, all processors may have access to a shared memory to
exchange information between processors.

• In distributed computing, each processor has its own private memory
(distributed memory). Information is exchanged by passing messages between
the processors.

Figure 1.5 illustrates the difference between distributed and
parallel systems. Figure 1.5 (a) is a schematic view of a typical distributed system; as
usual, the system is represented as a network topology in which each node is a computer
and each line connecting the nodes is a communication link. Figure 1.5 (b) shows the
same distributed system in more detail: each computer has its own local memory, and
information can be exchanged only by passing messages from one node to another by
using the available communication links. Figure 1.5 (c) shows a parallel system in which
each processor has direct access to a shared memory.
The situation is further complicated by the traditional uses of the terms parallel
and distributed algorithm, which do not quite match the above definitions of parallel and
distributed systems. Nevertheless, as a rule of thumb, high-performance parallel computation in
a shared-memory multiprocessor uses parallel algorithms, while the coordination of a
large-scale distributed system uses distributed algorithms.
Other typical properties of distributed systems include the following:

• The system has to tolerate failures in individual computers.

• The structure of the system (network topology, network latency, number of
computers) is not known in advance, the system may consist of different kinds
of computers and network links, and the system may change during the
execution of a distributed program.

• Each computer has only a limited, incomplete view of the system. Each
computer may know only one part of the input.

Today, many technologists believe that Web services are the proper mechanism
for integrating with disparate database environments. Contrary to public opinion, Web
services and distributed (aka XA) transactions are complementary, not alternative,
technologies. In computing, CloudTran, a transaction management product, enables
applications running in distributed computing and cloud computing architectures to
embed logical business transactions that adhere to the properties of ACID transactions.
Specifically, CloudTran coordinates ACID transactionality for data stored within in-
memory data grids (e.g., Oracle Coherence, GigaSpaces, and GemFire), as well as from
the data grid to persistent storage systems (e.g., Oracle, MySQL, Microsoft SQL Server,
MongoDB).
Distributed computing has traditionally relied on a technology called distributed
transactions: an algorithm used to coordinate the storing of a logically related
set of data within more than one database or computer. CloudTran is aimed at
addressing issues with this approach to improve performance, scalability, and ease of
implementation for application developers. In doing so, CloudTran enables a broad
range of developers to implement highly scalable applications that run in cloud
computing environments and distributed architectures. In addition, CloudTran is a
manifestation of Cloud Transaction Processing, or CloudTP.
Methodology
The basic principles of all transaction-processing systems are the same. However, the
terminology may vary from one transaction-processing system to another, and the terms
used below are not necessarily universal.
Rollback
Transaction-processing systems ensure database integrity by recording intermediate
states of the database as it is modified, then using these records to restore the database
to a known state if a transaction cannot be committed. For example, copies of
information on the database prior to its modification by a transaction are set aside by
the system before the transaction can make any modifications (this is sometimes called
a before image). If any part of the transaction fails before it is committed, these copies
are used to restore the database to the state it was in before the transaction began.
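A toy sketch of this before-image mechanism over an in-memory key-value store
(all names are invented; a real system would write the images to stable storage):

import java.util.HashMap;
import java.util.Map;

// Before-image rollback sketch: copies of values are set aside before a
// transaction modifies them; on failure the copies restore the old state.
public class BeforeImageDemo {
    private final Map<String, String> db = new HashMap<>();
    private final Map<String, String> beforeImages = new HashMap<>();

    void write(String key, String value) {
        if (!beforeImages.containsKey(key)) {
            beforeImages.put(key, db.get(key)); // save the before image once
        }
        db.put(key, value);
    }

    void rollback() {
        for (Map.Entry<String, String> e : beforeImages.entrySet()) {
            if (e.getValue() == null) db.remove(e.getKey()); // key did not exist
            else db.put(e.getKey(), e.getValue());           // restore old value
        }
        beforeImages.clear();
    }

    void commit() { beforeImages.clear(); }     // discard the before images
}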
Rollforward
It is also possible to keep a separate journal of all modifications to a database
(sometimes called after images). This is not required for rollback of failed transactions
but it is useful for updating the database in the event of a database failure, so some
transaction-processing systems provide it. If the database fails entirely, it must be
restored from the most recent back-up. The back-up will not reflect transactions
committed since the back-up was made. However, once the database is restored, the
journal of after images can be applied to the database (rollforward) to bring the
database up to date. Any transactions in progress at the time of the failure can then be
rolled back. The result is a database in a consistent, known state that includes the results
of all transactions committed up to the moment of failure.
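Continuing the same toy model, rollforward can be sketched as replaying a journal
of after images over a restored backup (illustrative only):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rollforward sketch: committed changes are journaled as after images; after
// restoring a backup, replaying the journal brings the database up to date.
public class RollforwardDemo {
    static class AfterImage {
        final String key, value;
        AfterImage(String key, String value) { this.key = key; this.value = value; }
    }

    private Map<String, String> db = new HashMap<>();
    private final List<AfterImage> journal = new ArrayList<>();

    void commitWrite(String key, String value) {
        db.put(key, value);
        journal.add(new AfterImage(key, value));  // journal the after image
    }

    void restoreAndRollForward(Map<String, String> backup) {
        db = new HashMap<>(backup);               // restore the most recent back-up
        for (AfterImage ai : journal) {
            db.put(ai.key, ai.value);             // replay committed changes
        }
    }
}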
Deadlocks
In some cases, two transactions may, in the course of their processing, attempt to access
the same portion of a database at the same time, in a way that prevents them from
proceeding. For example, transaction A may access portion X of the database, and
transaction B may access portion Y of the database. If, at that point, transaction A then
tries to access portion Y of the database while transaction B tries to access portion X, a
deadlock occurs, and neither transaction can move forward. Transaction-processing
systems are designed to detect these deadlocks when they occur. Typically both
transactions will be cancelled and rolled back, and then they will be started again in a
different order, automatically, so that the deadlock doesn't occur again. Or sometimes,
just one of the deadlocked transactions will be cancelled, rolled back, and automatically
restarted after a short delay.
Deadlocks can also occur between three or more transactions. The more transactions
involved, the more difficult they are to detect, to the point that transaction processing
systems find there is a practical limit to the deadlocks they can detect.
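One simple way to realize "detect, cancel, and retry" in Java is a lock timeout: if a
transaction cannot obtain its second lock in time, it presumes a deadlock, releases
what it holds, and restarts after a short delay (a sketch, not how any particular
database implements detection):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Deadlock-handling sketch: a timed tryLock stands in for deadlock detection.
// On timeout the transaction "rolls back" (releases its lock) and retries.
public class DeadlockRetryDemo {
    static boolean runTransaction(ReentrantLock first, ReentrantLock second)
            throws InterruptedException {
        first.lock();
        try {
            if (!second.tryLock(100, TimeUnit.MILLISECONDS)) {
                return false;                     // presumed deadlock: give up
            }
            try {
                // ... access both portions of the database here ...
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock x = new ReentrantLock(), y = new ReentrantLock();
        while (!runTransaction(x, y)) {
            Thread.sleep(50);                     // short delay, then restart
        }
    }
}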
Compensating transaction
In systems where commit and rollback mechanisms are not available or undesirable, a
compensating transaction is often used to undo failed transactions and restore the
system to a previous state.
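A compensating transaction can be sketched as pairing each forward step with an
explicit undo action that is applied, in reverse order, when a later step fails
(names invented for illustration):

// Compensating-transaction sketch: each step pairs a forward action with a
// compensation that undoes its effect if a later step fails.
public class CompensationDemo {
    interface Step { void run(); void compensate(); }

    static void execute(Step... steps) {
        int done = 0;
        try {
            for (Step s : steps) { s.run(); done++; }
        } catch (RuntimeException failure) {
            for (int i = done - 1; i >= 0; i--) {
                steps[i].compensate();            // undo completed steps in reverse
            }
            throw failure;
        }
    }
}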

1.1.3.1 ACID criteria

Jim Gray defined the properties of a reliable transaction system in the late 1970s under the
acronym ACID: atomicity, consistency, isolation, and durability.
Atomicity
A transaction's changes to the state are atomic: either all happen or none happen. These
changes include database changes, messages, and actions on transducers.
Consistency
A transaction is a correct transformation of the state. The actions taken as a group do
not violate any of the integrity constraints associated with the state.

Isolation
Even though transactions execute concurrently, it appears to each transaction T that the
others executed either before T or after T, but not both.
Durability
Once a transaction completes successfully (commits), its changes to the state survive
failures.
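In JDBC terms, atomicity and durability surface as commit and rollback on a
connection; a hedged sketch of a transfer that either fully happens or not at all
(the table, column names and connection URL are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// ACID in practice with JDBC: both updates commit together or neither does.
public class AcidTransferDemo {
    static void transfer(String url, int from, int to, int amount) throws SQLException {
        try (Connection con = DriverManager.getConnection(url)) {
            con.setAutoCommit(false);             // start a transaction
            try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, amount);  debit.setInt(2, from);  debit.executeUpdate();
                credit.setInt(1, amount); credit.setInt(2, to);   credit.executeUpdate();
                con.commit();                     // atomic and durable on success
            } catch (SQLException e) {
                con.rollback();                   // all changes undone on failure
                throw e;
            }
        }
    }
}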

Benefits

Transaction processing has these benefits:

• It allows sharing of computer resources among many users.

• It shifts the time of job processing to when the computing resources are less
busy.

• It avoids idling the computing resources without minute-by-minute human
interaction and supervision.

• It is used on expensive classes of computers to help amortize the cost by keeping
high rates of utilization of those expensive resources.

1.2 PROBLEM STATEMENT

The existing system is unable to tolerate Byzantine failures, and it also becomes
user-unfriendly when it goes offline. To overcome Byzantine failures, we replace the
Two-Phase Commit protocol with a Byzantine Fault Tolerant Two-Phase Commit
protocol.

1.3 OBJECTIVE OF THE PROJECT

The main objective of our work is to build a secure protocol that assures secure
transactions. The proposed method aims to achieve better transaction processing
while guaranteeing data consistency as well as transaction security.

1.4 SCOPE OF THE PROJECT

This project helps to overcome Byzantine failures and to attain atomicity, consistency,
isolation and durability in cloud database transactions. It also provides security in the
transaction.

1.5 LITERATURE SURVEY

In [1], Marian K. Iskander, Tucker Trainor and Dave W. Wilkinson consider
distributed transactional database systems deployed over cloud servers, in which entities
cooperate to form proofs of authorization that are justified by collections of certified
credentials. These proofs and credentials may be evaluated and collected over extended
time periods under the risk of having the underlying authorization policies or the user
credentials being in inconsistent states. It therefore becomes possible for policy-based
authorization systems to make unsafe decisions that might threaten sensitive resources.
In this paper, we highlight the criticality of the problem. We then define the notion of
trusted transactions when dealing with proofs of authorization. Accordingly, we propose
several increasingly stringent levels of policy consistency constraints, and present
different enforcement approaches to guarantee the trustworthiness of transactions
executing on cloud servers. We propose a Two-Phase Validation Commit protocol as a
solution, which is a modified version of the basic Two-Phase Validation Commit
protocols. We finally analyze the different approaches presented using both analytical
evaluation of the overheads and simulations to guide the decision makers to which
approach to use.

In [2], W. K. Wong et al. address association rule mining, which aims at the
discovery of item sets that co-occur frequently in transactional data. Centralized mining
has been well studied in the past. The problem has a large worst-case complexity, a fact
that motivates businesses to outsource the mining process to service providers, who have
developed efficient, specialized solutions. The data owner, apart from the mining cost
relief, has additional motives for outsourcing. First, it requires minimal computational
resources, since the owner is only required to produce and to send the transactions to the
miner. This makes the outsourcing model also attractive to applications in which data
owners produce transactions as streams and they have limited resources to maintain
them. Second, assume that the owner has multiple production sources of transactions,
e.g., consider a chain of supermarkets which generate transactions at different locations.
All transactions can be sent to a single provider for mining association rules. The
provider could compute association rules that are local to the individual stores or global
rules for the whole organization. Therefore, the cost of transferring transactions among
the sources and performing the global mining in a distributed manner is saved. On the
other hand, the service provider becomes a single point of security attack. If the service
provider is not trusted, he should be prevented from accessing the actual data because
the data may be associated with private information. In addition, even if the items (e.g.,
store products) are public, the computed association rules are property of the owner and
they are meant to be known only to him. Therefore, protecting both raw data and the
resulting association rules from the service provider is a key issue in outsourcing of data
mining. There are two approaches that can protect sensitive information. The first is to
apply an encryption function that transforms the original data to a new format. The
second is to apply data perturbation, which modifies the original raw data randomly.
The perturbation approach is less attractive since it can only provide approximate
results; on the other hand, the use of encryption allows the exact rules to be recovered.

In [3], Ling Qiu et al. consider that a functional division may have to delegate or
outsource its data mining tasks to the IT division due to the lack of IT expertise and
powerful computing infrastructure, which are usually centrally managed by the IT
division. This scenario can be extended to a more general circumstance in which all
division. This scenario can be extended to a more general circumstance in which all
divisions are individually independent organizations (or companies). This is because in
today's fast-paced business environment, it is impossible for any single organization to
understand, develop, and implement every IT service needed. By outsourcing, an organization
can obtain specific human resources (e.g., skilled programming personnel) and
technological resources (e.g., more powerful computing infrastructure) for its needs of
IT services (e.g., data analysis) with lower costs. It can also be extended to online
scenarios, e.g., a distributed computing environment comprising a center server and
some edge servers. The practice of outsourcing data mining tasks involves extensive
collaboration (e.g., exchange or share of data) across different organizations. Either the
raw data or the revealed information after analysis contains the business intelligence
(BI) and customer privacy of an organization. There is a security concern of potential
risk of exposing private information in outsourcing activities. Without proper security
policy and technology, these privacies could be very vulnerable to security breaches.
Therefore, to protect BI and customer privacy, it is urgent and critical to provide
solutions from the perspectives of both legality or regulation and technology. In this
research we focus on the technology-based solutions. When outsourcing mining tasks,
we should protect the following three elements which may expose BI and customer
privacy: (1) the source data which contain all transactions and items; (2) the mining
requests which are item sets of interests; and (3) the mining results which are frequent
item sets and association rules.

In [4], Chris Clifton discusses privacy. "Privacy," people say, means keeping information about me
from being available to others. This doesn't match the dictionary definition (Webster's):
"freedom from unauthorized intrusion". It is this intrusion, or use of personal data in a
way that negatively impacts someone's life, that causes concern. As long as data is not
misused, most people do not feel their privacy has been violated. The problem is that
once information is released, it may be impossible to prevent misuse. Utilizing this
distinction, ensuring that a data mining project won't enable misuse of personal
information, opens opportunities that complete privacy would prevent. To do this, we
need technical and social solutions that ensure data will not be released. The same basic
concerns also apply to collections of data. Given a collection of data, it is possible to
learn things that are not revealed by any individual data item. An individual may not
care about someone knowing their birth date, mother's maiden name, or social security
number; but knowing all of them enables identity theft. This type of privacy problem
arises with large, multi-individual collections as well. A technique that guarantees no
individual data is revealed may still release information describing the collection as a
whole. Such corporate information is generally the goal of data mining, but some results
may still lead to concerns (often termed a secrecy, rather than privacy, issue.) The
difference between such corporate privacy issues and individual privacy is not that
significant. If we view disclosure of knowledge about an entity (information about an
individual) as a potential individual privacy violation, then generalizing this to
disclosure of information about a subset of the data captures both views. One technique,
typically used in census data, is to aggregate items. Knowing the average income for a neighborhood is
not enough to determine the actual income of a resident of that neighborhood.

In [5], Ian Molloy studies the problem of outsourcing data mining tasks to a third-party
service provider, which has been examined in a number of recent papers. While outsourcing data
mining has the potential of reducing the computation and software cost for the data
owners, it is important that private information about the data is not disclosed to the
service providers. The raw data and the mining results can both contain business
intelligence of the organization and private information about customers of the
organization and require protection from the service provider. Unfortunately, the current
understanding of the potential privacy threats to outsourcing data mining and the needed
privacy protection are still quite primitive. In Wong et al. proposed an approach for
outsourcing association rule mining. In their model, the data owner first encodes the
transactional database before sending it to the service provider. The service provider
finds the frequent item-sets and their support counts in the encoded database and then
sends the information back to the data owner. The data owner then decodes the results to
get the correct support counts of frequent item-sets in the original database. One nave
encoding approach is to replace each item in the original data with a randomly generated
pseudo-identifier, but this is subject to frequency analysis attack.Perfect secrecy is
achievable but prohibitively expensive. As an alternative we introduce more natural and
practical notions of security aimed at preventing frequency analysis attacks, and give a
more secure encoding. There exists a tradeoff between security and efficiency, and
when the security cost reaches a certain point it is cheaper to perform the association
rule mining oneself. Hence, we analyze how encoding impacts the costs associated with
outsourcing.

In [6], Shariq J. Rizvi et al. observe that the knowledge models produced through
data mining techniques are only as good as the accuracy of their input data. One source
of data inaccuracy is when users deliberately provide wrong information. This is
especially common with regard to customers who are asked to provide personal
information on Web forms to e-commerce service providers. The compulsion for doing
so may be the (perhaps well-founded) worry that the requested information may be
misused by the service provider to harass the customer. As a case in point, consider a
pharmaceutical company that asks clients to disclose the diseases they have suffered
from in order to investigate the correlations in their occurrences (e.g., "adult females with
malarial infections are also prone to contract tuberculosis"). While the company may be
acquiring the data solely for genuine data mining purposes that would eventually reflect
themselves in better service to the client, at the same time the client might worry that if her
medical records are either inadvertently or deliberately disclosed, it may adversely
affect her employment opportunities. We investigate, in this paper, whether customers
can be encouraged to provide correct information by ensuring that the mining process
cannot, with any reasonable degree of certainty, violate their privacy. At the same time,
we would like the mining process to be as accurate as possible in terms of its results.
The difficulty lies in the fact that these two metrics: privacy and accuracy are typically
contradictory in nature, with the consequence that improving one usually incurs a cost in
the other. Therefore, we compromise on the ideal and perhaps infeasible goal of having
both complete privacy and complete accuracy through approximate solutions that
provide practically acceptable values for these metrics. Note further that since the
purpose of data mining is essentially to identify statistical trends, cent-per-cent accuracy
in the mining results is perhaps often not even a required feature.

In [7], Murat Kantarcoglu et al. note that data mining technology has emerged as a
means of identifying patterns and trends from large quantities of data. Data mining and
data warehousing go hand-in-hand: most tools operate by gathering all data into a
central site, then running an algorithm against that data. However, privacy concerns can
prevent building a centralized warehouse: data may be distributed among several
custodians, none of which are allowed to transfer their data to another site. They consider
computing association rules within such a scenario, assuming homogeneous databases: all sites
have the same schema, but each site has information on different entities. The goal is to
produce association rules that hold globally, while limiting the information shared about
each site. Computing association rules without disclosing individual transactions is
straightforward. We can compute the global support and confidence of an association
rule AB => C knowing only the local supports of AB and ABC, and the size of each
database.

This protects individual data privacy, but it does require that each site disclose what
rules it supports, and how much it supports each potential global rule. What if this
information is sensitive? For example, suppose the Centers for Disease Control (CDC), a
public agency, would like to mine health records to try to find ways to reduce the
proliferation of antibiotic resistant bacteria. Insurance companies have data on patient
diseases and prescriptions. The problem is that insurance companies will be concerned
about sharing this data. Not only must the privacy of patient records be maintained, but
insurers will be unwilling to release rules pertaining only to them. Imagine a rule
indicating a high rate of complications with a particular medical procedure. If this rule
doesn't hold globally, the insurer would like to know this, as they can then try to pinpoint
the problem with their policies and improve patient care. If the fact that the insurer's
data supports this rule is revealed (say, under a Freedom of Information Act request to
the CDC), the insurer could be exposed to significant public relations or liability
problems. This potential risk could exceed their own perception of the benefit of
participating with the CDC.

In [8], Bobi Gilburd et al. note that data sharing is frequently prohibited by legal obligations or
commercial concerns. Such restrictions usually do not apply to cumulative statistics of
the data. Thus, the data owners usually do not object to having a trusted third party
(such as a federal agency) collect and publish these cumulative statistics, provided that
they cannot be manipulated to obtain information about a specific record or a specific
data source. Trusted third parties are, however, difficult to find, and the procedure
involved is necessarily complicated and inefficient. This scenario is most evident in the
health maintenance business. Health Maintenance Organizations (HMOs) have a high
interest in sharing medical data, both for public health reasons, such as plague control
and the evaluation of different medical protocols, and for commercial reasons such as
detecting medical fraud patterns or medical misconduct. Similar examples can be found
in the financial domain where, for instance, account information should be shared in
order to detect money laundering. However, sharing data is very problematic: it is
legally forbidden to expose specific records (i.e., a patient's medical record) and it is
commercially undesirable to expose statistics about a single HMO (e.g., mortality rates
or the average expenditure per client).

Distributed data mining allows data to be shared without compromising privacy.


On the one hand, data mining techniques have been shown to be a leading tool for data
analysis, and as such they are likely to satisfy researchers' needs as an interface to the
data stored in a Grid. On the other hand, the models produced by data mining tools are
statistical and thus satisfy the privacy concerns of the data owners. As a result, different
HMOs can choose to reveal their databases not for direct reading but rather to a
distributed data mining algorithm that will execute at the different sites and produce a
statistical model of the combined database. That the algorithm produces statistics still
does not guarantee privacy: an HMO also has to make certain that the data mining
algorithm itself does not leak information. For instance, an algorithm in which each
HMO computes its mortality rate and then sends it to a polling station which computes
the global statistics would not meet this criterion because the polling station would be
informed of the mortality rate for each HMO.
In [9], P. Krishna Prasad et al. observe that sharing of data and carrying out
collaborative data mining has emerged as a powerful tool of analysis for mutual
profitability among several related business houses. However, the sharing of data in
collaboration has raised a number of ethical issues like privacy, data security, and
intellectual property rights. From a philosophical point of view, Schoeman and Walters
gave three possible definitions for privacy: 1) privacy as the right of a person to
determine which personal information about himself/herself may be communicated to
others; 2) privacy as the control over access to information about oneself; 3) privacy as
limited access to a person and to all the features related to the person. There are privacy
rules and regulations like HIPAA, GLBA and SOX that restrict companies from sharing
their data in raw form with any other parties. In horizontally partitioned databases, Jha et
al. proposed a solution called the privacy-preserving k-means algorithm. The crucial step
in their algorithm is computing mean vectors. They used oblivious transfer and a
homomorphic encryption scheme for computing these vectors. The claim is that, by
knowing only mean vectors, neither party can recompute individual vectors. Jagannathan
et al. gave a solution known as recluster; their solution uses a divide, conquer and merge
strategy along with some cryptographic primitives like a homomorphic encryption
scheme, a secure scalar product protocol and Yao's circuit evaluation protocol to compute
cluster centers securely.
In vertically partitioned databases, Vaidya and Clifton proposed a solution
called privacy-preserving k-means clustering. Their approach is based on the secure multi-
party computation approach, and they used cryptographic primitives like the secure scalar
product protocol and Yao's circuit evaluation protocol for readjusting mean vectors
securely. Prasad and Rangan proposed a solution called privacy-preserving BIRCH.
Their algorithm works for vertically partitioned, large databases. They used
cryptographic primitives like the secure scalar product protocol, a protocol for Yao's
millionaire problem, a secure min-index in vector sum protocol and a threshold sum
protocol for computing clusters securely. In the partitioned database case, each party learns the
structure of the CF tree, and the entries of the tree are the corresponding shares of the CFs of
the virtual database. That is, each party in its memory will have its share of the CF based on
its input database in the node. The BIRCH algorithm is a well-known algorithm for
effectively computing clusters in a large data set. As the data is typically distributed
over several sites, clustering over distributed data is an important problem. The data can
be distributed in horizontally, vertically or arbitrarily partitioned databases. But, because of
privacy issues, no party may share its data with other parties. The problem is how the
parties can cluster the distributed data without breaching the privacy of others' data. The
solutions in the arbitrarily partitioned database setting generally work for both horizontally
and vertically partitioned databases. In our work we give a procedure for securely
running the BIRCH algorithm over arbitrarily partitioned databases.

In [10], Rakesh Agrawal et al. note that progress in bar-code technology has made it possible
for retail organizations to collect and store massive amounts of sales data, referred to as
basket data. A record in such data typically consists of the transaction date and the
items bought in the transaction. Successful organizations view such databases as
important pieces of the marketing infrastructure. They are interested in instituting
information-driven marketing processes, managed by database technology, that enable
marketers to develop and implement customized marketing programs and
strategies. Algorithms for discovering large item sets make multiple passes over the data.
In the first pass, we count the support of individual items and determine which of them
are large, i.e. have minimum support. In each subsequent pass, we start with a seed set
of item sets found to be large in the previous pass. We use this seed set for generating
new potentially large item sets, called candidate item sets, and count the actual support
for these candidate item sets during the pass over the data. At the end of the pass, we
determine which of the candidate item sets are actually large, and they become the seed
for the next pass. The Apriori and AprioriTid algorithms we propose differ
fundamentally from the AIS and SETM algorithms in terms of which candidate item
sets are counted in a pass and in the way that those candidates are generated.

CHAPTER 2

SYSTEM DESIGN

2.1 SYSTEM CONFIGURATION

2.1.1 HARDWARE REQUIREMENTS


Processor : Pentium IV 2.4 GHz
Hard Disk : 80 GB
RAM : 256 MB
Video : 800 x 600 resolution, 256 colors

2.1.2 SOFTWARE REQUIREMENTS


Operating System : Windows XP and above
Front End : J2EE (JSP, SERVLET), STRUTS
Back End : MySQL 5.5
IDE : Eclipse

2.2 SYSTEM ANALYSIS

Distributed computing systems are being built and used more and more
frequently. This distributed computing revolution makes the reliability of distributed
systems an important concern. It is fairly well understood how to connect hardware
so that most components can continue to work when others are broken, and thus
increase the reliability of a system as a whole. This report addresses the issue of
providing mechanisms for reliable distributed systems. In particular, we examine how to
program a system so that the software continues to work in the face of a variety of
failures of parts of the system.
A process that wishes to use transactions must be aware of certain primitives
associated with them. These primitives are:

1. begin transaction - mark the start

2. end transaction - mark the end; try to commit

3. abort transaction - kill transaction, restore old values

4. read data from object(file), write data to object(file).
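These primitives map naturally onto a small Java interface; a sketch with invented
names:

// Sketch of the transaction primitives as a Java interface (names invented).
public interface Transaction {
    void begin();                            // mark the start
    void commit() throws AbortException;     // mark the end; try to commit
    void abort();                            // kill transaction, restore old values
    byte[] read(String object);              // read data from an object (file)
    void write(String object, byte[] data);  // write data to an object (file)

    class AbortException extends Exception { }
}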

2.2.1 Existing System

In a distributed system, a transaction may involve multiple processes on multiple
machines. Even in this environment, we still need to preserve the properties of
transactions and achieve an atomic commit (either all processes involved in the
transaction commit or else all of them abort the transaction; it would be
unacceptable to have some commit and some abort). A protocol that achieves this
atomic commit is the two-phase commit protocol.
In the existing system, the two-phase commit protocol is used to store the database in
the cloud environment. The system also verifies whether a user is valid, and it achieves
atomic commit. It uses the Two-Phase Validation Commit protocol, which gives more
secure transactions, identifies trusted users, and achieves the ACID properties. Before
authorization, a prepare-to-commit message is sent to identify the user. In implementing this protocol, we
assume that one process will function as the coordinator and the rest as cohorts (the
coordinator may be the one that initiated the transaction, but that's not necessary). We
further assume that there is stable storage and a write-ahead log at each site.
Furthermore, we assume that no machine involved crashes forever.
The protocol works as follows (the coordinator is ready to commit and needs to
ensure that everyone else will do so as well):

Phase 1
Coordinator: write a prepare-to-commit message to the log; send the
prepare-to-commit request message to all cohorts; wait for their replies.
Cohort: work on the transaction; when done, wait for the coordinator's message.
On receiving it, when the transaction is ready to commit, write agree-to-commit
(or abort) to the log and send the agree (or abort) reply.

Phase 2
Coordinator: write the commit message to the log; send the commit (or abort)
message to all cohorts; wait for all cohorts to respond; then clean up all state. Done.
Cohort: receive the commit (or abort) message. If a commit was received, write
commit to the log, release all locks and resources, and update the databases. If an
abort was received, undo all changes. Send the done message.
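A compressed sketch of the coordinator's side of this message flow (the write-ahead
log and cohort I/O are abstracted behind invented helper interfaces):

import java.util.List;

// Two-phase commit, coordinator side (sketch).
public class TwoPhaseCommitCoordinator {
    interface Cohort { String askPrepare(); void send(String msg); void awaitDone(); }
    interface Log { void append(String record); }

    static boolean run(List<Cohort> cohorts, Log log) {
        log.append("prepare-to-commit");          // phase 1: log, then ask everyone
        boolean allAgree = true;
        for (Cohort c : cohorts) {
            if (!"agree".equals(c.askPrepare())) allAgree = false;
        }
        String decision = allAgree ? "commit" : "abort";
        log.append(decision);                     // phase 2: log the decision
        for (Cohort c : cohorts) c.send(decision);
        for (Cohort c : cohorts) c.awaitDone();   // wait for all cohorts to respond
        return allAgree;                          // then clean up all state
    }
}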

2.2.1.1 Disadvantages of Existing System

• It cannot tolerate Byzantine failures.

• It is user-unfriendly when the connection is lost.

2.2.2 Proposed System

The Byzantine Fault Tolerant Two-Phase Commit protocol (B2PC) is enhanced
from quorum-based 3PC. Quorum-based 3PC allows a quorum to make progress in
case of a single failure. If the failure cascades, the quorum gets lost, and even if we connect the
quorum later it still remains blocked. B2PC eliminates some of the negative
aspects faced by 2PC and 3PC, such as the blocking problem, inconsistency under
network partitioning, and higher communication overhead. In the B2PC recovery system,
at every invocation the site tries to elect a new coordinator if the existing site is suspected
of a Byzantine fault. Thus B2PC reduces the overhead of the view-change algorithm in
practice. We describe the outline of the view-change algorithm under the following
circumstances: when a faulty primary replica broadcasts contradictory messages to
different replicas, the view-change algorithm is used, and a backup replica
begins this view-change algorithm. This algorithm is used to elect a new primary replica
when the existing primary is suspected to be Byzantine faulty. The B2PC protocol
achieves higher availability of data than 2PC by maintaining two extra counters:

Preselected: its initial value is 0. It denotes the number of elections that took
place at the site, and the value is updated when a new coordinator is elected.
PreAttempt: its initial value is 0. It denotes the election number of the
previous election.

The Backup Two-Phase Commit (B2PC) protocol overcomes the major issue faced by the
atomic Two-Phase Commit protocol, the resource-blocking problem, which is
overcome by adding a backup phase to the normal Two-Phase Commit protocol. The B2PC
protocol reduces Byzantine Agreement usage, because using Byzantine Agreement
in each and every phase of operation is impractical and would be prohibitively
expensive. In this protocol, a backup site is employed for each coordinator. After
receiving notification from all participants in the first phase, the coordinator starts the
2nd phase, the backup phase, in which it communicates its final decision to the
backup site. Afterwards, it broadcasts the final decision to all participants.

1. First phase

Coordinator: Initially the coordinator broadcasts the Begin_commit request
message to all participants and enters the wait state.

Participant: When the participant receives the request message, if the participant wants
to commit the transaction, it responds with the Vote_commit message to the
coordinator and enters the ready state. Otherwise, the participant responds with the
Vote_abort message to the coordinator.

Coordinator: When the coordinator receives the replies from the participants, it starts
the 2nd phase.

2. Second phase

Coordinator: When the coordinator receives the Vote_commit messages, it writes the
Decided_to_commit message to its corresponding backup site. Otherwise, if the
coordinator receives a Vote_abort message from a participant, it sends the global abort
message to all participants.

Backup site: After receiving the Decided_to_commit message from the coordinator, the
backup site sends the Recorded_commit message to the corresponding coordinator.

3. Third phase

Coordinator: After receiving the Recorded_commit message from the backup site, the
coordinator broadcasts the global commit message to all participants.

Participants: The participants follow the coordinator's command and
acknowledge the coordinator.
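Putting the three phases together, the coordinator side of B2PC can be sketched as
follows (the message names come from the text above; the transport is abstracted
behind invented interfaces):

import java.util.List;

// B2PC coordinator sketch: phase 1 collects votes, phase 2 records the
// decision at the backup site, phase 3 broadcasts the global decision.
public class B2PCCoordinator {
    interface Participant { String vote(String request); void send(String msg); }
    interface BackupSite { String record(String decision); }

    static void run(List<Participant> participants, BackupSite backup) {
        boolean allCommit = true;                 // phase 1: Begin_commit / votes
        for (Participant p : participants) {
            if (!"Vote_commit".equals(p.vote("Begin_commit"))) allCommit = false;
        }
        if (!allCommit) {
            for (Participant p : participants) p.send("global_abort");
            return;
        }
        String ack = backup.record("Decided_to_commit");   // phase 2: backup phase
        if ("Recorded_commit".equals(ack)) {
            for (Participant p : participants) p.send("global_commit"); // phase 3
        }
    }
}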

2.2.2.1 Advantages of Proposed System

It can eliminate the blocking problem faced by the atomic Two-Phase Commit protocol by
adding extra hardware (i.e., a backup site).

2.2.3 Feasibility Study

Feasibility studies aim to objectively and rationally uncover the strengths and weaknesses
of the existing business or proposed venture, opportunities and threats as presented by the
environment, the resources required to carry through, and ultimately the prospects for
success. In its simplest terms, the two criteria to judge feasibility are the cost required and the value
to be attained. As such, a well-designed feasibility study should provide a historical
background of the business or project, description of the product or service, accounting
statements, details of the operations and management, marketing research and policies,
financial data, legal requirements and tax obligations. Generally, feasibility studies precede
technical development and project implementation.

2.2.3.1 Economical Feasibility

This study is carried out to check the economic impact that the system will have
on the organization. The amount of fund that the company can pour into the research
and development of the system is limited. The expenditures must be justified. Thus the
developed system is well within the budget, and this was achieved because most of the
technologies used are freely available.

2.2.3.2 Technical Feasibility

A technical feasibility study is carried out to check the technical feasibility, that is,
the technical requirements of the system. Any system developed must not place a high
demand on the available technical resources, as this would lead to high demands being
placed on the client.

CHAPTER 3

SYSTEM DESCRIPTION

3.1 MODULES DESCRIPTION

Our proposed Backup Two-Phase Commit (B2PC) protocol overcomes the major issue
faced by the atomic Two-Phase Commit protocol, the resource-blocking problem, by
adding a backup phase to the normal Two-Phase Commit protocol. The B2PC protocol
reduces Byzantine Agreement usage, because using Byzantine Agreement in each and
every phase of operation is impractical and would be prohibitively expensive. In this
protocol, a backup site is employed for each coordinator. After receiving notification
from all participants in the first phase, the coordinator starts the 2nd phase, the backup
phase, in which it communicates its final decision to the backup site. Afterwards, it
broadcasts the final decision to all participants.
The modules are:

1. Setup

2. Commit Request Phase

3. Commit Decide phase

4. Broadcast Message Phase

5. Performance Evaluation

3.1.1 System Setup

Using Java we create the distributed transaction setup. The transaction manager, user,
and the backup servers are configured based upon the requirements. The
TransactionManager interface defines the methods that allow an application server to
manage transaction boundaries. A transaction manager provides the services and
management functions required to support transaction demarcation, transactional
resource management, synchronization, and transaction context propagation. A backup
server is responsible for backing up and restoring files, folders, databases and hard drives
on a network in order to prevent the loss of data in the event of a hard drive failure, user
error, disaster or accident. The user is the one who initiates the transactions.
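For reference, transaction demarcation through the JTA TransactionManager
interface looks roughly like this (the JNDI lookup name is an assumption and varies
by application server):

import javax.naming.InitialContext;
import javax.transaction.TransactionManager;

// Sketch of JTA transaction demarcation. The JNDI name below is a common
// convention but is container-specific (an assumption, not project code).
public class TxBoundaryDemo {
    public static void doWork() throws Exception {
        TransactionManager tm = (TransactionManager)
                new InitialContext().lookup("java:/TransactionManager");
        tm.begin();                               // open the transaction boundary
        try {
            // ... transactional resource access goes here ...
            tm.commit();                          // make the work permanent
        } catch (Exception e) {
            tm.rollback();                        // undo on any failure
            throw e;
        }
    }
}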

3.1.2 Commit Request Phase

At first, the site at which the transaction originates becomes the coordinator, and it asks
the other sites to vote to either commit or abort. The other sites send their votes. If all
sites have voted to commit the transaction, it decides to commit the transaction, and if
even one of the sites has voted to abort the transaction, it decides to abort.
Coordinator: Initially the coordinator broadcasts the Begin_commit request
message to all participants and enters the wait state.
Participant: When the participant receives the request message, if the participant wants
to commit the transaction, it responds with the Vote_commit message to the
coordinator and enters the ready state. Otherwise, the participant responds with the
Vote_abort message to the coordinator.
Coordinator: When the coordinator receives the replies from the participants, it starts
the 2nd phase.

3.1.3 Commit decide phase

The coordinator tells its decision to all of the sites. If it has decided to commit, then
an "enter into ready to commit stage" message is sent.
Coordinator: When the coordinator receives the Vote_commit messages, it writes the
Decided_to_commit message to its corresponding backup site. Otherwise, if the
coordinator receives a Vote_abort message from a participant, it sends the global abort
message to all participants.
Backup site: After receiving the Decided_to_commit message from the coordinator, the
backup site sends the Recorded_commit message to the corresponding coordinator.

3.1.4 Broadcast Message Phase

If the coordinator has decided to commit the transaction, it sends a global_commit to all
sites and waits for their acknowledgement. Only after receiving the acknowledgement
does it commit the transaction. If the coordinator has decided to abort the
transaction, it sends global_abort to all the sites and aborts the transaction. Only after
receiving the acknowledgement does it decide the fate of the transaction.
Coordinator: After receiving the Recorded_commit message from the backup site, the
coordinator broadcasts the global commit message to all participants.
Participants: The participants follow the coordinator's command and
acknowledge the coordinator.

3.1.5 Performance Evaluation

The performance evaluation is done to prove the effectiveness of the proposed
algorithm by comparing it with the existing algorithms. The failure probability of the
backup coordinator while the coordinator is down is the metric used to compare the
proposed approach with the existing methodologies.

3.2 ARCHITECTURE DIAGRAM

Fig 3.6 Architecture diagram

3.3 DATA FLOW DIAGRAM

[Data flow: the Co-ordinator exchanges Vote_commit / Vote_abort and global
commit / abort messages with the Participants, and Decided_to_commit /
Recorded_commit messages with the Back-up Site.]
Fig 3.7 Data flow diagram



CHAPTER 4

EXPERIMENTAL RESULT

The comparison of existing work with the proposed work is done to evaluate the
performance of the proposed work. It is done based on the time performance metric and
the cost performance metric.

4.1 TIME PERFORMANCE

This metric defines the overall time performance in terms of how much time it takes to
perform an entire transaction. The following graph indicates the overall time
performance.

Fig 4.8 Time performance


In this graph, the x-axis plots the methodology (existing or proposed) and the y-axis
plots the time taken to perform the entire transaction in milliseconds. The graph shows
that the proposed work consumes less time than the existing work.

4.2 COST PERFORMANCE

The cost of processing the entire transaction in terms of rupees for both the existing work
and the proposed work is represented in the following graph.

Fig 4.9 Cost performance


In this graph, the y-axis denotes the cost taken to process the entire transaction in terms of
rupees, and the graph shows that the proposed work incurs less cost than the
existing work.

CHAPTER 5

CONCLUSION
Despite the popularity of cloud services and their wide adoption by enterprises
and governments, cloud providers still lack services that guarantee both data and access
control policy consistency across multiple data centers. In this paper, we identified
several consistency problems that can arise during cloud-hosted transaction processing
using weak consistency models, particularly if policy-based authorization systems are
used to enforce access controls. In this work, we proposed a Byzantine Fault Tolerance
technique using the Backup Two-Phase Commit protocol (B2PC), which reduces
Byzantine Agreement usage, since Byzantine Agreement is very expensive.

The B2PC protocol can eliminate the blocking problem faced by the atomic Two-
Phase Commit protocol by adding extra hardware (i.e., a backup site), and it can be
easily implemented on top of the normal Two-Phase Commit protocol. According to the
WS-AT specification, the coordinator of a transaction provides a set of core services to the
initiator and to the participants; if we harden these services, the transaction can be
trustworthy even in untrusted situations over the Internet. Thus our proposed BFT
technique achieves a high degree of availability, reliability and security at a lower expense.

APPENDIX 1

SOURCE CODE

CLOUD DATABASE

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.Vector;
import javax.swing.*;

public class CloudDB extends Thread implements ActionListener
{
public static final String ALGORITHM = "RSA";
public static String PRIVATE_KEY_FILE = "";
public static String PUBLIC_KEY_FILE = "";
JFrame frm=new JFrame("CloudDB");
JPanel mainpan=new JPanel();
JPanel pan1=new JPanel();
JPanel pan2=new JPanel();
public static JTextArea ta=new JTextArea();
public static JScrollPane jsp=new JScrollPane(ta);
JButton start=new JButton("Connect");
JButton set1=new JButton("Time");
JButton set2=new JButton("Cost");
ServerSocket ss=null;
Socket sc=null;
int portno=0,rr=0;
BufferedWriter bws=null;
double length;
int nport;
Socket soc[]=null;
ObjectInputStream cin[]=null;
ObjectOutputStream cout[]=null;
String clientadd[]=null;
double ent[]=null;
String nm;
int noc=3; // number of clients
database_conn db=null;
int ctc=800;
JLabel ldk=new JLabel("Number of Clients");
JTextField ts1=new JTextField();
JTextField tf1=new JTextField();
Vector<byte[]> fulls;
Vector<String> names;
byte cip[]=null;
ObjectInputStream inputStream = null;
public CloudDB()
{
try
{
db=new database_conn();
ta.setFont(new Font("Serif", Font.BOLD,
16)); ta.setBackground(new
Color(240,248,255)); ta.setForeground(new
Color(128,0,128)); mainpan.setLayout(null);
pan1.setLayout(null);
pan2.setLayout(null);
mainpan.setBackground(new Color(47,79,79));
pan1.setBackground(new Color(95,158,160));
pan2.setBackground(new Color(95,158,160));
mainpan.add(pan1);
mainpan.add(pan2);
pan1.add(jsp);
pan2.add(start);
pan2.add(set1);
pan2.add(set2);
pan2.add(ldk);
pan2.add(ts1);
ldk.setBounds(20,50,200,30);
ts1.setBounds(20,100,100,30);

start.setForeground(new Color(128,0,128));
frm.setVisible(true);
Dimension d = Toolkit.getDefaultToolkit().getScreenSize();
frm.setSize(d.width-30,d.height-70);
pan1.setBounds(10,10,d.width-300,d.height-50);
pan2.setBounds(d.width-280,10,d.width-10,d.height-50);
start.setBounds(10,150,150,30);
set1.setBounds(10,200,150,30);
set2.setBounds(10,250,150,30);
jsp.setBounds(20,20,d.width-400,d.height-160);
ta.setFont(new Font("Serif", Font.BOLD, 16));
frm.setLocation(20,20);
frm.add(mainpan);
start.addActionListener(this);
tf1.setText("7000");
}
catch(Exception jj)
{
jj.printStackTrace();
}
}
public void actionPerformed(ActionEvent ae)
{
if(ae.getSource()==start)
{
try
{
noc=Integer.parseInt(ts1.getText().trim());
String str=tf1.getText();
portno=Integer.parseInt(str);
ent=new double[noc];
ss=new ServerSocket(portno);

soc=new Socket[noc];
cin=new ObjectInputStream[noc];
cout=new ObjectOutputStream[noc];
clientadd=new String[noc];
ta.append("\nCloud Server Started At Port NO "+portno+" !!!\n");
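// Accept a socket connection from each user client and start a per-user handler thread.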
while(rr<noc)
{
sc=ss.accept();
cout[rr]=new ObjectOutputStream(sc.getOutputStream());
cin[rr]=new ObjectInputStream(sc.getInputStream());
if(rr==0)
new Thread1();//User1
if(rr==1)
new Thread2();//User2
if(rr==2)
new Thread3();//User3
if(rr==3)
new Thread4();//User4
if(rr==4)
new Thread5();//User5
rr++;
}
ta.append("All Users are Connected to CloudDB\n");
}
catch(Exception jj)
{
jj.printStackTrace();
}
}

if(ae.getSource()==set1)
{
String gs[]={};
BarChart.main(gs);
}
if(ae.getSource()==set2)
{
String gs[]={};
BarChart1.main(gs);
}
}
}

VTT

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.Vector;
import javax.swing.*;

public class VTT extends Thread implements ActionListener
{
JFrame frm=new JFrame("VTT");
JPanel mainpan=new JPanel();
JPanel pan1=new JPanel();
JPanel pan2=new JPanel();
public static JTextArea ta=new JTextArea();
public static JScrollPane jsp=new JScrollPane(ta);
JButton start=new JButton("Connect");
ServerSocket ss=null;
Socket sc=null;
int portno=0,rr=0;
BufferedWriter bws=null;
double length;
int nport;
Socket soc[]=null;
ObjectInputStream cin[]=null;
ObjectOutputStream cout[]=null;
String clientadd[]=null;

double ent[]=null;
String nm;

int noc=3;//no of clients


database_conn db=null;

JLabel ldk=new JLabel("Number of Clients");


JTextField ts1=new JTextField();
JTextField tf1=new JTextField();
Vector<byte[]> fulls;
Vector<String> names;
byte cip[]=null;
ObjectInputStream inputStream = null;
static String Tpp1="",Tpp2="",Tpp3="",Tpp4="",Tpp5="";
public VTT()
{
try
{
Tpp1="";Tpp2="";Tpp3="";Tpp4="";Tpp5="";
db=new database_conn();

ta.setFont(new Font("Serif", Font.BOLD, 16));
ta.setBackground(new Color(240,248,255));
ta.setForeground(new Color(128,0,128));
mainpan.setLayout(null);
pan1.setLayout(null);
pan2.setLayout(null);
mainpan.setBackground(new Color(47,79,79));
pan1.setBackground(new Color(95,158,160));
pan2.setBackground(new Color(95,158,160));
mainpan.add(pan1);
mainpan.add(pan2);
pan1.add(jsp);
pan2.add(start);
pan2.add(ldk);
pan2.add(ts1);

ldk.setBounds(20,50,200,30);
ts1.setBounds(20,100,100,30);
start.setForeground(new Color(128,0,128));
frm.setVisible(true);
Dimension d = Toolkit.getDefaultToolkit().getScreenSize();
frm.setSize(d.width-30,d.height-70);
pan1.setBounds(10,10,d.width-300,d.height-50);
pan2.setBounds(d.width-280,10,d.width-10,d.height-50);
start.setBounds(10,150,150,30);
jsp.setBounds(20,20,d.width-400,d.height-160);
ta.setFont(new Font("Serif", Font.BOLD, 16));
frm.setLocation(20,20);

frm.add(mainpan);
start.addActionListener(this);
tf1.setText("8000");
}
catch(Exception jj)
{
jj.printStackTrace();
}
}
public void actionPerformed(ActionEvent ae)
{
if(ae.getSource()==start)
{
try
{
noc=Integer.parseInt(ts1.getText().trim());
String str=tf1.getText();
portno=Integer.parseInt(str);
ent=new double[noc];
ss=new ServerSocket(portno);
soc=new Socket[noc];

cin=new ObjectInputStream[noc];
cout=new ObjectOutputStream[noc];
clientadd=new String[noc];
ta.append("\nVTT Server Started At Port NO "+portno+" !!!\n");
while(rr<noc)
{
sc=ss.accept();
cout[rr]=new ObjectOutputStream(sc.getOutputStream());
cin[rr]=new ObjectInputStream(sc.getInputStream());
if(rr==0)
new Thread1();//User1
if(rr==1)
new Thread2();//User2
if(rr==2)
new Thread3();//User3
if(rr==3)
new Thread4();//User4
if(rr==4)
new Thread5();//User5
rr++;
}
ta.append("All Users are Connected to VTT\n");
}
catch(Exception jj)
{
jj.printStackTrace();
}
}
}
}

USER
import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import javax.swing.*;

public class User1 extends Thread implements ActionListener
{
String rem="",reww="";
JFrame frm=new JFrame("User1");
JPanel mainpan=new JPanel();
JPanel pan1=new JPanel();
JPanel pan2=new JPanel();
Socket Socket;
static ObjectOutputStream out;
static ObjectInputStream in;

static ObjectOutputStream out1;
static ObjectInputStream in1;
String message;
String as;
String nm1;
String path,filename;
public static JTextArea ta=new JTextArea();
public static JScrollPane jsp=new JScrollPane(ta);
JButton start=new JButton("Connect");
JButton start1=new JButton("Create");
JButton start2=new JButton("Encrypt");
JButton start3=new JButton("Decrypt");
int portno=0;
int a;
JTextField tf1=new JTextField();//server IP
JTextField tf2=new JTextField();//Server Port
JTextField tf3=new JTextField(20);
Socket sc,sc1;
static Runtime runtime;
long time1=0,time2=0,mem1=0,mem2=0,diff1=0,diff2=0;
BufferedWriter bws=null;

public User1()
{
try
{
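// Record baseline time and memory usage; the deltas are written to
// Graph2.txt after the Decrypt action completes.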
runtime=Runtime.getRuntime();
bws=new BufferedWriter(new FileWriter("Graph2.txt"));
time1=((System.currentTimeMillis()));
mem1=(runtime.totalMemory() -runtime.freeMemory());
ta.setFont(new Font("Serif", Font.BOLD, 16));
ta.setBackground(new Color(240,248,255));
ta.setForeground(new Color(128,0,128));
mainpan.setLayout(null);
pan1.setLayout(null);
pan2.setLayout(null);
mainpan.setBackground(new Color(47,79,79));
pan1.setBackground(new Color(95,158,160));
pan2.setBackground(new Color(95,158,160));
mainpan.add(pan1);
mainpan.add(pan2);
pan1.add(jsp);
pan2.add(start);
pan2.add(start1);
pan2.add(start2);
pan2.add(start3);
pan2.add(tf1);
start.setForeground(new Color(128,0,128));
frm.setVisible(true);
Dimension d = Toolkit.getDefaultToolkit().getScreenSize();
frm.setSize(d.width-30,d.height-70);
pan1.setBounds(10,10,d.width-350,d.height-50);
pan2.setBounds(d.width-330,10,d.width-10,d.height-50);
start.setBounds(10,100,150,30);
start1.setBounds(10,300,150,30);

start2.setBounds(10,400,150,30);
start3.setBounds(10,500,150,30);
tf1.setBounds(10,200,150,30);

jsp.setBounds(20,20,d.width-400,d.height-160);
frm.setLocation(20,20);
frm.add(mainpan);
start.addActionListener(this);
start1.addActionListener(this);
start2.addActionListener(this);
start3.addActionListener(this);
tf1.setText("localhost");
tf2.setText("7001");
}
catch(Exception jj)
{
jj.printStackTrace();
}
}
public void actionPerformed(ActionEvent ae)
{
try
{
if (ae.getSource()==start)
{
socket();
new Threads1();
}
if (ae.getSource()==start1)
{
new UserForm(1,"User1");

}

if(ae.getSource()==start2)
{
new Encryption(1,"User1");
}
if(ae.getSource()==start3)
{
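// Decrypt action: ask the VTT to validate the transaction ("TPVC1"), then
// request the stored data from CloudDB and report the result once the
// verification response has been received.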
out1.writeObject("TPVC1");
out.writeObject("download1");
out.writeObject("downloading");
String res=""+in.readObject();
if(reww.equals("True"))
{
System.out.println("res "+res);
ta.append("So Finally Decrypted Content Details is "+res+"\n");
}
else
{
ta.append("So Verification Failed!!!\n");
}
time2=((System.currentTimeMillis()));
mem2=(runtime.totalMemory() - runtime.freeMemory());
diff1=Math.abs(time2-time1);
diff2=Math.abs(mem2-mem1);
bws.write(diff1+"-"+(diff2));
bws.close();
}
}
catch(Exception em)
{
em.printStackTrace();
}

}
public void socket()
{
try
{
boolean connected;
as=tf1.getText();
String as1="7000";
int port =Integer.parseInt(as1);
sc= new Socket(as,port);
int a=sc.getLocalPort();
InetAddress thisIp =InetAddress.getLocalHost();
String as2=""+thisIp.getHostAddress();
nm1=""+a+" "+as2;

out = new ObjectOutputStream(sc.getOutputStream());
in = new ObjectInputStream(sc.getInputStream());
ta.append("\nUser1 Connected with CloudDB!!! \n");
Thread th=new Thread();
start();
}
catch(Exception e)
{
e.printStackTrace();
}
}
public void run()
{
try
{

}
catch(Exception em)

{
em.printStackTrace();
}
}
public static void main(String args[])
{
User1 ma=new User1();
}
public class Threads1 extends Thread
{
public Threads1()
{
try
{
boolean connected;
as=tf1.getText();
String as1="8000";
int port =Integer.parseInt(as1);
sc1= new Socket(as,port);
int a=sc1.getLocalPort();
InetAddress thisIp =InetAddress.getLocalHost();
String as2=""+thisIp.getHostAddress();
nm1=""+a+" "+as2;

out1 = new ObjectOutputStream(sc1.getOutputStream());
in1 = new ObjectInputStream(sc1.getInputStream());
ta.append("\nUser1 Connected with VTT!!! \n");
Thread th=new Thread();
start();
}
catch(Exception em)
{

em.printStackTrace();
}
}
public void run()
{
try
{
while(true)
{
String tes1=(""+in1.readObject());
if(tes1.equals("PTV1"))
{
System.out.println("User1 "+tes1);
ta.append("Prepare-to-Validate Request Received from VTT!!!\n");
}
if(tes1.equals("Res1"))
{
String gn1=(""+in1.readObject());
reww=gn1;
ta.append("\n Verification Success!!!\n");
}
}
}
catch(Exception em)
{
em.printStackTrace();
}
}
}
}

USER FORM
import java.awt.event.*;
import javax.swing.*;

public class UserForm extends JFrame implements ActionListener
{
String no[]={"1","2","3","4","5","6","7","8"};
JComboBox cb=new JComboBox(no);
JLabel ld1=new JLabel("Select No of Columns");
JPanel pp=null;
JButton add,sub,sub1,sub2,sub3;
static int nod=0;
JLabel ld[]=null;
JTextField td[]=null;
JComboBox cb1[]=null;
//String type[]={"int","varchar(50)","numeric(18, 0)"};
String type[]={"varchar(5000)"};
public static String qur="",tabs="Usertab",qur1="";
String nn="";
static String cols[]=null;
public UserForm(int np,String ng)
{
try
{
nn=ng.trim();
tabs+=(np);
qur="";
qur1="";
add=new JButton("ADD");
sub=new JButton("Create");
pp=new JPanel(null);
pp.add(cb);
pp.add(ld1);
pp.add(add);
pp.add(sub);
if(nn.equals("User1"))
{
User1.out.writeObject("User1");
System.out.println("UserForm "+qur);
User1.out.writeObject(qur);
User1.out.writeObject(qur1);
//User1.out1.writeObject("Respone1");
//User1.out1.writeObject(qur1);
}
else if(nn.equals("User2"))
{
User2.out.writeObject("User2");
System.out.println("UserForm "+qur);
User2.out.writeObject(qur);
User2.out.writeObject(qur1);
}
else if(nn.equals("User3"))
{
User3.out.writeObject("User3");
System.out.println("UserForm "+qur);
User3.out.writeObject(qur);
User3.out.writeObject(qur1);
}
else if(nn.equals("User4"))
{
User4.out.writeObject("User4");
System.out.println("UserForm "+qur);
User4.out.writeObject(qur);
User4.out.writeObject(qur1);
}
else if(nn.equals("User5"))
{
User5.out.writeObject("User5");
System.out.println("UserForm "+qur);
User5.out.writeObject(qur);

User5.out.writeObject(qur1);
}
setVisible(false);
}
catch(Exception em)
{
em.printStackTrace();
}
}
public static void main(String as[])
{
int i=1;
new UserForm(i,"User1");
}
}

APPENDIX 2

SCREEN SHOTS

CLOUD DATABASE SETUP

VERIFIED TRUSTED THIRD PARTY SETUP



USER 1 SETUP

DATABASE CREATION

ENTERING DATA IN DATABASE

DATA ENCRYPTED AND STORED IN DATABASE



CHECKING RULES IN VTT

VERIFICATION FAILED

REFERENCES

[1] Iskander M K, Trainor T, Wilkinson D W, and Lee A J, "Balancing Performance, Accuracy, and Precision for Secure Cloud Transactions," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 2, February 2014.
[2] Agrawal R and Srikant R, "Privacy-preserving data mining," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2000, pp. 439–450.
[3] Clifton C, Kantarcioglu M, and Vaidya J, "Defining privacy for data mining," in Proc. Nat. Sci. Found. Workshop Next Generation Data Mining, 2002, pp. 126–133.
[4] Giannotti F, Lakshmanan L V, Monreale A, Pedreschi D, and Wang H, "Privacy-preserving data mining from outsourced databases," in Proc. SPCC 2010 in Conjunction with CPDP, 2010, pp. 411–426.
[5] Gilburd B, Schuster A, and Wolff R, "k-TTP: A new privacy model for large-scale distributed environments," in Proc. Int. Conf. Very Large Data Bases, 2005, pp. 563–568.
[6] Kantarcioglu M and Clifton C, "Privacy-preserving distributed mining of association rules on horizontally partitioned data," IEEE Trans. Knowledge Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
[7] Molloy I, Li N, and Li T, "On the (in)security and (im)practicality of outsourcing precise association rule mining," in Proc. IEEE Int. Conf. Data Mining, Dec. 2009, pp. 872–877.
[8] Qiu K, Li Y, and Wu X, "Protecting business intelligence and customer privacy while outsourcing data mining tasks," Knowledge Inform. Syst., vol. 17, no. 1, pp. 99–120, 2008.
[9] Rizvi S J and Haritsa J R, "Maintaining data privacy in association rule mining," in Proc. Int. Conf. Very Large Data Bases, 2002, pp. 682–693.
[10] Wong W K, Cheung D W, Hung E, Kao B, and Mamoulis N, "Security in outsourcing of association rule mining," in Proc. Int. Conf. Very Large Data Bases, 2007, pp. 111–122.
