Data Leakage Protection

CHAPTER 1
INTRODUCTION
In the course of doing business, sometimes sensitive data must be handed over to
supposedly trusted third parties. For example, a hospital may give patient records to
researchers who will devise new treatments. Similarly, a company may have partnerships
with other companies that require sharing customer data. Another enterprise may outsource
its data processing, so data must be given to various other companies. We call the owner of
the data the distributor and the supposedly trusted third parties the agents. Our goal is to
detect when the distributor's sensitive data has been leaked by agents and, if possible, to
identify the agent that leaked the data. We consider applications where the original sensitive
data cannot be perturbed.
Perturbation is a very useful technique where the data is modified and made less
sensitive before being handed to agents. For example, one can add random noise to certain
attributes, or one can replace exact values by ranges. However, in some cases it is important
not to alter the distributor's original data. For example, if an outsourcer is doing our payroll,
he must have the exact salary and customer bank account numbers. If medical researchers
will be treating patients (as opposed to simply computing statistics), they may need accurate
data for the patients. Traditionally, leakage detection is handled by watermarking, e.g., a
unique code is embedded in each distributed copy. If that copy is later discovered in the
hands of an unauthorized party, the leaker can be identified.
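
The two perturbation techniques mentioned above (adding random noise to an
attribute, and replacing an exact value by a range) can be sketched as follows.
This is only an illustration: the record, the field names, the noise scale and the
range width are all hypothetical and do not come from the project itself.

```python
import random


def add_noise(value, scale, rng):
    """Perturb a numeric attribute by adding uniform noise in [-scale, +scale]."""
    return value + rng.uniform(-scale, scale)


def to_range(value, width):
    """Replace an exact integer value by the range [low, high) that contains it."""
    low = (value // width) * width
    return (low, low + width)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical record; the field names are assumptions for illustration.
record = {"name": "patient-17", "age": 43, "salary": 52300}

perturbed = dict(record)                                     # original stays intact
perturbed["age"] = to_range(record["age"], 10)               # exact age -> decade range
perturbed["salary"] = add_noise(record["salary"], 500, rng)  # salary +/- 500
```

The perturbed copy is less sensitive, but it is also no longer exact, which is why
this approach does not fit the payroll and treatment cases described above.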
Specifically, we study the following scenario: After giving a set of objects to agents,
the distributor discovers some of those same objects in an unauthorized place. (For example,
the data may be found on a web site, or may be obtained through a legal discovery process.)
At this point the distributor can assess the likelihood that the leaked data came from one or
more agents, as opposed to having been independently gathered by other means.
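
To make this assessment concrete, the sketch below estimates how likely it is that
a given agent leaked at least one of the discovered objects. The text does not
specify a model, so the one used here is an assumption: each leaked object was
either gathered independently by the unauthorized party (with an assumed
probability p) or leaked by exactly one of the agents that received it, each such
agent being equally likely, and objects are treated independently. The function
name, the agent names and the object identifiers are hypothetical.

```python
def guilt_probability(agent, allocations, leaked, p):
    """Estimate the probability that `agent` leaked at least one leaked object.

    Assumed model (not taken from the text): a leaked object was obtained
    independently with probability p; otherwise exactly one of the agents
    holding it leaked it, each holder being equally likely.
    """
    not_guilty = 1.0
    for obj in leaked & allocations[agent]:
        # Number of agents that were given this object.
        holders = sum(1 for objs in allocations.values() if obj in objs)
        # Probability that this particular agent did NOT leak this object.
        not_guilty *= 1.0 - (1.0 - p) / holders
    return 1.0 - not_guilty


# Hypothetical allocation of objects to two agents.
allocations = {
    "agent_a": {"t1", "t2", "t3"},
    "agent_b": {"t2", "t4"},
}
leaked = {"t1", "t2"}  # objects found in an unauthorized place

estimate_a = guilt_probability("agent_a", allocations, leaked, p=0.2)
estimate_b = guilt_probability("agent_b", allocations, leaked, p=0.2)
```

Under these assumptions agent_a, who holds both leaked objects and is the only
holder of t1, receives a higher estimate than agent_b, who holds only the shared
object t2.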
If the distributor sees enough evidence that an agent leaked data, he may stop doing
business with him, or may initiate legal proceedings. We consider adding fake objects to
the distributed set. Such objects do not correspond to real entities but appear realistic to the
agents. In a sense, the fake objects act as a type of watermark for the entire set, without
modifying any individual members. If it turns out an agent was given one or more fake
objects that were leaked, then the distributor can be more confident that the agent was guilty.
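
A minimal sketch of this check, assuming the distributor keeps a record of which
fake objects were given to which agent. The function name, agent names and object
identifiers are hypothetical.

```python
def agents_with_leaked_fakes(fake_allocations, leaked):
    """Map each agent to the fake objects given to it that appear in `leaked`."""
    hits = {}
    for agent, fakes in fake_allocations.items():
        found = fakes & leaked  # fake objects of this agent seen in the leak
        if found:
            hits[agent] = found
    return hits


# Hypothetical bookkeeping: fake objects handed to each agent.
fake_allocations = {
    "agent1": {"f1"},
    "agent2": {"f2"},
    "agent3": set(),  # this agent received no fake objects
}
leaked = {"t1", "t7", "f2"}  # objects discovered in an unauthorized place

suspects = agents_with_leaked_fakes(fake_allocations, leaked)
```

Because a fake object does not correspond to any real entity, it is unlikely to
have been gathered independently, so its presence in the leaked set points at the
agent who received it.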
1.1 AIM AND OBJECTIVES

To distribute data appropriately to agents.
To detect the leakage of the distributor's sensitive data by agents.
To identify the agent that leaked the data.

CHAPTER 2

PROBLEM STATEMENT

The main focus is the data allocation problem: how can the distributor
intelligently give data to agents in order to improve the chances of detecting a guilty
agent? This project addresses the problem of handling fake objects so that every fake tuple
assigned to an agent is unique, i.e., no fake tuple is shared by any two agents.
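
The uniqueness requirement can be sketched as follows. This is an illustrative
allocation scheme, not the project's implementation: the function names, the
tuple layout and the generator of fake customers are assumptions.

```python
import itertools


def allocate_fake_tuples(agents, fakes_per_agent, make_fake):
    """Give every agent its own fake tuples; no fake tuple goes to two agents."""
    serial = itertools.count(1)  # one shared counter guarantees distinct serials
    allocation = {}
    for agent in agents:
        allocation[agent] = [make_fake(next(serial)) for _ in range(fakes_per_agent)]
    return allocation


def make_fake_customer(serial):
    # Hypothetical generator: a real one would produce realistic-looking values
    # so that agents cannot tell fake tuples from genuine ones.
    return ("C9%04d" % serial, "customer%04d@example.com" % serial)


allocation = allocate_fake_tuples(["agent1", "agent2", "agent3"], 2, make_fake_customer)
```

Since each fake tuple is built from a serial number that is used only once, a fake
tuple found in leaked data identifies exactly one agent.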

CHAPTER 3
REQUIREMENT ANALYSIS

Requirement analysis describes the existing system and the proposed system.
3.1 Existing System
In the existing system, leakage detection is handled by watermarking, e.g., a unique code
is embedded in each distributed copy. If that copy is later discovered in the hands of an
unauthorized party, the leaker can be identified. Watermarks can be very useful in some
cases but, again, involve some modification of the original data. Furthermore, watermarks
can sometimes be destroyed if the data recipient is malicious. For example, a company may
have partnerships with other companies that require sharing customer data. Another
enterprise may outsource its data processing, so data must be given to various other companies.
3.2 Proposed System

Our goal is to detect when the distributor's sensitive data has been leaked by agents
and, if possible, to identify the agent that leaked the data. Perturbation is a very useful
technique where the data is modified and made less sensitive before being handed to
agents, but in some cases the original data must not be altered. We therefore develop
unobtrusive techniques for detecting leakage of a set of objects or records.

We develop a model for assessing the guilt of agents. We also consider the
option of adding fake objects to the distributed set. Such objects do not correspond to real
entities but appear realistic to the agents. In a sense, the fake objects act as a type of
watermark for the entire set, without modifying any individual members. If it turns out an
agent was given one or more fake objects that were leaked, then the distributor can be more
confident that the agent was guilty.

TECHNOLOGIES USED
6.1 .NET Framework
The .NET Framework is an environment for building, deploying and running XML
web services and other applications. It is the infrastructure for the overall .NET platform. The
.NET Framework consists of three main parts: the common language runtime, the class
libraries and ASP.NET.
The common language runtime and the class libraries, including Windows Forms,
ADO.NET, and ASP.NET, combine to form services and solutions that can be easily
integrated within and across a variety of systems. The .NET Framework provides a fully
managed, protected and feature-rich application execution environment.
6.2 ASP.NET
ASP.NET is more than the next version of Active Server Pages (ASP); it is a unified
web development platform that provides the services necessary for developers to build
enterprise-class web applications. While ASP.NET is largely syntax compatible with ASP,
it also provides a new programming model and infrastructure that enables a powerful new
class of applications. You can migrate your existing ASP applications by incrementally
adding ASP.NET functionality to them.
ASP.NET is a compiled, .NET Framework-based environment. You can author
applications in any .NET Framework-compatible language, including Visual Basic and Visual
C#. Additionally, the entire .NET Framework platform is available to any ASP.NET application.
Developers can easily access the benefits of the .NET Framework, which include a fully
managed, protected, and feature-rich application execution environment, simplified
development and deployment, and seamless integration with a wide variety of languages.
6.3 Microsoft SQL Server 2008
Business today demands a different kind of data management solution. Performance,
scalability and reliability are essential, but businesses now expect more from their IT
investment.
SQL Server exceeds dependability requirements and provides innovative
capabilities that increase employee effectiveness, integrate heterogeneous IT ecosystems, and
maximize capital and operating budgets. SQL Server provides the enterprise data
management platform your organization needs to adapt quickly in a fast-changing
environment.
With the lowest implementation and maintenance cost in the industry, SQL Server
delivers a rapid return on your data management investment. It supports the rapid
development of enterprise-class business applications that can give your company a critical
competitive advantage.
Benchmarked for scalability, speed and performance, it is a fully enterprise-class
database, providing core support for Extensible Markup Language (XML) and the Internet.
