
A

SEMINAR
ON
“Data Leakage Detection”

By
Mr. Aniruddha D. Talole
Guided By
Prof. S. R. Todmal

Department Of Computer Engineering
ICOER, Wagholi, Pune.
CONTENTS
• INTRODUCTION
• PLACES FROM WHERE DATA LEAKS
• DATA LEAKAGE LANDSCAPE
• WAYS TO AVOID DATA LEAKAGE
• DATA LEAKAGE AVOIDANCE
• AFFORDABLE SOLUTIONS
• MODULES FOR DATA LEAKAGE DETECTION
• DATA LEAKAGE DETECTION USING DATA ALLOCATION STRATEGY METHOD
• DATA ALLOCATION PROBLEM
• CONCLUSION
• REFERENCES




INTRODUCTION

In most organizations, data protection programs are concerned with
protecting sensitive data from external malicious attacks. They rely on
technical controls that include:

• Perimeter security
• Network/wireless surveillance
• Monitoring applications
• Point security management
• User awareness and education





PLACES FROM WHERE DATA LEAKS
• Postal mail, e-mail, file transfers or instant messaging
• Lost or stolen computers, laptops and mobile devices
• Hard disks and portable storage (CDs, USB drives), backup
  devices and paper files
• Insecure transmission of personally identifiable and other restricted
  data
• Authorized insider abuse of databases and other back-end systems
• Re-use of electronic resources (laptops and backup devices)
• Lack of separation of duties and access controls on databases and
  other shared systems

DATA LEAKAGE LANDSCAPE

NEED:
To provide an initial landscape of how these laws relate to data leakage
overall.
Information lives in three containers:
a) In digital form
b) In hardcopy (paper)
c) In the conversations and heads of people


• Databases and Other Shared Systems
  - Data Leak: A senior database administrator responsible for defining and
    enforcing data access rights at the company misuses that access.
  - Solution: Enforce separation of duties (SoD) among users of technical
    controls, together with strong access controls and multi-factor
    authentication.

• E-mail
  - Data Leak: Confidential data or discussions are mistakenly e-mailed to a
    reporter.
  - Solution: Encrypt critical business communications, sharing keys only
    with trusted business partners.

• Instant Messaging
  - Data Leak: A bug was discovered that could unleash a series of attacks on
    an AOL Instant Messenger user, the most serious side effect being a
    remote hijack by a hacker.
  - Solution: Implement endpoint security controls to protect against
    drive-by downloads of malicious code to browsers, IM, and e-mail
    applications on computing assets and mobile devices.

• File-Sharing Site
  - Data Leak: Confidential data was accessed by two unknown parties after
    it was uploaded to a company file-sharing site.
  - Solution: Implement general DLP filtering as a safety net against
    accidents, so that large quantities of private data are not uploaded to
    the FTP site to begin with.

• Physical Access to Computer
  - Data Leak: An unidentified person is believed to have used the machine to
    send spam e-mails.
  - Solution: Program an automatic system shutdown or lockout after a
    specified idle time, requiring user credentials (e.g., password, token,
    or biometric reading) to resume operating the computer.

• Network Remote Access
  - Data Leak: Users were able to log into their workplace desktop computers
    from home via an Internet-accessible RDP connection (no VPN) and view
    patient information over an unencrypted channel.
  - Solution: Require strong authentication and encrypted communication
    channels for all remote access users.
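
The DLP filtering suggested above as a safety net for file-sharing uploads can be pictured as a simple content scan. The following is only a minimal sketch: the two patterns, the function name scan_outbound, and the max_hits threshold are hypothetical and far simpler than the rules a real DLP product would apply.

```python
import re

# Hypothetical patterns for illustration only; real DLP rules are much richer.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def scan_outbound(text, max_hits=0):
    """Count pattern matches in outbound text and decide whether to block it.

    Returns (blocked, hits), where hits maps each pattern name to its
    number of matches and blocked is True when the total exceeds max_hits.
    """
    hits = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    blocked = sum(hits.values()) > max_hits
    return blocked, hits
```

A filter of this kind would sit in front of the upload path, so that a file is checked before it reaches the shared site rather than after.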

WAYS TO AVOID DATA LEAKAGE

• Handling Data According to Classification and Culture
• Design Your Employee Training Program with Experience in Mind
• Create a More Holistic Security Environment
• Adapting to Change is Not an Option
• Implement Controls for Detection and Prevention
  - Administrative controls, e.g. appropriate policies, guidelines and
    practices consistent with the application
  - Physical controls, e.g. paper shredders, locking computer cases and
    biometric access
  - Technical controls, e.g. traditional security tools like encryption,
    outbound filtering and content controls
DATA LEAKAGE AVOIDANCE


Data leakage avoidance is a two-stage process:
1. Tagging every observation with legitimacy tags during collection.
2. Observing what is called a learn-predict separation.






1. Tagging every observation with legitimacy tags during collection.
A legitimacy tag is ancillary data attached to every pair (x, y) of an
observational input instance x and a target instance y. With this tagged
version of the database it is possible, for every example being studied:
• To roll back the state of the world to a legitimate decision state
• To eliminate any confusion that may arise from only considering the original
  raw data

2. Observing what is called a learn-predict separation.
The modeler uses the raw but tagged data to construct training examples in
such a way that:
• For each target instance, only those observational inputs which are purely
  legitimate for predicting it are included as features
• Only observational inputs which are purely legitimate with respect to all
  evaluation targets may serve as examples
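
The two stages above can be illustrated with a small sketch. The slides do not say what form a legitimacy tag takes, so this example assumes the tag is a collection timestamp and that an input is legitimate for a target only if it was collected strictly before the target's decision time. The Observation class and the legitimate_features function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    name: str
    value: float
    tag: int  # legitimacy tag; assumed here to be a collection timestamp


def legitimate_features(observations, decision_time):
    """Keep only inputs whose tag marks them as known before the decision.

    This is the learn-predict separation under the timestamp assumption:
    anything recorded at or after decision_time is excluded from the features.
    """
    return {o.name: o.value for o in observations if o.tag < decision_time}
```

For example, with inputs tagged 10 and 25 and a decision time of 20, only the input tagged 10 would be kept as a feature.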




AFFORDABLE SOLUTIONS
These include tactical, low-cost security advancements:

Deterrence: It is designed to prevent a less casual or opportunistic attacker
from attempting to circumvent security in an effort to achieve a non-compliant
end.
Detection: It allows for the identification of potential attack situations
executed by a more casual attacker.
Defense: It is the prevention or suppression of attacks already in progress
and commissioned by a determined attacker.

MODULES FOR DATA LEAKAGE DETECTION


1. Data Allocation Module:
It deals with the data allocation problem: how can the distributor
intelligently give data to agents in order to improve the chances of
detecting a guilty agent?
2. Fake Object Module:
Fake objects are objects generated by the distributor in order to increase
the chances of detecting agents that leak data. The distributor may be able
to add fake objects to the distributed data in order to improve his
effectiveness in detecting guilty agents.
3. Optimization Module:
The distributor's data allocation to agents has one constraint and one
objective. The constraint is to satisfy the agents' requests. The objective
is to be able to detect an agent who leaks any portion of his data. Users can
also lock and unlock files for security.

4. Data Distributor Module:
A data distributor has given sensitive data to a set of supposedly trusted
agents (third parties). The distributor must assess the likelihood that the
leaked data came from one or more agents.

5. Agent Guilt Module:
To compute the probability that an agent is guilty, the distributor needs an
estimate of the probability that values in the leaked set S can be guessed by
the target. For example, it can conduct an experiment and ask a person with
approximately the expertise and resources of the target to find the e-mail
addresses of, say, 100 individuals.
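
The guilt estimate described above can be sketched in code. This is a minimal illustration that follows the guilt model of reference [1] as we understand it: each leaked object is assumed to have been guessed by the target with probability p, and otherwise leaked by one of the agents holding it, each equally likely. The function names and the input layout (a dictionary from agent to the set of objects it received) are assumptions made for this example.

```python
def estimate_guess_probability(found, asked):
    """Estimate p from an experiment: the fraction of values a tester could find."""
    return found / asked


def guilt_probability(agent, leaked, allocations, p):
    """Probability that `agent` leaked at least one object of the leaked set.

    allocations maps each agent to the set of objects it received; p is the
    probability that the target could have guessed an object on its own.
    """
    prob_innocent = 1.0
    for obj in leaked & allocations[agent]:
        # Number of agents that hold this object (always at least 1 here).
        holders = sum(1 for objs in allocations.values() if obj in objs)
        # Chance that this particular agent is NOT the source of obj.
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent
```

With allocations {"U1": {"t1", "t2"}, "U2": {"t1"}}, a leaked set {"t1", "t2"} and p = 0.5, the sketch gives 0.625 for U1 and 0.25 for U2: the agent whose data overlaps more with the leak, and who holds an object nobody else received, looks more guilty.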

DATA LEAKAGE DETECTION USING
DATA ALLOCATION STRATEGY METHOD


This is an unobtrusive technique for detecting leakage of a set of objects
or records.
• After giving a set of objects to agents, the distributor discovers some of those same
  objects in an unauthorized place.
• At this point, the distributor can assess the likelihood that the leaked data came from one
  or more agents, as opposed to having been independently gathered by other means.
• Using an analogy with cookies stolen from a cookie jar: if we catch Freddie with a single
  cookie, he can argue that a friend gave him the cookie.
• But if we catch Freddie with five cookies, it will be much harder for him to argue that his
  hands were not in the cookie jar.
• If the distributor sees enough evidence that an agent leaked data, he may stop doing
  business with him, or may initiate legal proceedings.
• The approach therefore develops a model for assessing the "guilt" of agents.
• It also presents algorithms for distributing objects to agents in a way that improves the
  chances of identifying a leaker.
• Finally, it considers the option of adding fake objects to the distributed set.
• Such objects do not correspond to real entities but appear realistic to the agents.

DATA ALLOCATION PROBLEM



The data allocation problem depends on the type of data requests made by
agents and on whether fake objects are allowed.
Fake objects, in some applications, may cause fewer problems than
perturbing real objects.
The use of fake objects is inspired by the use of trace records in mailing
lists.
The distributor creates and adds fake objects to the data that he
distributes to agents.
Fake objects must therefore be created carefully so that agents cannot
distinguish them from real objects.

Algorithm 1: Evaluation of Explicit Data Request
1: Calculate the total number of fake records as the sum of the fake records
   allowed for each agent.
2: While total fake records > 0:
3:    Select the agent that will yield the greatest improvement in the
      sum objective.
4:    Create a fake record.
5:    Add this fake record to the agent's data and to the fake record set.
6:    Decrement the total number of fake records.









Algorithm 2: Evaluation of Sample Data Request
1: Initialize min_overlap ← 1,
   the minimum out of the maximum relative overlaps that the allocations of
   different objects to Ui yield.
2: For each object tk not yet allocated to Ui, initialize max_rel_ov ← 0,
   the maximum relative overlap between Ri and any set Rj that the allocation
   of tk to Ui yields.
3:    For all j = 1,..., n : j ≠ i
         Calculate the absolute overlap and the relative overlap rel_ov.
4:       Update the maximum relative overlap:
         max_rel_ov ← MAX (max_rel_ov, rel_ov)
5:    If max_rel_ov ≤ min_overlap then
         min_overlap ← max_rel_ov
         ret_k ← k
6: Return ret_k
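
The object selection step of Algorithm 2 can be sketched as below. The slides do not give the overlap formulas, so this example assumes, following reference [1] as we understand it, that the absolute overlap is the current intersection size plus one (for the object being added) and that the relative overlap divides it by the smaller of the two agents' sample sizes. Only agents that already hold the candidate object are compared. The function name, the zero-based indices and the list-of-sets layout are hypothetical.

```python
def select_object(i, allocations, sample_sizes, num_objects):
    """Pick the object whose allocation to agent i keeps the worst overlap smallest.

    allocations[j] is the set of object indices already given to agent j;
    sample_sizes[j] is the number of objects agent j requested.
    Returns None if agent i already holds every object.
    """
    min_overlap = 1.0
    ret_k = None
    for k in range(num_objects):
        if k in allocations[i]:
            continue  # agent i already has this object
        max_rel_ov = 0.0
        for j in range(len(allocations)):
            if j == i or k not in allocations[j]:
                continue
            abs_ov = len(allocations[i] & allocations[j]) + 1
            rel_ov = abs_ov / min(sample_sizes[i], sample_sizes[j])
            max_rel_ov = max(max_rel_ov, rel_ov)
        if max_rel_ov <= min_overlap:
            min_overlap = max_rel_ov
            ret_k = k
    return ret_k
```

When some object has not been given to any other agent, its maximum relative overlap is 0, so the sketch prefers it over objects that are already shared.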







CONCLUSION


• Data leakage is a silent type of threat.
• Sensitive information can be electronically distributed via e-mail,
  Web sites, FTP, instant messaging, spreadsheets, databases, and any
  other electronic means available, all without your knowledge.
• To assess the risk of distributing data, two things are important:
  1. A data allocation strategy that helps to distribute the tuples among
     agents with minimum overlap.
  2. Calculating the guilt probability, which is based on the overlap
     between an agent's data set and the leaked data set.

REFERENCES

[1] P. Papadimitriou and H. Garcia-Molina, "A Model for Data Leakage Detection," IEEE Transactions on
Knowledge and Data Engineering, Jan. 2011.
[2] Srikanth Yadav, Y. Eswara Rao, V. Shanmukha Rao, and R. Vasantha, "Data Allocation Strategies for
Detecting Data Leakage," International Journal of Computer Trends and Technology, vol. 3, issue 1, 2012,
ISSN: 2231-2803, http://www.internationaljournalssrg.org
[3] Rudragouda G. Patil (Dept. of CSE, The Oxford College of Engg, Bangalore), "Development of Data Leakage
Detection Using Data Allocation Strategies," International Journal of Computer Applications in Engineering
Sciences, vol. I, issue II, June 2011, ISSN: 2231-4946.
[4] P. Buneman, S. Khanna, and W. C. Tan, "Why and Where: A Characterization of Data Provenance," ICDT 2001,
8th International Conference, London, UK, January 4-6, 2001, Proceedings, Lecture Notes in Computer
Science, vol. 1973.
[5] Yin Fan, Wang Yu, Wang Lina, and Yu Rongwei, "A Trustworthiness-Based Distribution Model for Data Leakage
Detection," Wuhan University Journal of Natural Sciences.