
Privacy in Data Mining and Securing Web Services

A Dissertation Synopsis Report


Submitted in partial fulfilment of the requirement for the
award of the Master of Technology Degree
In
Computer Science & Engineering Discipline

Submitted To

RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA,


BHOPAL (M.P.)

Submitted By:

Monika Binjhade
Enrolment No - 0101CS20MT11

Under The Supervision Of:


Prof. Onkar Thakur
Associate Professor
Department of CSE, TIT (Science), Bhopal

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


TECHNOCRATS INSTITUTE OF TECHNOLOGY (SCIENCE), Bhopal
2022

TECHNOCRATS INSTITUTE OF TECHNOLOGY (SCIENCE),
BHOPAL (M.P.)
Department of Computer Science & Engineering

CERTIFICATE

This is to certify that the work embodied in this Dissertation Synopsis Report
“Privacy in Data Mining and Securing Web Services”, being submitted by
“Monika Binjhade” (Roll No.: 0101CS20MT11) in partial fulfilment of the
requirement for the award of “Master of Technology in Computer Science
& Engineering” discipline to “Rajiv Gandhi Proudyogiki Vishwavidyalaya,
Bhopal (M.P.)” during the academic year 2022, is a record of a bona fide piece
of work carried out by her under my supervision and guidance in the
“Department of Computer Science & Engineering”, Technocrats
Institute of Technology (Science), Bhopal (M.P.).

Prof. Rakesh Kumar Tiwari
HOD, Department of CSE
TIT-SCIENCE

Prof. Onkar Thakur
Supervisor, Associate Professor
TIT-SCIENCE


CERTIFICATE OF APPROVAL

The Dissertation Synopsis Report entitled “PRIVACY IN DATA MINING AND SECURING
WEB SERVICES”, being submitted by “Monika Binjhade” (Roll No.: 0101CS20MT11),
has been examined by us and is hereby approved for the award of the degree
“Master of Technology in Computer Science & Engineering Discipline”, for which
it has been submitted. It is understood that by this approval the undersigned do
not necessarily endorse or approve any statement made, opinion expressed or
conclusion drawn therein, but approve the dissertation only for the purpose for
which it has been submitted.

(Internal Examiner) (External Examiner)

Date: Date:


DECLARATION

I, Monika Binjhade, a student of the III Semester, Master of Technology in
Computer Science & Engineering discipline, Technocrats Institute of
Technology (Science), Bhopal (M.P.), hereby declare that the work presented in
this Dissertation Synopsis Report entitled “PRIVACY IN DATA MINING
AND SECURING WEB SERVICES” is the outcome of my own work, is
bona fide and correct to the best of my knowledge, and has been carried
out with due regard for engineering ethics. The work presented does not infringe any
patented work and has not been submitted to any other university.

(Monika Binjhade)
Enrollment No. (0101CS20MT11)
Date:

ACKNOWLEDGEMENT
Practical experience is the best education and an opportunity to apply
theoretical learning and experience the results. Being associated with an institute
like T.I.T. SCIENCE, BHOPAL for learning was more than a privilege.
We take pleasure in expressing our gratitude to Prof. Rakesh
Kumar Tiwari (HOD, Computer Science & Engineering) for allowing us to
undertake this thesis work through to the successful completion of the study.
We express our sincere thanks to our guide, Prof. Onkar Thakur, for the
valuable guidance, co-operation, continuous motivation and moral
support that were necessary for the successful completion of our Synopsis.
We are highly indebted to Dr. SHASHI KUMAR JAIN, Director,
TIT (SCIENCE), Bhopal, for his kind permission to carry out this Synopsis.
We also express sincere gratitude to the lecturers, professors and lab assistants
of TIT (SCIENCE), Bhopal, for providing helpful study materials and the
database associated with our project.
Last, but not least, we would like to thank our loving parents for their
encouragement and co-operation while we worked through this project.
We also thank all our friends for their encouragement and support.

TABLE OF CONTENTS

S.No. TOPIC

1 Introduction
2 Literature Survey
3 Problem Formulation
4 Objective
5 Methodology/Planning of Work
6 Expected Outcome of Research Work

INTRODUCTION
The internet has brought about a huge change in the way that business is
conducted the world over. It has made the world a smaller place and has opened
up previously inaccessible markets to companies. On the down side, though, there
are indications that many things about us, such as our taste in magazines, are
finding their way into databases and are no longer as personal and private as many
of us would prefer them to be. Data mining is part of a technological, social, and
economic revolution that is making the world smaller, more connected and more
service driven, and is providing unprecedented levels of prosperity. At the same
time, more information is known, stored and transmitted about us as individuals
than ever before. This section provides a brief overview of some of the privacy issues
and controversies that surround the use of data mining in business today.

PRIVACY
According to Berry and Linoff (2000:468), privacy is a complex issue that,
because of technology, is increasingly becoming a social issue. The Cambridge
Advanced Learner’s Dictionary (2004) defines privacy as someone’s
right to keep their personal matters and relationships secret. Today, every form of
commerce leaves an electronic trail, and acts that were once considered private, or
at least quickly forgotten, are stored for future reference.

THE ROLE OF DATA MINING


Most uses of data mining are in the area of marketing, although not
all aspects of data mining pose potential threats to individuals’ privacy.

THE POTENTIAL THREAT OF DATA MINING TO PRIVACY


A cell phone service provider, for example, collects information about all its
subscribers at the time a contract is sold. Mining such data is an interesting
application of data mining, but it also shows the potential threats that data mining
poses to privacy. The main areas of concern with regard to data mining and privacy
are therefore reflected in questions such as the following:
What kind of information do you collect about your customers?
It is up to the organization employing data mining to ensure that its actions
result in neither of two negative effects, namely incurring legal liability or
attracting bad press as a result of privacy violations associated with its data
mining effort (New 2004). The … Awareness project, aimed at applying data mining
to commercial databases for information on potential terrorists, ran into exactly
these problems due to the lack of consideration shown for privacy issues.

PRIVACY PRESERVING DATA MINING


Privacy preserving data mining is a novel research direction in data mining
and statistical databases, in which data mining algorithms are analysed for the
side effects they have on data privacy. Privacy preserving data mining is still in its
infancy, and whether it will be able to address all the privacy concerns in data
mining is debatable. Meanwhile, organizations should inform their customers of the
potential use of their information for data mining purposes, and obtain their
consent prior to releasing this information to other organizations.

LITERATURE SURVEY
APPROACHES TO PRIVACY IN DATA MINING
Two papers entitled “Privacy Preserving Data Mining” appeared in 2000.
Although both addressed a similar problem, constructing decision trees from
private training data, their concepts of privacy were quite different. One was based
on data obscuration, i.e., modifying the data values so that real values are not disclosed
(Agrawal and Srikant 2000). The other used secure multiparty computation
(SMC) to “encrypt” data values (Lindell and Pinkas 2000), ensuring that no party
learns anything about another’s data values. We first describe SMC, and then give
additional background on data obscuration. We also discuss a problem that has
received little attention: how do we constrain data mining if the results
alone may violate privacy?
• Secure Multiparty Computation
• Obscuring Data
• Perfect Privacy

Secure Multiparty Computation


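SMC enables several parties, each holding private data, to jointly compute a
function of their combined inputs without a trusted third party, so that no party
learns anything beyond its own input and the final result.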
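To make the idea concrete, the classic ring-based secure-sum protocol can be simulated in a single process. This is a minimal sketch under stated assumptions: semi-honest sites that follow the protocol and do not collude, non-negative values whose total stays below a hypothetical `MODULUS`, and an illustrative `secure_sum` function that is not taken from any library. It is not the construction of Lindell and Pinkas (2000), which targets decision-tree induction.

```python
import secrets

# Assumption: every private value, and their total, is smaller than MODULUS.
MODULUS = 2**32


def secure_sum(private_values: list[int]) -> int:
    """Simulate the ring-based secure-sum protocol among len(private_values) sites.

    Site 0 masks its value with a random offset r; every other site adds its own
    value to the running total it receives, so each site only sees a number that
    is uniformly distributed modulo MODULUS. Site 0 finally removes r.
    """
    if any(not 0 <= v < MODULUS for v in private_values):
        raise ValueError("values must lie in [0, MODULUS)")
    r = secrets.randbelow(MODULUS)              # random mask known only to site 0
    running = (r + private_values[0]) % MODULUS
    for value in private_values[1:]:            # message passed around the ring
        running = (running + value) % MODULUS
    return (running - r) % MODULUS              # site 0 unmasks the total


sales = [120, 340, 95]  # hypothetical private sales counts at three sites
print(secure_sum(sales))  # 555
```

The random mask is what replaces the trusted third party: no single site can recover another site's value from what it receives, although two colluding neighbours could, which is why the non-collusion assumption matters.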

Obscuring Data
One approach, typically used in census data, is to aggregate items. An
alternative is to add random noise to data values, then mine the distorted data.
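The noise-addition approach can be sketched as follows: each individual value is perturbed with zero-mean random noise before it is released, yet aggregate quantities such as the mean can still be estimated from the distorted data. This is a simplified illustration under assumed parameters (uniform noise, a hypothetical `perturb` helper, synthetic ages); Agrawal and Srikant (2000) go further and reconstruct the distribution of the original values, which this sketch does not attempt.

```python
import random
import statistics


def perturb(values: list[float], noise_half_width: float, seed: int = 0) -> list[float]:
    """Return each value plus independent uniform noise in [-w, w] (mean zero)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-noise_half_width, noise_half_width) for v in values]


# Synthetic "true" ages that the data owners do not want to reveal individually.
rng = random.Random(42)
true_ages = [rng.randint(18, 80) for _ in range(10_000)]

released = perturb(true_ages, noise_half_width=30.0)

# Individual released values can be far from the truth (up to +/-30 years) ...
print(max(abs(r - t) for r, t in zip(released, true_ages)))
# ... but the mean of the distorted data stays close to the true mean.
print(statistics.mean(true_ages), statistics.mean(released))
```

Widening the noise hides individual values better but makes aggregate estimates less precise, which is the trade-off discussed next.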

Perfect Privacy
One problem with the above is the trade-off between privacy and accuracy
of the data mining results. One proposed solution is based on moderately trusted
third parties: the parties are not trusted with the exact data, but are trusted only
not to collude with the data receiver.
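The privacy/accuracy trade-off can be quantified for the simple noise-addition scheme described under Obscuring Data. Assume, for illustration only, that each of $n$ values $x_i$ is released as $y_i = x_i + r_i$, where the $r_i$ are independent with mean $0$ and variance $\sigma^2$, and that the analyst estimates the mean of the original data by the mean of the released data:

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{x} + \frac{1}{n}\sum_{i=1}^{n} r_i,
\qquad
\mathbb{E}[\bar{y} \mid x] = \bar{x},
\qquad
\operatorname{Var}[\bar{y} \mid x] = \frac{\sigma^2}{n}.
```

A larger $\sigma$ hides each individual value better but inflates the error of the estimate; only a larger number of records $n$ pulls that error back down. For example, uniform noise on $[-30, 30]$ has $\sigma^2 = 60^2/12 = 300$, so with $n = 10\,000$ records the standard error of the estimated mean is $\sqrt{300/10\,000} \approx 0.17$.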

INDIVIDUAL PRIVACY
Under the EU Data Protection Directive (95/46/EC), “personal data” shall mean
any information relating to an identified or identifiable natural person
(“data subject”); an identifiable person is one who can be identified, directly or
indirectly, in particular by reference to an identification number or to one or more
factors specific to his physical, physiological, mental, economic, cultural or social
identity. The same framework specifies that data may be kept in a form which
permits identification of data subjects for no longer than is necessary for the
purposes for which the data were collected or for which they are further processed.

COLLECTION PRIVACY
Protecting individual data items may not be enough; we may also need to protect
against learning about subsets of a collection. Such issues are common in a data
warehousing environment, where data from multiple sources is combined for
analysis. Individual privacy concerns can lead to corporate privacy concerns. The
holder of a collection of individual data may be trusted by those individuals, but
if that data is revealed, this trust is broken. Even if we assume that (1) individual
data items can be disclosed, or are protected by privacy methods, and (2)
global data mining results do not violate privacy/secrecy concerns, problems
may still arise: knowledge about a subset of the combined data set may reveal
secrets of one of the data holders.
As an example, a medical study may want to use data mining to establish
overall trends from hospital data. Even if the techniques used protect patient
privacy, they may reveal hospital-specific information. Efficient data mining
techniques are therefore needed that protect such hospital-specific information as well.
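The hospital example can be illustrated with a small, entirely hypothetical data set: no patient is identifiable in the records, yet grouping the same records by hospital exposes a figure that an individual hospital may regard as confidential. The hospital names, the readmission field and the `readmission_rate` helper below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical de-identified records: (hospital, readmitted within 30 days?)
records = [
    ("Hospital A", False), ("Hospital A", False), ("Hospital A", True),
    ("Hospital A", False), ("Hospital B", True), ("Hospital B", True),
    ("Hospital B", False), ("Hospital B", True),
]


def readmission_rate(rows: list[tuple[str, bool]]) -> float:
    """Fraction of rows flagged as readmitted."""
    return sum(1 for _, readmitted in rows if readmitted) / len(rows)


# Global trend: the result the medical study is actually after.
print(f"overall: {readmission_rate(records):.2f}")

# Breaking the same data down by source reveals each hospital's own rate,
# which is hospital-specific information even though no patient is exposed.
by_hospital: dict[str, list[tuple[str, bool]]] = defaultdict(list)
for row in records:
    by_hospital[row[0]].append(row)
for hospital, rows in sorted(by_hospital.items()):
    print(f"{hospital}: {readmission_rate(rows):.2f}")
```

Here the overall rate is 0.50, while the per-hospital breakdown exposes 0.25 for Hospital A and 0.75 for Hospital B: exactly the kind of subset knowledge that collection privacy aims to protect.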

PROBLEM FORMULATION

Data mining algorithms embody techniques that have sometimes existed for
many years, but have only lately been applied as reliable and scalable tools that
time and again outperform older classical statistical methods. While data mining
is still in its infancy, it is becoming increasingly widespread. Before data mining
develops into a conventional, mature and trusted discipline, many pending
issues, including the privacy issues examined in this study, have to be addressed.

OBJECTIVES OF THE STUDY
The primary objective of the study is to conduct a literature review, or
secondary data analysis, of data mining with the intention of gaining a better
understanding of the subject matter. The primary objective will be pursued by
dividing the study into the following secondary objectives:
• Define privacy and gain a basic understanding of what data mining is, and
what its role is in related technologies such as business intelligence.
• Provide an overview of some of the more prominent data mining tasks,
techniques and algorithms.
• Identify and suggest a process for conducting data mining.
• Identify some typical applications of data mining in general and in
specific industries.
• Discuss typical issues facing organizations that wish to employ data mining
as a tool in their business.
• Identify some of the potential societal issues relating to data mining and its
implementation.

METHODOLOGY
The study will be conducted by making use of qualitative research. The
objectives of the study will be pursued by using a literature review or secondary
data analysis. Articles, textbooks, research reports, dissertations, the internet and
other scientific publications relevant to data mining will be used. The viewpoints
of different authors will be compared and evaluated.

The goal of any data mining effort falls into one of the following
two types (Cha & Lweis, 2002:57):
• Using data mining to generate descriptive models to solve problems.
• Using data mining to generate predictive models to solve problems.
The descriptive data mining tasks characterize the general properties of the
data in the database, while predictive data mining tasks perform inference on the
current data in order to make predictions. Descriptive data mining focuses on finding
patterns describing the data that can be interpreted by humans, and produces new,
nontrivial information based on the available data set. Predictive data mining
involves using some variables or fields in the data set to predict unknown or future
values of other variables of interest, and produces a model of the system
described by the given data set.
The goal of predictive data mining is therefore to produce a model that can be used
to perform tasks such as classification, prediction or estimation, i.e. to predict
future outcomes based on past records with known answers. The goal of
descriptive data mining is to gain an understanding of the analyzed system by
uncovering patterns in large data sets and the relationships between the attributes
they represent. The task of generating data mining models can be further divided
into the following two approaches:
• Supervised or directed data mining modelling.
• Unsupervised or undirected data mining modelling.
The goal in supervised or directed data mining is to use the available data to
build a model that describes one particular variable of interest in terms of the rest
of the available data. The task is to explain the values of some particular field:
the user selects the target field and directs the computer to determine how to
estimate, classify or predict its value.
In unsupervised or undirected data mining, however, no variable is singled out as
the target. The goal is rather to establish some relationship among all the variables
in the data. The user asks the computer to identify patterns in the data that may be
significant. Undirected modelling is used to explain those patterns and relationships
once they have been found.
The goals of predictive and descriptive data mining are achieved by using specific
data mining techniques that fall within certain primary data mining tasks, described
below.

CLASSIFICATION/PREDICTION
Classification involves the discovery of a predictive learning function that
classifies a data item into one of several predefined classes. It involves examining
the features of a newly presented object and assigning to it a predefined class.
Classification can be defined as a two-step process: first, a model is built describing
a predetermined set of data classes or concepts; second, the model is used for
classification.
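The two-step view of classification can be shown with a deliberately simple classifier. The sketch below uses a nearest-centroid rule, chosen here only because it is short; it is not a method proposed in this study, and the customer features, class labels and the `train`/`classify` helpers are hypothetical. Step 1 builds the model (one mean feature vector per predefined class) from labelled training data; step 2 uses that model to assign a class to a newly presented object.

```python
import math

Point = tuple[float, ...]


def train(samples: list[Point], labels: list[str]) -> dict[str, Point]:
    """Step 1: build the model, here the mean feature vector of each class."""
    grouped: dict[str, list[Point]] = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {
        label: tuple(sum(col) / len(points) for col in zip(*points))
        for label, points in grouped.items()
    }


def classify(model: dict[str, Point], x: Point) -> str:
    """Step 2: assign the predefined class whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(model[label], x))


# Hypothetical customers: (monthly spend, support calls) -> churn label.
X = [(20.0, 5.0), (25.0, 6.0), (80.0, 1.0), (90.0, 0.0)]
y = ["churn", "churn", "stay", "stay"]
model = train(X, y)
print(classify(model, (30.0, 4.0)))  # churn
```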
Prediction can be viewed as the construction and use of a model to assess the
class of an unlabelled sample, or to assess the value or value range of an attribute
that a given sample is likely to have.
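For the numeric case, predicting the value of an attribute can be sketched with ordinary least-squares regression on a single input variable. The data and the `fit_line` helper are hypothetical, and the straight-line model is an assumption chosen for brevity rather than a technique prescribed by this study.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical history: advertising spend (thousands) vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [12.0, 14.0, 16.0, 18.0]
a, b = fit_line(spend, units)
print(a + b * 5.0)  # 20.0, the predicted units for a spend of 5.0
```

The model is built from past records with known answers (step 1) and then applied to a value not seen before (step 2), mirroring the two-step process described for classification.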

ESTIMATION
Estimation is grouped under predictive data mining tasks.

SEGMENTATION
Segmentation is grouped under descriptive data mining tasks.

CLUSTERING
Clustering is grouped under descriptive data mining tasks. One of the most
powerful forms of descriptive data mining is data visualization.
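Clustering, as an undirected task, groups records without any target variable. A minimal k-means sketch is shown below; k-means is one common clustering algorithm among many, chosen here purely as an illustration, and the two-dimensional points, the choice k = 2 and the `kmeans` function are hypothetical.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 20, seed: int = 0) -> list[int]:
    """Return a cluster index (0..k-1) for each point using Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres: k distinct input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels = kmeans(data, k=2)
print(labels)  # cluster numbers depend on the random start; the grouping does not
```

No field is marked as the target: the algorithm only reports which records belong together, and interpreting those groups is left to the analyst.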

LIMITATIONS OF THE STUDY


The study will focus mainly on privacy and data mining, but no data mining
discussion will be complete without at least mentioning some of its related
information technologies. The study will not attempt to make data mining experts
out of the researcher or the reader, but will focus more on providing a basic
understanding of the technology, and its potential effects on how business is
conducted.

EXPECTED OUTCOMES OF RESEARCH WORK
Privacy is defined as someone’s right to keep their personal matters
and relationships secret. Today, every form of commerce leaves an electronic
trail, and acts that were once considered private, or at least quickly forgotten, are
stored for future reference. Decision makers the world over complain that they are
overwhelmed by the amount of data available to them, but that they are unable to
make any sense of this data. The changing business environment and the fact that
customers are becoming more and more demanding highlight the need for
organizations to be able to adapt faster and more effectively to those changes.
Data mining developed as a direct result of the natural evolution of
information technology. The increased organizational use of computer-based
systems has resulted in the accumulation of vast amounts of data, and the need for
decision makers to have efficient access to knowledge, and not only data, has
resulted in more and more organizations adopting data mining. Its use carries
risks, however: privacy violations may incur legal liability that could result in
expensive lawsuits, and may result in bad press that can do considerable damage
to a corporate or brand image.
Privacy in data mining is a novel research direction in data mining and
statistical databases, where data mining algorithms are analyzed for the side
effects they have on data privacy. The main objective of privacy in data mining is
to develop algorithms for modifying the original data in some way, so that the
private data and private knowledge remain private even after the mining process.
The promise of data mining is to return the focus of large, impersonal
organizations to serving their customers better and to providing more efficient
business processes. Indeed, for some organizations data mining offers the
potential for gaining a competitive advantage, but for others it has become a
matter of survival.
One thing that information technology experts and business professionals
must realize is that following ethical practices and respecting the privacy of
individuals makes good business sense. The bad publicity associated with a single
incident can taint an organization’s reputation for years, even when the
organization has followed the law and done everything that it perceives possible to
ensure the privacy of those from whom the data was collected. The literature is
filled with examples of the successful application of data mining, not only to
specific business functions, but also in specific industries. Undoubtedly, certain
industries, such as those dealing with huge amounts of data and those exposed to
many diverse customers, stand to benefit more from data mining than others.
The benefits associated with data mining, for organizations, individuals and
society as a whole, far exceed its drawbacks, but the biggest issue facing
organizations that want to employ data mining is its cost. The other drawbacks
of data mining relate to the threat that it poses to privacy, and any data mining
effort must not only be carried out within the framework of the relevant laws, but
must also be carried out in an ethical manner.
Although data mining is probably beyond the financial ability of most
organizations, its main principle, the fact that there might be value in
organizational data, should not be forgotten. Organizations must endeavour to
treat their data with the same respect they have for all their other corporate assets.
As more and more business organizations adopt Web services, ensuring
secure communication between communicating partners is becoming increasingly
important, and the unique characteristics of Web services make security even
more important in Web services environments. In the dissertation, we discuss
the characteristics, technologies and standards of Web services; however, the
focus is on Web services security.
The dissertation has two main parts: a technology study and an
implementation. The technology study introduces the main ideas and concepts
of the core Web services technologies, explains why SSL falls short when it comes
to Web services, and describes XML-based Web services security schemes. It also
discusses the importance of federated identity. The implementation then applies
Web services security using WebSphere and J2EE technologies, by deploying a
Web service for an e-marketplace case, to show how these technologies might be
used together in future work.
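Why transport-level SSL falls short can be sketched briefly: SSL/TLS protects a message only on each point-to-point hop, so an intermediary that terminates the connection (a gateway or routing node, say) sees the plaintext and could alter it, whereas XML-based schemes such as WS-Security attach protection to the message itself. The sketch below illustrates only that idea of message-level integrity, using an HMAC from the Python standard library over a hypothetical order message and an assumed pre-shared key. It is not an implementation of WS-Security or XML Signature, which additionally involve canonicalization, security tokens and standard header elements.

```python
import hashlib
import hmac
import secrets

# Assumption: sender and final receiver share this key out of band.
SHARED_KEY = secrets.token_bytes(32)


def sign(body: bytes, key: bytes = SHARED_KEY) -> str:
    """Compute a tag that travels with the message, not with the connection."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(body: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Check, in constant time, that the tag matches the body received."""
    return hmac.compare_digest(sign(body, key), tag)


# Hypothetical e-marketplace order that passes through an intermediary.
order = b"<Order><Item>laptop</Item><Qty>1</Qty></Order>"
tag = sign(order)

tampered = order.replace(b"<Qty>1</Qty>", b"<Qty>100</Qty>")
print(verify(order, tag))     # True
print(verify(tampered, tag))  # False
```

Because the tag is computed over the message rather than the channel, the check still detects tampering even if the message crossed several separately secured hops.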
We have presented how web mining (in a broad sense, data mining (DM) applied to
e-commerce) is applicable to improving the services provided by e-commerce-based
enterprises. Specifically, we first discussed some popular tools and
techniques used in data mining. Statistics, AI and database methods were
surveyed and their relevance to DM in general was discussed. We then presented
a host of applications of these tools to DM in e-commerce. Later, we also
highlighted architectural and implementation issues.
We now present some ways in which web mining can be extended for
further research. With the growing interest in the Semantic Web, an
increasing number of sites use structured semantics and domain ontologies as part
of site design, creation, and content delivery. The notion of Semantic Web
Mining was introduced by Berendt et al. (2002). The primary challenge for the
next generation of personalization systems is to effectively integrate semantic
knowledge from domain ontologies into the various parts of the process, including
the data preparation, pattern discovery, and recommendation phases. Such a
process must involve some or all of the following tasks and activities.

(1) Ontology learning, extraction, and preprocessing: Given a page in the web
site, we must be able to extract domain-level structured objects as semantic entities
contained within this page.

(2) Semantic data mining: In the pattern discovery phase, data mining algorithms
must be able to deal with complex semantic objects.

(3) Domain-level aggregation and representation: Given a set of structured objects
representing a discovered pattern, we must then be able to create an aggregated
representation as a set of pseudo objects, each characterizing objects of different
types occurring commonly across the user sessions.

(4) Ontology-based recommendations: Finally, the recommendation process must
also incorporate semantic knowledge from the domain ontologies.
