
Privacy Preserving Data Mining Using LBG And ELBG


1,3 Associate Professors, Department of Electronics and Computer Engineering, CSI Life Member, K.L. University,
Vaddeswaram, Guntur
2,3,4 III/IV B.Tech ECM, K.L. University, Vaddeswaram, Guntur

Abstract: Many privacy preserving data mining algorithms attempt to preserve what database owners consider sensitive.
Because data mining can predict highly sensitive information, this paper presents one technique for privacy preserving
data mining. We present a reconstruction-based technique for numerical data. Finally, the performance of the proposed
technique is evaluated using accuracy and distortion parameters.

Keywords: Privacy preserving, reconstruction, quantization.

I. INTRODUCTION

Huge volumes of detailed personal data are regularly collected and analyzed by applications. Such data include
shopping habits, criminal records, medical history, and credit records, among others. Analyzing such data opens new
threats to privacy. Privacy preserving data mining (PPDM) is an important area of data mining that aims to protect
secret information from unsolicited or unsanctioned disclosure. Data mining techniques analyze data and predict useful
information. PPDM is primarily concerned with protecting secret data against unsolicited access. It is important
because the threat to privacy is now becoming real, since data mining techniques are able to predict highly sensitive
knowledge from huge volumes of data [1].

II. CLASSIFICATION OF PRIVACY PRESERVING DATA MINING

There are many approaches to privacy preserving data mining. Privacy preserving data mining techniques can be
classified based on the following dimensions [11].

1. Data distribution
2. Data modification
3. Data mining algorithm
4. Data or rule hiding
5. Privacy preservation

The first dimension refers to data distribution; data can be distributed vertically or horizontally over the systems.

The second dimension refers to modifying the original data into another form, so that sensitive data cannot be
identified. There are several methods for data modification, such as randomization, swapping, sampling, anonymity,
and blocking.

The third dimension is the data mining algorithm: when mining is performed on the data, the privacy of individuals
should still be preserved.

The fourth dimension refers to hiding: part of the data, or part of the result of data mining, can be hidden.

The fifth dimension is the most important issue, i.e., providing privacy during data mining. This paper mainly
concentrates on the fifth dimension.

III. PROPOSED APPROACH

Vector quantization works based on rounding off. It consists of the following stages:

1. Original Data
2. Constructing codebook
3. Encoding original data with code book
4. Decoding

Vector quantization (VQ) is generally used for data compression. Earlier, the design of a vector quantizer was
considered a difficult problem because of the need for multi-dimensional integration. Linde, Buzo, and Gray (LBG)
introduced an algorithm for vector quantizer design based on a training sequence. A VQ designed using this algorithm
is referred to as an LBG-VQ.

Figure 3.1 : Block diagram of Vector Quantizer
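
The encoding and decoding stages of a vector quantizer can be illustrated with a minimal Python sketch. It assumes the codebook is already available and that closeness is measured by Euclidean distance; the function names (`nearest_index`, `encode`, `decode`) and the toy records are hypothetical and are not taken from the paper.

```python
import math

def nearest_index(point, codebook):
    """Return the index of the codeword closest to `point` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(point, codebook[i]))

def encode(data, codebook):
    """Encoding stage: replace every data vector by the index of its nearest codeword."""
    return [nearest_index(p, codebook) for p in data]

def decode(indices, codebook):
    """Decoding stage: replace every index by the codeword it refers to."""
    return [codebook[i] for i in indices]

# Hypothetical toy records and a hand-picked two-entry codebook (illustration only).
original = [(23.0, 31.0), (25.0, 29.0), (47.0, 82.0), (52.0, 78.0)]
codebook = [(24.0, 30.0), (50.0, 80.0)]

indices = encode(original, codebook)
reconstructed = decode(indices, codebook)
print(indices)
print(reconstructed)
```

Only the indices and the codebook are needed to rebuild the data set, and the rebuilt records are rounded to codewords rather than exact copies of the originals.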

Proceedings of 6th IACEECE-2013, 29th September 2013, Chennai, India. ISBN: 978-93-82702-31-3


CodeBook Generation Using LBG

For the generation of the codebook, a set of training data (LSF) is needed, which is obtained by collecting a set of LSF vectors. There is a well-known algorithm, namely the LBG algorithm [Linde, Buzo and Gray, 1980][12], for clustering a set of L training vectors into a set of M codebook vectors. The algorithm is formally implemented by the following recursive procedure:

1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors (hence, no iteration is required here).
2. Double the size of the codebook by splitting each current codeword according to the splitting rule.
3. Nearest-Neighbor Search: for each training vector, find the codeword in the current codebook that is closest (in terms of similarity measurement), and assign that vector to the corresponding cell (associated with the closest codeword).
4. Centroid Update: update the codeword in each cell using the centroid of the training vectors assigned to that cell.
5. Iteration 1: repeat steps 3 and 4 until the average distance falls below a preset threshold.
6. Iteration 2: repeat steps 2, 3 and 4 until a codebook of size M is designed.

This work uses the LBG algorithm for designing the codebook. Because quantization is performed with the help of the indices of the codebook, the codebook plays a very important role, and the accuracy of the data mining result depends on the effective design of the codebook. Once the codebook is constructed, a new data set is formed by approximating each data point with the nearest code vector in the codebook. The new data set therefore contains approximated data values, not the exact values of the original data set.

Figure 3.11 : Representation of original data set and sanitized data set using LBG

IV. EXPERIMENTAL RESULTS

Table1: Original Data
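
The LBG procedure and the sanitization step of Section III can be sketched in Python as follows. This is a sketch under stated assumptions rather than the implementation used for the experiments: the splitting rule is taken to be the common multiplicative perturbation y(1 + eps) and y(1 - eps), the stopping test for steps 3 and 4 is a small relative improvement in average squared distance, M is assumed to be a power of two, and the function names and toy training vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))

def nearest(point, codebook):
    """Index of the codeword closest to `point` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(point, codebook[i]))

def average_distortion(data, codebook):
    """Mean squared distance between each vector and its nearest codeword."""
    return sum(math.dist(p, codebook[nearest(p, codebook)]) ** 2 for p in data) / len(data)

def lbg(data, m, eps=0.01, tol=1e-6, max_iter=100):
    """Design a codebook of size m (assumed to be a power of two) by repeated splitting."""
    codebook = [centroid(data)]                       # step 1: 1-vector codebook
    while len(codebook) < m:                          # step 6: grow until size m
        # Step 2 (assumed rule): split each codeword y into y(1+eps) and y(1-eps).
        # A codeword at the origin would not be separated by this rule.
        codebook = [tuple(c * (1 + s * eps) for c in y)
                    for y in codebook for s in (1, -1)]
        previous = float("inf")
        for _ in range(max_iter):                     # step 5: repeat steps 3 and 4
            cells = [[] for _ in codebook]
            for p in data:                            # step 3: nearest-neighbor search
                cells[nearest(p, codebook)].append(p)
            # Step 4: centroid update; an empty cell keeps its old codeword.
            codebook = [centroid(cell) if cell else y
                        for cell, y in zip(cells, codebook)]
            current = average_distortion(data, codebook)
            if previous - current <= tol * max(current, 1e-12):
                break
            previous = current
    return codebook

def sanitize(data, codebook):
    """Replace each vector by its nearest codeword to form the sanitized data set."""
    return [codebook[nearest(p, codebook)] for p in data]

# Hypothetical training vectors forming two well-separated groups.
training = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
codebook = lbg(training, m=2)
sanitized = sanitize(training, codebook)
print(sorted(codebook))
print(average_distortion(training, codebook))
```

A larger M lowers the distortion, so the sanitized values stay closer to the originals, while a smaller M hides more detail; the codebook size therefore governs the trade-off between privacy and the accuracy of the mining result.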

Figure 7.12 : Clustering on original data set and sanitized dataset using LBG

Performance is also evaluated by taking into account two important parameters: distortion and F-measure (quality of data mining results).

V. CONCLUSION AND FUTURE SCOPE

This work gives a different approach of using vector quantization for privacy preserving data mining. It shows analytically and experimentally that privacy preserving data mining is to some extent possible using the vector quantization approach. As future work, new and more effective quantization methods can be used in place of the LBG approach that we have used. The K nearest neighbor approach is one of the approaches that may give better results.

REFERENCES

Agrawal, R. & Srikant, R. (2000). Privacy Preserving Data Mining. In Proc. of ACM SIGMOD Conference on Management of Data (SIGMOD'00), Dallas, TX.
Alexandre Evfimievski, Tyrone Grandison. Privacy Preserving Data Mining. IBM Almaden Research Center, 650 Harry Road, San Jose, California 95120, USA.
Agarwal Charu C., Yu Philip S. Privacy Preserving Data Mining: Models and Algorithms. New York, Springer, 2008.
Oliveira S.R.M., Zaiane Osmar R. A Privacy-Preserving Clustering Approach Toward Secure and Effective Data Analysis for Business Collaboration. In Proceedings of the International Workshop on Privacy and Security Aspects of Data Mining in conjunction with ICDM 2004, Brighton, UK, November 2004.
UCI Repository of machine learning databases, University of California, Irvine. http://archive.ics.uci.
Wikipedia. Data mining. http://en.wikipedia.
C. W. Tsai, C. Y. Lee, M. C. Chiang, and C. S. Yang. A Fast VQ Codebook Generation Algorithm via Pattern Reduction. Pattern Recognition Letters, vol. 30, pp. 653-660, 2009.
K. Somasundaram, S. Vimala. "A Novel Codebook Initialization Technique for Generalized Lloyd Algorithm using Cluster Density". International Journal on Computer Science and Engineering, Vol. 2, No. 5, pp. 1807-1809, 2010.
K. Somasundaram, S. Vimala. "Codebook Generation for Vector Quantization with Edge Features". CiiT International Journal of Digital Image Processing, Vol. 2, No. 7, pp. 194-198, 2010.
Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino. State-of-the-art in Privacy Preserving Data Mining. SIGMOD Record, Vol. 33, No. 1, March 2004.
Atallah, M., Elmagarmid, A., Ibrahim, M., Bertino, E., Verykios, V. Disclosure limitation of sensitive rules. Workshop on Knowledge and Data Engineering Exchange, 1999.
Binit Kumar Sinha. "Privacy preserving clustering in data mining".
M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha. Multi Switched Split Vector Quantization. Proceedings of World Academy of Science, Engineering and Technology, Volume 27, Feb 2008, ISSN 1307-6884.
M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha. Multi Switched Split Vector Quantizer. International Journal of Computer, Information and Systems Science and Engineering, Vol 2, No: 1.