International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 12, December 2013 ISSN 2319 - 4847 , Impact Factor 2.379 ISRA:JIF

Attribution Non-Commercial (BY-NC)

3 views

International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 12, December 2013 ISSN 2319 - 4847 , Impact Factor 2.379 ISRA:JIF

Attribution Non-Commercial (BY-NC)

- Analyze Different Approaches for IDS Using KDD 99 Data Set
- Everitt Cluster Analysis PDF
- Curves
- datamining
- mathgen-908670162
- An Empirical Study of the Applications of Classification Techniques in Students Database
- A Survey of Entity Resolution and Record Linkage Methodologies
- ijcttjournal-v1i2p5
- Geom
- Data Mining on Heathcare Management System
- 05518596
- Lecture 1
- New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhance Performance
- Vision 7CameraModelA S09
- Business Analytics and Data Mining for Management Institutes
- 68020_assignment 11 Rotational Motion
- Bi-Directional Prefix Preserving Closed Extension For Linear Time Closed Pattern Mining
- tensor in su(n)
- A Threshold fuzzy entropy based feature selection method applied in various benchmark datasets using Ant-miner algorithm
- Plane Transform

You are on page 1of 5

Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 12, December 2013 ISSN 2319 - 4847

Abizer1 and Hitesh Gupta 2

1

M. Tech scholar Patel Institute of Technologys 2 Asst. Professor Patel Institute of Technology

Abstract

Data mining has become very important in last few years. Its main purpose is to drawing out interesting pattern from huge data set. With increasing application area of data mining, e.g. military, wheatear forecasting, market analyses, heath etc, the importance of data mining is getting increased day by day. Due to the same there is issue of privacy breaches of individual datum. There are different traditional and modern methods for securing the data. Each method is having its own trade off among various factors like complexity of an algorithm, security level of data etc. This paper talks about a method which takes the reference of geometry based mathematical model. This algorithm not only provides the data security but also different level of security for the same data.

1. INTRODUCTION

With the involvement of computer, the analysis of identifying different diseases and their trends in the locality became very much effective. Sharing of patients information is increasing day-by-day which is majorly private in nature. Because of most of the time the process of data mining is done by third party that is also known as centralized third party. To extract pattern out of available information, data mining procedure is used [1]. As we know any personal data require privacy. Securing this data is not a trivial issue. This can be deal by privacy preserving in data mining [2]. Here data is made secure and it provides higher mining result as before. This paper descries a perturbation approach for securing data i.e spatial transformation. Here we have a centralized body which will collect data and perform mining practice. All data sources first perturb the data by moving data in a space so that original identity is getting hided but it ensures that mining result will not be get affected. Structure of the paper is as fallows. Section 2 discusses different techniques for securing data by altering its value. Section 3 gives preliminary knowledge about the method. Section 4 describes method with pseudo code. Section 5 mentions results obtained. Last section talks about conclusion.

2. LITERATURE SURVEY

There exists a growing body of literature on privacy sensitive data mining. A conventional method for securing data is to use cryptographic protocol [3]. Data is shared as horizontal partitioning where different sets records having same attributes. And some time data is vertically shared that is different locations have different attributes. Dimensionality Reduction-Based Transformation (DRBT) [4] another approach, is based on random projection. Here data is protected by replacing with projected value. This method uses strong mathematical base, without involving CPU much. This method cannot be used with distance based clustering for good result and there is cost of exchanging information is also linked. It is applicable over centralized and vertical partition data. Some methods use data distortion by probability distribution [5], where data is replaced by distorted data which is obtained as fallows. Firstly density function is determined for given series and density parameter is calculated. Then series is generated by density function parameter finally this generated series replaced original series. Bayes estimation [6] and PCA based method are also used to generate random value which is used to distort data. This distortion of data can be done either by adding or by multiplying noise to the data records. Data can also be secured by the following minimal perturbation method. This method is based on minimal modifying of data in database. This modification could be done by removing or/and adding some data. Through this modification we can limit the support and confidence such that the minimal information can be retrieved which is the main purpose of our data mining work. This is the way of perturbing database [7, 8, 9]. K-anonymity [10, 11, 12] is another approach to alter the data so it can be protected. This method is based on that some attribute of data can be used to identify data. In this data is so changed that to identify data uniquely one need minimally (k-1) records. K-anonymity can be used to protect identity but fails to protect attribute disclosure. This limitation is address by diversity [13] and t-closeness [14]. t- Closeness uses Earth Mover Distance metric [15] to find out distance between attribute. Matrix factorization resolves accuracy issue in privacy preservation. It generate simplified low rank approximation version to the original dataset. By this it provide high level accuracy and accurate mining result. Matrix Factorization

Page 390

Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 12, December 2013 ISSN 2319 - 4847

Techniques can be classified as: - Singular Value Decomposition (SVD) - Nonnegative Matrix Factorization (NMF) - NMF-based Data Distortion Method

3. METHOD PRELIMINARY

As in Previous section discus, data can be secured by altering its value i.e replacing data with some other value. This motivates to implement geometrical transformation for data securing purpose. This section brief about geometrical transformation and how it resolve security issue and improve data mining result. 3.1 Transformation Transformation is to move geometry around space this can be achieved by reflection. This reflection is done against a line, it can be an axis or any general line [16]. Multiple reflection cause rotation or translation depending on the alignment of axis on which reflection is taking place if they are in parallel then it is translation. Rotation if intersect. Mathematical way of performing Transformation is given by Euclidean know as Euclidean transformation. Two methods given under Euclidean transformation are rotation and translation [17]. Mathematically Translations and Rotations on the xy-Plane is represented bellow. Translation is moving a point in x y plan by adding vector [, ]. It can be explain as x=x+ ; y=y+ ; this can be achieved by matrix representation of coordinate geometry. (x,y) can be represented as:Then, the relationship between (x, y) and (x', y') can be put into a matrix form like the following:

If a point (x, y) is rotated an angle ( = ) about the coordinate origin to become a new point (x', y'), the relationships can be described as follows:

.(x1,y1) .(x2,y2)

.(x1,y1) .(x2,y2)

3.2 Level of security Security is achieved by rotating data around z axis. Actual data is replaced by rotated data. Degree of security is increased by rotating at different angel randomly chosen. For example, the distances between original data and perturbed when rotated at an angle more than 450 (e.g 750 ) data is more, while it is rotated at an angle of 450. This will increase the level of security. This security again can be increases with normalization of data. 3.3 Procedure Two numerical attribute are selected from given data set and are arranged in a matrix. Attribute are placed in place of X and Y i.e in Colum 1 and Colum 2 while Colum 3 is given value of 1. This above matrix is than multiply by rotational matrix. 3.4 General Assumptions This approach to distort data points in the n-dimensional space draws the following assumptions: The data matrix D contains confidential and non-confidential numerical attributes that must be perturbed to protect individual data values before classification. The existence of an Identification of object may be revealed but it could be also anonymized by suppression. However, the values of the attributes associated with an object are private and must be protected. The perturbation when applied to a data matrix D must preserve the classes of different datasets.

Page 391

Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 12, December 2013 ISSN 2319 - 4847

We also assume that the raw data is pre-processed as follows: Suppressing Identifiers. The existence of a particular object, say ID, could be suppressed/revealed depending on the application, but it could be suppressed when data is made public (e.g. census, social benefits). The major steps of the data perturbation, before classifying or clustering analysis, are depicted in Figure 2. In the first step, the raw data is normalized to give all the variables an equal weight. Then, the data are distorted by using our Shared Secret Preserving for Data Mining (SSPDM) method. In doing so, the underlying data values would be protected, and miners would be able to cluster the transformed data.

4. METHOD

The procedure to distort the attributes of a data matrix has essentially 2 major steps, as follows: Step 1. Selecting the attribute pairs: We select k pairs of attributes Ai and Aj in D, where i!= j. If the number of attributes n in D is even, then k = n/2. Otherwise, k = (n + 1)/2. The pairs are not selected sequentially. A security administrator could select the pairs of attributes in any order of his choice. If n is odd, the last attribute selected is distorted along with any other attribute already distorted. We could try all the possible combinations of attribute pairs to maximize the variance between the original and the distorted attributes. However, given that we distort attributes, the variance of any attribute pairs tends to lie in the same range. Step 2. Distorting the attribute pairs: The pairs of attributes selected previously are distorted as follows: (a) Computing the distorted attribute pairs as a function of : We compute V (AI ,Aj) = R V(Ai ,Aj) as a function of , where R is the rotation matrix. (b) Meeting the pair wise-security threshold: We derive two in equations for each attribute pair based on the constraints: Variance (Ai -Ai ) = 1 and Variance (AjAj) = 2, with 1 > 0 and 2 > 0. (c) Choosing the proper value for: Based on the in equations found previously, we identify a range for that satisfies the pair wise-security threshold PSTR (1, 2).We refer to such a range as security range of rotation. Then, we randomly select a real number in this range and assign it to (d) Outputting the distorted attribute pairs: Given that are already determined, we now computed V (Ai,Aj) = R V(Ai ,Aj), and output the distorted attribute pairs. Each equation in sub-step (b) is solved by computing the variance of the matrix subtraction [Ai Ai ]. We know that the sample variance of N values x1, x2, ... , xN is calculated by: i=1 Var(x1, x2, ..., xN) = 1 ( xi x )2 N Where x is the arithmetic mean of the values x1, x2, .... , xN

Page 392

Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 12, December 2013 ISSN 2319 - 4847

The inputs for the SSPDM algorithm are a normalized data matrix D and a set of k pair wise-security thresholds Tk. we assume that there are k pairs of attributes to be distorted. The output is the transformed data matrix D, which is shared for clustering analysis. The sketch of the SSPDM algorithm is given as follows in figure 3.

5. RESULT

We have implemented SSPDM algorithm by using MATLAB 7.7 and performed experiments. We have taken these datasets from machine learning repository of University California Irvine (UCI). We have taken two real world dataset from this repository and performed our experiments. Iris dataset is considered as dataset 1. Dataset 1 having 150 non repeating dataset records of 4 attribute named 1. Sepal length 2. Sepal width 3. Petal length 4. Petal width. Out of these four attributes, two attributes sepal length and sepal width are taken for implementation. Dermatology dataset is considered as Dataset 2. Dataset 2 having 366 data records from which 358 non repeating non missing dataset records of 33 attributes are taken. Out of these two attribute are taken for implementation. This paper has considered three different numbers of clusters for both considered datasets. cluster2 represents that dataset is divided into two clusters, similarly cluster3 for dividing dataset into three clusters and cluster4 for dividing data set into four clusters. Table 1 Iris dataset Iris Perturb Original cluster2 0.458 0.333 cluster3 1 1 cluster4 0.25 0 cluster5 0.20 0.04

Table 2 Dermatology dataset Dermatology Perturb Original cluster2 1 1 cluster3 0.54 0.41 cluster4 0.125 0.25 cluster5 0.08 0.08

Table 1, represent result of clustering on iris data set. The analysis of table 1 shows accuracy of clustering results remains the same in-fact it has been improved in some cases. For example in cluster 3 accuracy of the result remain same but for cluster 2 and cluster 4 it has been improved. Table 2, represent result of clustering on dermatology data set. The analysis of table 1 shows accuracy of clustering results remains the same in-fact it has been improved in some cases. For example cluster3 and very few time bit low as in case of cluster4. These results shows that SSPDM not only provide security but also keep data mining results are remain intact i.e keep data structure same.

6. Conclusions

This paper talks about securing data in data mining. This method talks about way of altering this data. This method is not destructing the correlation among the data. Data are still have the same structure as before. The only difference is their positions in reference coordinate axis. In other words we can say that the method provides security to the data by moving it at different location in space but retaining the structure or corresponding relation. This paper also talks about degree of privacy, which is responsible to provide different level of security to the data. When the difference between original data and altered data is more, It means it is hard for any intruder to find the original data by simple guessing and when it is less then it is easier for the intruder. This level of privacy will entirely depend on mutual understanding of all parties who are participating in a centralized data mining. The result, section 5, shows this thing that this method is independent of these levels of alteration. At any level of privacy, the result is same. This method provides almost same results as it was with original data by maintaining the secrecy of medical data of any patient. And by using this method we can provide different level of privacy, too, at the same time.

References

[1] Thuraisingham, Bhavani. Data Mining for Malicious Code Detection and Security Applications. IEEE Volume: 2, Page(s): 6 7, 2009. [2] Malik M.B, Ghazi, M.A and Ali, R. Privacy Preserving Data Mining Techniques: Current Scenario and Future. IEEE conference on Computer and Communication Technology (ICCCT), 2012.

Page 393

Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 12, December 2013 ISSN 2319 - 4847

[3] Cryptographic techniques for privacy-preserving data mining Benny Pinkas HP Labs, Volume 4, Issue 2 - page 12, SIGKDD Explorations [4] D. Achlioptas. Database-Friendly Random Projections. In Proc. of the 20th ACM Symposium on Principles of Database Systems, pages 274281, Santa Barbara, CA, USA, May 2001. [5] J A Data Distortion by Probability Distribution CHONG K. LIEW, UINAM J. CHOI, AND CHUNG J. LIEW University of Oklahoma. [6] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1994. [7] M Association Rule Hiding Vassilios S. Verykios, Member, IEEE, Ahmed K. Elmagarmid, Senior Member, IEEE, Elisa Bertino, Fellow, IEEE, Yucel Saygin, Member, IEEE, and Elena Dasseni 434 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 4, APRIL 2004. [8] R. Agrawal and R. Srikant, Privacy Preserving Data Mining, Proc. ACM SIGMOD Conf., 2000. [9] D. Agrawal and C.C. Aggarwal, On the Design and Quantification of Privacy Preserving Data Mining Algorithms, Proc. ACM PODS Conf., 2001. [10] P. Samarati. Protecting respondents privacy in microdata release. IEEE T. Knowl. Data En, 13(6):10101027, 2001. [11] P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, SRI Computer Science Laboratory, 1998. [12] L. Sweeney. K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzz. 10(5):557570, 2002. [13] A. Machanavajjhala, J. Gehrke, D. Kifer, and M.Venkitasubramaniam. -diversity: Privacy beyond k-anonymity. In Proc. 22nd Intnl. Conf. Data Engg. (ICDE), page 24, 2006. [14] t-Closeness: Privacy Beyond k-Anonymity and _Diversity Ninghui Li Tiancheng Li Department of Computer Science, Purdue University {ninghui, li83}@cs.purdue.edu Suresh Venkatasubramanian AT&T Labs Research suresh@research.att.com. [15] Y. Rubner, C. Tomasi, and L. J. Guibas. The earth movers distance as a metric for image retrieval. Int. J. Compute. Vision, 40(2):99121, 2000. [16] http://en.wikipedia.org/wiki/Transformation_geometry [17] http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html

Page 394

- Analyze Different Approaches for IDS Using KDD 99 Data SetUploaded byEditor IJRITCC
- Everitt Cluster Analysis PDFUploaded byJeremey
- CurvesUploaded byspad22
- dataminingUploaded byDeepthi Tarapatla
- mathgen-908670162Uploaded bycrestind
- An Empirical Study of the Applications of Classification Techniques in Students DatabaseUploaded byAnonymous 7VPPkWS8O
- A Survey of Entity Resolution and Record Linkage MethodologiesUploaded byjames.hannigan
- ijcttjournal-v1i2p5Uploaded bysurendiran123
- GeomUploaded byjumperpc3000
- Data Mining on Heathcare Management SystemUploaded byInternational Journal for Scientific Research and Development - IJSRD
- 05518596Uploaded bySyifaul L Ahmad
- Lecture 1Uploaded byMungThoth
- New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhance PerformanceUploaded byInternational Journal of computational Engineering research (IJCER)
- Vision 7CameraModelA S09Uploaded byKartik Dutta
- Business Analytics and Data Mining for Management InstitutesUploaded bysingh_devanshu
- 68020_assignment 11 Rotational MotionUploaded byMrinal Tripathi
- Bi-Directional Prefix Preserving Closed Extension For Linear Time Closed Pattern MiningUploaded byseventhsensegroup
- tensor in su(n)Uploaded byManu Sharma
- A Threshold fuzzy entropy based feature selection method applied in various benchmark datasets using Ant-miner algorithmUploaded byIJMER
- Plane TransformUploaded byMich Ladycan
- MTC 121Uploaded byRica Alhambra
- Resume_SampleUploaded bySolaiman Salvi
- Engineering journal ; Discovery of fraud in Medical InsuranceUploaded byEngineering Journal
- Data Mining Application Design Using K-MEANS and Exponential Smoothing Algorithm for Predicting New Student RegistrationUploaded byIRJCS-INTERNATIONAL RESEARCH JOURNAL OF COMPUTER SCIENCE
- CurvesUploaded byCrystal May Oqueriza
- determinationUploaded byDaniel
- [Shun-ichi Amari (Auth.)] Information Geometry and(B-ok.org)Uploaded byYasir Nawaz
- amrieh2015Uploaded byMohamed Boussakssou
- A Taxonomy and Classification of Data MiningUploaded byMartin Griffin
- - CHART Moments of Inertia 3D Awrejcewicz J 2012 01Uploaded byKaren Rincón Ortiz

- A Review on Artificial Intelligence in Education for Learning Disabled ChildrenUploaded byAnonymous vQrJlEN
- PERFORMANCE EVALUATION OF SHUNT WOUND DC MOTOR IN PROCESS CONTROLUploaded byAnonymous vQrJlEN
- C.Ri.S.P. - Cyber Risk Analysis in Industrial Process System EnvironmentUploaded byAnonymous vQrJlEN
- Heart Disease Prediction Using Hybrid ClassifierUploaded byAnonymous vQrJlEN
- IRIS SCAN BASED EMERGENCY SYSTEMUploaded byAnonymous vQrJlEN
- IJAIEM-2019-06-30-13.pdfUploaded byAnonymous vQrJlEN
- Prediction of Cyberbulling Using Random Forest ClassifierUploaded byAnonymous vQrJlEN
- Proficient Technique for Classification of ECG SignalUploaded byAnonymous vQrJlEN
- Performance Evaluation of No-frill AirlinesUploaded byAnonymous vQrJlEN
- An empirical study of Factors contributing to the adoption of Free Desktop Open Source Software in Africa: A case of Kenya university studentsUploaded byAnonymous vQrJlEN
- Effects Of Cybercrime In IndiaUploaded byAnonymous vQrJlEN
- Impact of Slurry Temperature on Biogas Production from Chicken Waste and Food Wastes by Anaerobic Digestion MethodUploaded byAnonymous vQrJlEN
- CONSUMPTION OF SANITARY NAPKINS&EFFECT ON ENVIRONMENTALS USTAINABILITYUploaded byAnonymous vQrJlEN
- Classification of Power Quality Disturbances Using Wavelet Transform and Halfing AlgorithmUploaded byAnonymous vQrJlEN
- FACTORS INFLUENCING ONLINE SHOPPING –AN EXPLORATORY REVIEW IN THE INDIAN PERSPECTIVEUploaded byAnonymous vQrJlEN
- Socio-economic Conditions of the Agricultural Labourers: An AnalysisUploaded byAnonymous vQrJlEN
- Analysis two level of real time load flow 132/33/11kv sub-station using ETAP softwareUploaded byAnonymous vQrJlEN
- Music as Persuasive Communication StrategyinAdvertising and BrandingUploaded byAnonymous vQrJlEN
- IJAIEM-2019-06-25-9.pdfUploaded byAnonymous vQrJlEN
- Study of Smoke Levels Using crude Jatropha Oil with Preheated Condition and at Normal Condition Injection in High Grade Low Heat Rejection Combustion Chamber of Diesel Engine at Different Operating PressuresUploaded byAnonymous vQrJlEN
- Digital Literacy and Academic Performance at Higher Education: Effects and Challenges Authors: Dr. Jyotsana ThakurUploaded byAnonymous vQrJlEN
- An Approach to Study Vibration Phenomena Using Field Data Based Mathematical Modeling in Spinning MachineUploaded byAnonymous vQrJlEN
- Prediction of Personality through Hand Geometry and Big Five Factor ModelUploaded byAnonymous vQrJlEN
- Development of Naval Alloy by Casting ProcessUploaded byAnonymous vQrJlEN
- AUTOMATIC WATER FLOW REGULATORUploaded byAnonymous vQrJlEN
- 3D PRINTED INTEGRATED FULLFUNCTION PROSTHETIC HAND-ARM SYSTEMUploaded byAnonymous vQrJlEN
- Measuring effectiveness of TBRP method for different capacity Data centersUploaded byAnonymous vQrJlEN
- Offline Signature Verification using LBP TechniqueUploaded byAnonymous vQrJlEN

- Rock quality indexUploaded byre_alvaro
- Installing or Uninstalling MINI-LINK CraftUploaded byRaziel Velazquez
- Bohling, Christopher_Review On_Helmut R. Wagner, Alfred Schutz an Intellectual BiographyUploaded byFernando Campos
- Smart Quill2Uploaded bysuchu138522
- 112285559-C4-Sieve-Analysis-Aggregate-Grading-for-Fine-and-Coarse-Aggregate.docUploaded byKyle Moolman
- Organizational Stress or sUploaded bymac76
- ATMega16 AVR Microcontroller LCD Digital ClockUploaded byAdrianMartinezMendez
- Robert Triner ResumeUploaded byprosoundguru
- SAP NetWeaver as ABAP Release 702Uploaded byKishore Reddy
- Early Binding And1 Late BindingUploaded bySrinath
- Meteorology JeppesenUploaded byАртём Огородиков
- lectura1Uploaded byRodrigo Alarcón Vásquez
- STL for Large Data ManagementUploaded bymanoj-kumar-jain
- Assignment - CSM InterviewUploaded bysang1992
- Caso-Amazon Ad2018Uploaded byCarlos Leyva
- Capillary Gc - Columna 5 Metil Fenil SiliconaUploaded bydeliana07
- Write Up InstrumentalUploaded byAgrippa Mungazi
- How to Write a Scientific ReportUploaded byNobody Owens
- Blackpowder RiflesUploaded byRichard Harig
- Busbar Size and CalculationUploaded byandro max
- The Celestial SphereUploaded bymaneesh_massey_1
- How to Identify Promotable EmployeesUploaded byAzwin
- 博客相关代码Uploaded byfashioninchina
- Db2 Training Class 001Uploaded byParesh Bhatia
- Boiler MaintenanceUploaded bySubbarayan Saravanakumar
- EPSTEIN & MARTIN - Qualitative Aproach to Empirical Legal Research.pdfUploaded byDouglas Zaidan
- genre analysisUploaded byapi-252218826
- Fert Lizer SpUploaded byJeevan Landge Patil
- IMPACT OF HR TECHNOLOGY SOLUTION IN RETAIL MANAGEMENTUploaded byJay Patel
- pipelined processor design.pptUploaded bySajjad Rizvi

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.