
A Presentation On

An Approach for Big Data Privacy Preservation and Copyright Protection Using HDFS
Outline
• Introduction
• Motivation
• Literature Survey
• Existing System
• Problem Statement
• Proposed System
• Conclusion
Introduction
• Big data
• Privacy and security are big issues.
• Data anonymization and encryption prevent privacy breaches.
• Big data is structured, semi-structured and unstructured, which adds more challenges.
• Fingerprinting techniques provide copyright protection and ensure traitor identification (if any).
Introduction
Data attribute security and privacy in a distributed database system:

[Figure 1: Data collection and data publishing. Diagram labels: Alice, Bob, Cathy, Doug; Data Collection; Data Publisher; Data Publishing; Data Recipient.]

Motivation of system
• With the exponential increase in the number of everyday internet users, their internet usage activity must be understood in order to improve the services provided to them.
• The data involved in these cases is huge and needs to be studied thoroughly to understand user needs.
• To give users a good experience in their day-to-day activity, this big data needs to be analysed.
• Cloud computing is an efficient way to manage and review this data.
• So there is a need for a system that provides user privacy as well as data security.
Literature survey
Paper and Author name | Observation | Remarks

[1] “Protection of Big Data Privacy”. Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016).
Observation: With the described techniques privacy is protected, but the data may lose its real-world meaning as well as its utility and significance. The techniques therefore need to be modified or extended to handle the privacy and security of big data efficiently.
Remarks:
- Presents existing privacy-preserving mechanisms at the various stages of the big data life cycle: data generation, data storage and data processing.
- Presents various challenges of preserving privacy in big data.
- Investigates the risks involved in anonymization, encryption and storage of data in the cloud.

[2] “A Framework for Categorizing and Applying Privacy-Preservation Techniques in Big Data Mining”. Xu, L., Jiang, C., Chen, Y., Wang, J., & Ren, Y. (2016).
Observation: More ways are to be explored to protect privacy against various threats.
Remarks:
- Presents the Rampart framework for privacy preservation.
- The framework consists of seven procedures: anonymization, reconstruction, modification, provenance, agreement, trade and restriction, which prevent outside intrusion.
- The framework gives high priority to maintaining the balance between data utility and privacy.

[3] “Preserving Privacy in MapReduce Based Clouds: Insight into Frameworks and Approaches”. Al-Aqeeli, S., & Alnifie, G. (2015).
Observation: The frameworks have limitations such as data distortion, and none of them is fully fit for privacy preservation.
Remarks:
- Investigates the privacy-preservation problem of big data in the context of hybrid cloud computing.
- Presents MapReduce-based frameworks such as Airavat, Sedic, Sac-FRAPP and Hyper-1.
- Records that anonymization, encryption and differential privacy are the efficient methods for protecting the privacy of data.

[4] “Overlapping Slicing with New Privacy Model”. Giri S. Suman & Mukhopdhay Milav. (2014).
Observation: This solution overcomes the limitations of slicing and anonymization, but while reducing the attribute disclosure risk it increases the chance of identity disclosure.
Remarks:
- Presents a new privacy model with overlapping slicing, which duplicates an attribute in more than one column.
- The model increases the privacy and utility of data by achieving correlation among attributes.
- It can also handle high-dimensional data.

[5] “Fingerprinting Numeric Databases with Information Preservation and Collusion Avoidance”. Arti Mohanpurkar, Madhuri Joshi. (2015).
Observation: The system is developed to avoid collusion and is primary-key independent. The fingerprinting technique provides security against ownership theft and helps in traitor tracing.
Remarks:
- A fingerprinting technique for numeric relational data that takes knowledge preservation into account and ensures that the usability constraints are not violated.
- Knowledge preservation is achieved by optimizing the inserted error with Particle Swarm Optimization.

[6] “A Traitor Identification Technique for Numeric Relational Databases with Distortion Minimization and Collusion Avoidance”. Arti Mohanpurkar, Madhuri Joshi. (2016).
Observation: The system performs blind decoding and is considered robust against attacks such as tuple insertion and deletion, attribute deletion, etc.
Remarks:
- The traitor identification system embeds a fingerprint securely to protect numeric relational databases.
- The insertion technique reduces time complexity and ensures that the fingerprint, inserted in the form of error bits, leads to minimum distortion.
Existing System
• Privacy is preserved with the help of a cryptography algorithm.
• The system works with the traditional approach for distributed systems.
• In the cryptography algorithm, security is provided with the help of a key.
• Data leakage occurs when the Secure Multi-Party Computation (SMC) protocol is applied.
• There is no provision for copyright protection.
• Key generation and key management are time-consuming processes.
• Data loss occurs in this system because of key management.
Disadvantages of Existing System
• Data leakage as well as data lineage issues.
• The system can work only on structured datasets.
• Time complexity is very high.
• No database security is defined.
• No provision is defined for copyright protection and traitor identification.
Problem Statement
To design and implement a system that provides privacy and security for the overall system in an HDFS environment. With data privacy, the system protects sensitive information against attacks that lead to breaches; with security, the system provides copyright protection along with traitor identification (if any).
Proposed System
• We present a novel technique called slicing, which partitions the data both horizontally and vertically.
• We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection.
• It can handle high-dimensional data.
• It provides document-level privacy as well as copyright protection, using multi-party computation and a fingerprint technique.
• The system is executed with HDFS on heterogeneous clusters.
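The slicing idea named above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the presented system: the function name `slice_table`, the fixed bucket size, the attribute grouping and the sample records are all hypothetical. Attributes are grouped into columns (vertical partition), records are grouped into buckets (horizontal partition), and the values of each column are shuffled independently inside every bucket so that the link between columns is broken.

```python
import random

def slice_table(rows, column_groups, bucket_size, seed=0):
    """Return a sliced copy of `rows` (a list of dicts).

    column_groups: tuples of attribute names (the vertical partition).
    bucket_size:   number of records per bucket (the horizontal partition).
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        # Shuffle each column group independently inside the bucket.
        shuffled_groups = []
        for group in column_groups:
            values = [tuple(r[a] for a in group) for r in bucket]
            rng.shuffle(values)
            shuffled_groups.append(values)
        # Rebuild the records from the independently shuffled groups.
        for i in range(len(bucket)):
            record = {}
            for group, values in zip(column_groups, shuffled_groups):
                record.update(zip(group, values[i]))
            sliced.append(record)
    return sliced

# Made-up sample data, for illustration only.
rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "flu"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "dyspepsia"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
    {"age": 54, "sex": "M", "zip": "47302", "disease": "gastritis"},
]
groups = [("age", "sex"), ("zip", "disease")]
sliced = slice_table(rows, groups, bucket_size=2)
for record in sliced:
    print(record)
```

Attributes kept in the same group stay together, so their correlation is preserved, while the association across groups is only known up to the bucket.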
System Architecture
Objectives
• Provide document-level copyright protection.
• Implement a flexible privacy approach.
• Implement an SQL injection detection and prevention algorithm that secures the database against brute-force attacks, injection attacks and malicious queries.
• Support big data.
• Support structured and semi-structured data.
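The slides do not describe the SQL injection prevention algorithm itself, so the sketch below shows only one standard defence, parameterized queries, using Python's built-in `sqlite3` module. The table, the function names and the payload are assumptions made for the illustration.

```python
import sqlite3

# In-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])

def find_user_unsafe(conn, name):
    # Vulnerable: the user input becomes part of the SQL text.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # The ? placeholder passes the input as data, never as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()

payload = "x' OR '1'='1"                  # classic injection string
leaked = find_user_unsafe(conn, payload)  # the WHERE clause is always true
blocked = find_user_safe(conn, payload)   # no user has this literal name
print(len(leaked), len(blocked))
```

With the concatenated query the payload returns every row, while the parameterized query returns nothing for the same input.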
Advantages of System

• The given method is more flexible in terms of time complexity and execution time.
• The base algorithm also considers providers in the anonymization technique. If the database holds data from only one provider, the algorithm gives no output: the data is lost or left in a waiting condition. This does not happen when slicing is used.
• The system gives high security to data with the help of copyright protection and traitor identification.
• One of the primary drawbacks that keeps some companies from getting fully involved in database marketing is its high cost. This system overcomes this drawback.
Mathematical Model
Let the overall system be S = {S1, S2, S3, ..., Sn}, where
S1 is the data provider module,
S2 denotes load balancing and fingerprint insertion,
S3 executes the data privacy mechanism,
S4 handles access control, and
S5 denotes the attack as well as the analysis phase.
Now,
S1 = {D1, D2, ..., Dn} // set of documents
Rec = {AA1, AA2, ..., AAn} // select each attribute
Access = {D1a, D2a, ..., Dna}
Ins = {AA1(a*), ..., AAn(a*)} // adding fingerprint to
P = {DBp1, DBp2, ..., DBpn} // database, i.e. data provided by providers; apply F on it
F = {fingerprint adding (FA), slicing algorithm (SA), binary algorithm (BA), privacy verification algorithm (PA)}
T* = {Ru ^ DBpn} // collaborative data according to the user request and the database which we have
F provides privacy and security to the input data.
e = output in table format according to user authentication.
Success condition:
Ru[i] ≠ NULL, DBpn ≠ NULL
Failure condition:
Ru[i] == NULL, DBpn == NULL
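The fingerprint-adding step (FA) of the model can be pictured with a generic least-significant-bit scheme. The sketch below is an assumption-laden illustration, not the presented method: it relies on a primary key and a keyed hash, whereas the surveyed technique [5] is primary-key independent and optimizes the inserted error with Particle Swarm Optimization. All names (`embed`, `extract`, `identify`, `FP_BITS`, `gamma`) and the sample data are hypothetical.

```python
import hashlib
import hmac

FP_BITS = 32  # fingerprint length (illustrative choice)

def _mac(key: bytes, msg: str) -> int:
    # Keyed hash of a message, returned as a large integer.
    digest = hmac.new(key, msg.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def fingerprint_for(key, buyer_id):
    # Buyer-specific bit string derived from the owner's secret key.
    h = _mac(key, "buyer:" + buyer_id)
    return [(h >> i) & 1 for i in range(FP_BITS)]

def embed(rows, key, buyer_id, gamma=2):
    """rows: dict {primary_key: int}. Returns a fingerprinted copy."""
    fp = fingerprint_for(key, buyer_id)
    marked = {}
    for pk, value in rows.items():
        h = _mac(key, "row:" + str(pk))
        if h % gamma == 0:  # roughly one row in gamma is marked
            bit = fp[(h // gamma) % FP_BITS]
            value = (value & ~1) | bit  # overwrite the lowest bit
        marked[pk] = value
    return marked

def extract(rows, key, gamma=2):
    # Majority vote per fingerprint position over the marked rows.
    votes = [[0, 0] for _ in range(FP_BITS)]
    for pk, value in rows.items():
        h = _mac(key, "row:" + str(pk))
        if h % gamma == 0:
            votes[(h // gamma) % FP_BITS][value & 1] += 1
    return [None if z == o == 0 else int(o > z) for z, o in votes]

def identify(rows, key, buyers, gamma=2):
    # The suspected traitor is the buyer whose fingerprint matches best.
    recovered = extract(rows, key, gamma)
    def score(buyer):
        fp = fingerprint_for(key, buyer)
        return sum(1 for r, f in zip(recovered, fp) if r is not None and r == f)
    return max(buyers, key=score)

# Made-up numeric data and buyers, for illustration only.
data = {pk: 1000 + 7 * pk for pk in range(2000)}
key = b"owner-secret"
buyers = ["alice", "bob", "cathy"]
leaked = embed(data, key, "bob")
print(identify(leaked, key, buyers))
```

Each marked value changes by at most 1, which keeps distortion small, and because every fingerprint bit is repeated across many rows the majority vote can still recover it after a share of the tuples is deleted.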
Conclusion
• The proposed system helps to improve data privacy and security when data is gathered from different sources. The system also provides strong security in the form of copyright protection.
Future Scope

The system can be implemented on a Hadoop-based system with cube materialization and MapReduce, using a load-rebalancing approach on heterogeneous cluster nodes.

In future, this system can be considered for data distributed in ad hoc grid computing. The system can also be considered for set-valued data.
thank you
