and Privacy
NARASIMHA
164G1A0559
What is Big Data
Four characteristics
◦ Volume – data sizes (From terabytes to zettabytes)
◦ Variety – data in different formats and structure
◦ Velocity (speed) – the time available to act on the data is very short
◦ Huge number of data sources – integration and cross-correlation
among data sets from different sources
What big data is for
Technological advances and novel applications are making it possible to
capture, process, and share huge amounts of data, referred to as big
data.
Managed well, the data can be used to unlock new sources of economic
value,
provide fresh insights into science, and hold governments to account
(The Economist, 2010).
CIA Theory
Confidentiality: protection of data against unauthorized disclosure
Integrity: prevention of unauthorized and improper data modification
Availability: prevention of, and recovery
from, hardware and software errors and
malicious data access denials
that make the database system unavailable.
Data Confidentiality
Several data confidentiality techniques exist – the most notable being
access control and encryption.
CHALLENGES
1. Merging large numbers of access control policies
◦ Multiple systems = multiple data sets = multiple sets of access control policies
◦ Integrating systems = integrating databases = enforcing a merged access control
policy
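The merging problem above can be made concrete with a minimal sketch. It assumes each system's policy is a mapping from a (role, resource) pair to a set of allowed actions, and it assumes a conservative merge rule: when both systems have a rule for the same pair, only the actions both allow are kept, and any disagreement is reported. The policy model, the merge rule, and all names (`merge_policies`, `hr_policy`, `finance_policy`) are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Set, Tuple

# Assumed model: (role, resource) -> set of allowed actions.
Policy = Dict[Tuple[str, str], Set[str]]


def merge_policies(a: Policy, b: Policy) -> Tuple[Policy, List[Tuple[str, str]]]:
    """Merge two policies conservatively and list the pairs on which they disagree."""
    merged: Policy = {}
    conflicts: List[Tuple[str, str]] = []
    for key in sorted(set(a) | set(b)):
        if key in a and key in b:
            # Both systems have a rule: keep only what both allow.
            merged[key] = a[key] & b[key]
            if a[key] != b[key]:
                conflicts.append(key)
        else:
            # Only one system has a rule: carry it over unchanged.
            merged[key] = set(a[key] if key in a else b[key])
    return merged, conflicts


# Hypothetical policies from two systems being integrated.
hr_policy: Policy = {
    ("analyst", "salaries"): {"read"},
    ("manager", "salaries"): {"read", "write"},
}
finance_policy: Policy = {
    ("analyst", "salaries"): {"read", "export"},
    ("auditor", "ledger"): {"read"},
}

merged, conflicts = merge_policies(hr_policy, finance_policy)
print(merged[("analyst", "salaries")])  # actions both systems allow
print(conflicts)                        # pairs needing manual review
```

Even this toy version shows why the problem grows with the number of systems: every overlapping rule is a potential conflict, and someone has to choose the merge rule.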
◦ Suppose we observe that the same data value is provided by three
different sources.
◦ In general, this may lead one to conclude that the data value is trustworthy.
◦ However, if these three sources are strongly related to one another,
it may not be realistic to assume that the data value is provided by three
independent sources.
[Figure: Kenneth, Wiki, Journal A, Journal C. Three independent sources?]
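The point about related sources can be sketched in a few lines. The example assumes we know which source derived its content from which other source, and it counts distinct upstream origins rather than raw sources. The source names are taken from the figure, but the derivation links (all three tracing back to Kenneth) are an assumption made purely for illustration, as are the function name and the sample data value.

```python
from typing import Dict, List, Tuple


def independent_support(
    claims: List[Tuple[str, str]], derived_from: Dict[str, str]
) -> Dict[str, int]:
    """Count, per data value, how many distinct origins back it."""

    def origin(source: str) -> str:
        # Follow derivation links upstream; stop on a cycle.
        seen = set()
        while source in derived_from and source not in seen:
            seen.add(source)
            source = derived_from[source]
        return source

    origins_by_value: Dict[str, set] = {}
    for source, value in claims:
        origins_by_value.setdefault(value, set()).add(origin(source))
    return {value: len(origins) for value, origins in origins_by_value.items()}


# Three sources report the same value ...
claims = [("Wiki", "v"), ("Journal A", "v"), ("Journal C", "v")]
# ... but (assumed here) all of them derive from a single origin.
derived_from = {"Wiki": "Kenneth", "Journal A": "Kenneth", "Journal C": "Kenneth"}

support = independent_support(claims, derived_from)
print(support)  # {'v': 1}
```

With no derivation links the same claims would count as three independent confirmations, which is exactly the overestimate the slide warns about.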
Privacy Risks
• Exchange and integration of data across multiple sources
– Data becomes available to multiple parties
– Re-identification of anonymized data becomes easier
Open issues:
– Scalability
– Support for complex matching, such as semantic matching
– Definition of new security models
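The re-identification risk mentioned above can be illustrated with a small linkage sketch. It assumes an "anonymized" data set that still carries quasi-identifiers (here ZIP code, birth year, and sex) and a second, public data set that carries the same fields plus a name. All records, field names, and the function name are hypothetical.

```python
from typing import Dict, List


def reidentify(
    anonymized: List[dict], public: List[dict], quasi_ids: List[str]
) -> Dict[int, str]:
    """Return {row index: name} for anonymized rows matching exactly one public record."""
    index: Dict[tuple, List[str]] = {}
    for person in public:
        key = tuple(person[q] for q in quasi_ids)
        index.setdefault(key, []).append(person["name"])

    hits: Dict[int, str] = {}
    for i, row in enumerate(anonymized):
        names = index.get(tuple(row[q] for q in quasi_ids), [])
        if len(names) == 1:  # a unique match re-identifies the row
            hits[i] = names[0]
    return hits


# Hypothetical "anonymized" medical records (names removed).
anonymized = [
    {"zip": "47906", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "47907", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
# Hypothetical public records from a second source.
public = [
    {"name": "Alice", "zip": "47906", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "47907", "birth_year": 1975, "sex": "M"},
    {"name": "Carl", "zip": "47907", "birth_year": 1975, "sex": "M"},
]

hits = reidentify(anonymized, public, ["zip", "birth_year", "sex"])
print(hits)  # {0: 'Alice'}
```

The second row stays ambiguous only because two public records share its quasi-identifiers; integrating one more data source could remove that ambiguity.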
Privacy Enhancing Techniques (cont’d)
• Privacy-preserving collaborative data mining (earlier work by C. Clifton, M.
Kantarcioglu et al.)
open issues:
– Scalability
– Data mining on the cloud
• Privacy-preserving biometric authentication (by E. Bertino et al.)
open issues:
– Reducing false rejection rates
– Using homomorphic techniques
Privacy Enhancing Techniques (cont’d)
• Privacy-preserving data management on the cloud
– CryptDB (by CSAIL)
– DBMask (by E. Bertino et al.)
Issues:
– Weak security (CryptDB)
– Weak protection (or lack of protection) of access patterns
Privacy-enhancing challenges
1. Efficiency:
• Challenge
- Unable to scale to large data sets
• Solutions:
- Develop efficient cryptographic building blocks
• More work needs to be done on:
- engineering protocols and systems
- parallel processing techniques for cryptographic protocols
- metrics to assess efficiency
- data privacy and utility in the use of the different building blocks
- support for mission-oriented tradeoffs among efficiency, data privacy, and data
utility
Privacy-enhancing challenges (cont’d)
2. Security with privacy
• Challenge:
- Can security and privacy be reconciled?
If we want to achieve security, we must give up some privacy;
if we are keen on assuring privacy, we may undermine security.
• Solutions:
- Recent advances in applied cryptography are making it possible to work on
encrypted data, for example to perform analytics on encrypted data.
• More work needs to be done:
- Data privacy techniques depend heavily on the specific use of the data
and the security tasks at hand.
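As a rough illustration of analytics on encrypted data, the sketch below sums values without decrypting the individual inputs. It uses a toy version of the Paillier cryptosystem, chosen here as one well-known additively homomorphic scheme; the slides do not name a specific scheme. The primes are tiny and the code is for illustration only, not for real use. It assumes Python 3.8+ (for `pow` with a negative exponent), and all function names are hypothetical.

```python
import math
import secrets


def keygen(p: int = 1789, q: int = 1861):
    """Toy Paillier key generation; p and q are small primes for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # common simple choice of generator
    mu = pow(lam, -1, n)     # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
values = [52, 61, 58]                       # hypothetical sensitive inputs
ciphertexts = [encrypt(pub, v) for v in values]

# The party computing the total sees only ciphertexts.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(pub, total, c)

print(decrypt(pub, priv, total))  # 171
```

Only the key holder learns the total, and nobody needs to see the individual values. The scheme supports addition only, which hints at why the choice of technique depends on the specific analytics task.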
Privacy-enhancing challenges (cont’d)
3. Data ownership
• Challenge:
- Who is the owner of a data item?
The owner can be defined as the user whose information is
recorded in the data item,
or as the party that created the data item by collecting
information from the user.
• Solutions:
- Replace the concept of data owner with the concept of stakeholder.
• More work needs to be done:
- Technological, organizational, and legal solutions need to be investigated to
manage conflicts.
Privacy-enhancing challenges (cont’d)
4. Privacy-aware Data lifecycle framework
• Data sharing
- Users need to be informed when their data is shared with or transferred to other parties.
- It is thus critical to devise legal guidelines on which technical mechanisms can be
based.
Conclusion
Focus on research challenges specific to confidentiality, trustworthiness, and
privacy in big data.
Suggest some possible solutions to the challenges.