
Big Data Security and Privacy
NARASIMHA
164G1A0559
What is Big Data?
Four characteristics
◦ Volume – data sizes (from terabytes to zettabytes)
◦ Variety – data in different formats and structures
◦ Velocity (speed) – the time available to act on the data is very short
◦ Huge number of data sources – integration and cross-correlation among data sets from different sources
What is Big Data for?
Technological advances and novel applications are making it possible to capture, process, and share huge amounts of data, referred to as big data.

The goal is to extract useful knowledge, such as patterns, from these data and to predict trends and events.

"Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account" (The Economist, 2010).
CIA Triad
Confidentiality: protection of data against unauthorized disclosure
Integrity: prevention of unauthorized and improper data modification
Availability: prevention of, and recovery from, hardware and software errors and malicious denials of data access that make the database system unavailable.
Data Confidentiality
Several data confidentiality techniques exist – the most notable being
access control and encryption.
CHALLENGES
1. Merging large numbers of access control policies
◦ Multiple systems = multiple data sets = multiple sets of access control policies
◦ Integrating the systems = integrating their databases = the merged access control policies must still be enforced

2. Automatically administering authorizations for big data, in particular for granting permissions
◦ Authorizations can be granted automatically, possibly based on the user profile, using machine learning
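One simple way to picture profile-based automatic granting is a nearest-neighbour recommender: a new user is offered the permissions that the most similar existing users already hold. The sketch below is a hypothetical illustration, not a technique named in these slides; the attribute strings, the Jaccard similarity, and the `k` and `min_votes` thresholds are all assumptions.

```python
# Hypothetical sketch: suggest permissions for a new user by copying the
# permissions held by the k most similar existing users (k-nearest neighbours
# on profile attributes, Jaccard similarity). Thresholds are assumptions.
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two attribute sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_permissions(new_profile: set[str],
                        existing: list[tuple[set[str], set[str]]],
                        k: int = 3, min_votes: int = 2) -> set[str]:
    """Return permissions held by at least `min_votes` of the k most similar users.

    existing: list of (profile_attributes, granted_permissions) pairs.
    """
    ranked = sorted(existing, key=lambda u: jaccard(new_profile, u[0]), reverse=True)
    votes: Counter = Counter()
    for _, perms in ranked[:k]:
        votes.update(perms)
    return {perm for perm, n in votes.items() if n >= min_votes}


users = [
    ({"dept:finance", "role:analyst"}, {"read:ledger", "read:reports"}),
    ({"dept:finance", "role:manager"}, {"read:ledger", "approve:payments"}),
    ({"dept:finance", "role:analyst", "site:remote"}, {"read:ledger", "read:reports"}),
    ({"dept:hr", "role:analyst"}, {"read:payroll"}),
]
print(suggest_permissions({"dept:finance", "role:analyst"}, users))
```

Requiring several votes keeps a single unusual neighbour from passing on a rare, sensitive permission; in practice suggestions like these would likely still go to an administrator for review rather than be granted blindly.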
Data Confidentiality (cont.)
3. Enforcing access control policies in big data stores
◦ Current solutions for querying big data sets rely on scripts and jobs written in programming languages
◦ The challenge is how to embed fine-grained access control policies into these jobs and scripts
◦ Extend such approaches to support more complex access control policies
◦ Investigate encryption-based approaches for enforcing access control policies in big data stores
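To make "embedding fine-grained policies into jobs" concrete, the sketch below wraps the input of a user-written job so that row-level and column-level restrictions are applied before the job's own logic runs. It is a minimal, hypothetical illustration; the `Policy` structure and the example job are assumptions, not the API of any big data framework.

```python
# Hypothetical sketch: enforce a fine-grained (row- and column-level) policy
# inside a data-processing job by filtering records before the job's own
# logic sees them. The Policy format and the job are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Policy:
    allowed_columns: set[str]
    row_filter: Callable[[dict], bool] = lambda row: True


def enforce(policy: Policy, records: Iterable[dict]) -> Iterator[dict]:
    """Yield only permitted rows, projected onto permitted columns."""
    for row in records:
        if policy.row_filter(row):  # row-level check on the full record
            yield {k: v for k, v in row.items() if k in policy.allowed_columns}


def average_salary_job(records: Iterable[dict]) -> float:
    """Example user-written job; it never sees data the policy hides."""
    salaries = [r["salary"] for r in records if "salary" in r]
    return sum(salaries) / len(salaries) if salaries else 0.0


rows = [
    {"name": "a", "dept": "eng", "salary": 100},
    {"name": "b", "dept": "eng", "salary": 200},
    {"name": "c", "dept": "hr", "salary": 999},
]
eng_only = Policy({"dept", "salary"}, lambda r: r["dept"] == "eng")
print(average_salary_job(enforce(eng_only, rows)))
```

The design choice here is to enforce the policy in the data path rather than trusting each script to check it, so the same job code runs unchanged under different users' policies.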
Data Trustworthiness
One major application of big data is DECISION MAKING
◦ Therefore, the data must be TRUSTWORTHY
◦ Not only free of errors, but also protected from malicious parties aiming to deceive the data users

Currently there is no comprehensive solution to the problem of high-assurance data trustworthiness; however, several relevant techniques have been proposed in different areas.
Data Trustworthiness (cont.)
1. User support for data use based on trustworthiness assessment
◦ As data must ultimately be used by human users, it is critical that users be provided with indicators of the trustworthiness level of the data they receive
◦ For example, the trust score of a data value provided by a given data source is a function of two factors:
◦ trustworthiness level = f(reputation of the source, difference between the value and the values reported by other sources)
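The two-factor trust score can be sketched as a weighted combination: source reputation pulls the score up, and disagreement with the other sources pulls it down. The linear weighting `w`, the use of the median as the consensus value, and the relative-deviation measure are assumptions chosen for illustration only.

```python
# Illustrative sketch: trust score of a value as a weighted combination of
# (a) the reputation of the source and (b) agreement with other sources.
# The weight w and the agreement formula are assumptions, not from the source.
from statistics import median


def trust_score(value: float, source_reputation: float,
                other_values: list[float], w: float = 0.5) -> float:
    """Return a score in [0, 1]; reputation is assumed to be in [0, 1]."""
    if not other_values:
        agreement = 0.5  # no corroborating evidence either way
    else:
        ref = median(other_values)
        scale = max(abs(ref), 1e-9)
        # relative deviation from the consensus value, capped at 1
        agreement = 1.0 - min(abs(value - ref) / scale, 1.0)
    return w * source_reputation + (1 - w) * agreement


print(trust_score(100, 0.8, [100, 102, 98]))  # agrees with consensus
print(trust_score(200, 0.8, [100, 102, 98]))  # far from consensus
```

A score like this is what could be surfaced to users as the "indicator" mentioned above; the weighting would need calibration for any real application.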
Data Trustworthiness (cont.)
2. Data correlation techniques
◦ Interconnected big data often form large heterogeneous information networks with information redundancy
◦ Such redundancy represents an important opportunity to cross-check conflicting data values and to correlate data
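The simplest way to exploit redundancy for cross-checking is to compare what several sources report for the same data item, keep the most common value, and flag the sources that disagree. Plain majority voting is an assumed, deliberately simple stand-in for the more elaborate correlation techniques the slide refers to.

```python
# Illustrative sketch: use redundancy across sources to cross-check a data
# item. The most common reported value is kept and disagreeing sources are
# flagged. Plain majority voting is an assumed, deliberately simple choice.
from collections import Counter


def cross_check(reports: dict[str, str]) -> tuple[str, float, list[str]]:
    """reports maps source name -> reported value for one data item.

    Returns (consensus value, fraction of sources agreeing, dissenting sources).
    """
    counts = Counter(reports.values())
    value, n = counts.most_common(1)[0]
    dissenters = sorted(src for src, v in reports.items() if v != value)
    return value, n / len(reports), dissenters


print(cross_check({"s1": "Paris", "s2": "Paris", "s3": "Lyon"}))
```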

3. High-assurance and efficient provenance
◦ Data provenance is often a critical factor for assessing data trustworthiness
◦ Provenance information must be protected from tampering when it flows across various parties in a system
◦ Provenance techniques must be extended for use in dynamic mobile environments
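A common building block for tamper-evident provenance is a hash chain: each provenance record carries a keyed tag that also covers the previous record's tag. The sketch below assumes a single shared HMAC key, which is a simplification (anyone holding the key can forge records; multi-party systems would more likely use per-party digital signatures), and it does not detect truncation of the last records.

```python
# Minimal sketch, assuming one shared HMAC key: a tamper-evident provenance
# chain in which each record's tag covers the previous record's tag, so
# altering or reordering records, or deleting a record from the middle,
# breaks verification. Truncating the tail is NOT detected by this sketch.
import hashlib
import hmac
import json


def _tag(key: bytes, prev_tag: str, record: dict) -> str:
    payload = prev_tag.encode() + json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(chain: list, key: bytes, record: dict) -> None:
    """Add a provenance record linked to the previous one."""
    prev = chain[-1]["tag"] if chain else ""
    chain.append({"record": record, "tag": _tag(key, prev, record)})


def verify(chain: list, key: bytes) -> bool:
    """Recompute every tag in order; any mismatch means tampering."""
    prev = ""
    for entry in chain:
        expected = _tag(key, prev, entry["record"])
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev = entry["tag"]
    return True


key = b"demo-key"
chain: list = []
append(chain, key, {"op": "collect", "by": "sensor-1"})
append(chain, key, {"op": "clean", "by": "etl"})
print(verify(chain, key))
```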
Data Trustworthiness (cont.)
4. Source correlation techniques
◦ Relationships among data sources must also be taken into account
◦ Suppose that we observe that the same data value is provided by three different sources
◦ In general, this may lead one to conclude that the data value is trustworthy
◦ However, if these three sources have a very strong relationship, it may not be realistic to assume that the data value is provided by three independent sources

[Figure: sources labeled Wiki, Journal A, and Journal C, with "Kenneth"; caption: "Three independent sources?"]
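One way to take source relationships into account is to count independent groups of supporters rather than raw supporters. In the sketch below, pairs of sources known to be strongly related (for example, one copies from another) are merged with union-find, and each group counts once. How such dependencies are detected is out of scope; here they are simply given as input, which is an assumption.

```python
# Illustrative sketch: count how many *independent* sources support a value.
# Strongly related sources (e.g. one copies another) are merged with
# union-find, and each group counts only once. Dependencies are given as
# input here; detecting them is a separate problem (assumption).


def independent_support(supporters: list[str],
                        dependencies: list[tuple[str, str]]) -> int:
    parent = {s: s for s in supporters}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in dependencies:
        if a in parent and b in parent:
            parent[find(a)] = find(b)  # merge the two groups
    return len({find(s) for s in supporters})


sources = ["Wiki", "Journal A", "Journal C"]
print(independent_support(sources, []))
print(independent_support(sources, [("Journal A", "Wiki"), ("Journal C", "Wiki")]))
```

With no known relationships the value has three independent supporters; if both journals copied from the wiki, it effectively has one.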
Privacy Risks
• Exchange and integration of data across multiple sources
– Data becomes available to multiple parties
– Re-identification of anonymized data becomes easier

• Security tasks such as authentication and access control may require detailed information about users
– For example, location-based access control requires information about user location and may lead to collecting data about user mobility
– Continuous authentication requires collecting information such as typing speed, browsing habits, and mouse movements
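Why does integrating data make re-identification easier? A classic illustration is a linkage attack: an "anonymized" data set with names removed still contains quasi-identifiers (here ZIP code, birth date, and sex) that can be joined against a public data set that does include names. The records and the `diagnosis` field below are made up for illustration.

```python
# Illustrative sketch of a linkage (re-identification) attack: an
# "anonymized" data set without names is joined with a public data set on
# shared quasi-identifiers. All records and field names here are made up.


def link(anonymized: list[dict], public: list[dict],
         quasi_ids: tuple[str, ...] = ("zip", "birth_date", "sex")) -> list[tuple[str, str]]:
    """Return (name, sensitive value) for records that match exactly one person."""
    index: dict[tuple, list[str]] = {}
    for person in public:
        key = tuple(person[q] for q in quasi_ids)
        index.setdefault(key, []).append(person["name"])
    matches = []
    for rec in anonymized:
        names = index.get(tuple(rec[q] for q in quasi_ids), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], rec["diagnosis"]))
    return matches


anonymized = [
    {"zip": "47906", "birth_date": "1980-02-01", "sex": "F", "diagnosis": "asthma"},
    {"zip": "47907", "birth_date": "1975-07-12", "sex": "M", "diagnosis": "flu"},
]
public = [
    {"name": "Alice", "zip": "47906", "birth_date": "1980-02-01", "sex": "F"},
    {"name": "Bob", "zip": "47907", "birth_date": "1975-07-12", "sex": "M"},
    {"name": "Carl", "zip": "47907", "birth_date": "1975-07-12", "sex": "M"},
]
print(link(anonymized, public))
```

The second record survives only because two people share its quasi-identifiers; every additional data source joined in shrinks such groups further.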
Privacy Risks (cont.)
• Social networking sites open their users’ real-time data to varying degrees; this data is collected not only by a number of data providers, but also by a number of monitoring and data analysis agencies.
Privacy Enhancing Techniques
• Privacy-preserving data matching protocols based on hybrid strategies (by M. Kantarcioglu et al.)

Open issues:
– Scalability
– Support for complex matching, such as semantic matching
– Definition of new security models
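For intuition about what "privacy-preserving matching" means, the sketch below shows a much simpler technique than the hybrid protocols cited above: two parties that share a secret key replace identifiers with keyed hashes (HMAC-SHA256) and compare only the hashes. This is an assumption-laden illustration, not the cited method. It supports only exact matching (no semantic matching, one of the open issues listed), and low-entropy identifiers remain open to dictionary attacks by anyone holding the key.

```python
# Minimal sketch, NOT the hybrid protocol cited above: parties sharing a
# secret key exchange keyed hashes (HMAC-SHA256) of normalized identifiers
# instead of the identifiers themselves, then intersect the hash sets.
# Only exact matches are found; key holders can still mount dictionary attacks.
import hashlib
import hmac


def blind(key: bytes, identifiers: list[str]) -> dict[str, str]:
    """Map keyed hash -> original identifier (the mapping stays with its owner)."""
    return {hmac.new(key, i.strip().lower().encode(), hashlib.sha256).hexdigest(): i
            for i in identifiers}


def match(key: bytes, mine: list[str], their_hashes: set[str]) -> list[str]:
    """Return my identifiers whose keyed hash also appears in the other party's set."""
    return sorted(i for h, i in blind(key, mine).items() if h in their_hashes)


key = b"shared-secret"
their_hashes = set(blind(key, ["bob@example.com", "carol@example.com"]))
print(match(key, ["alice@example.com", "Bob@example.com"], their_hashes))
```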
Privacy Enhancing Techniques (cont.)
• Privacy-preserving collaborative data mining (earlier work by C. Clifton, M.
Kantarcioglu et al.)
Open issues:
– Scalability
– Data mining on the cloud
• Privacy-preserving biometric authentication (by E. Bertino et al.)
Open issues:
– Reducing false rejection rates
– Using homomorphic techniques
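A classic building block in privacy-preserving collaborative data mining is the secure sum: several parties learn the total of their private values without learning each other's individual values. The sketch below simulates the ring protocol in a single process and assumes semi-honest parties that do not collude (two neighbours of a party who pool what they see can recover that party's value). The modulus is an assumption and must exceed the true total.

```python
# Minimal sketch of the secure-sum building block, simulated in one process.
# Assumes semi-honest, non-colluding parties and a true total below `modulus`.
import secrets


def secure_sum(private_values: list[int], modulus: int = 2**64) -> int:
    """Ring protocol: the initiator masks its value with a random R, each party
    adds its own value (mod `modulus`) to the running total it receives and
    passes it on, and the initiator removes R at the end. Every intermediate
    total a party sees is uniformly random, so it reveals nothing on its own."""
    r = secrets.randbelow(modulus)
    running = (r + private_values[0]) % modulus  # initiator (party 0)
    for v in private_values[1:]:                 # parties 1..n-1 in ring order
        running = (running + v) % modulus
    return (running - r) % modulus               # back at the initiator


print(secure_sum([5, 10, 27]))  # e.g. per-site counts of some pattern
```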
Privacy Enhancing Techniques (cont.)
• Privacy-preserving data management on the cloud
– CryptDB (by CSAIL)
– DBMask (by E. Bertino et al.)

Issues:
– Weak security (CryptDB)
– Weak protection (or lack of protection) of access patterns
Privacy-enhancing challenges
1. Efficiency:
• Challenge:
- Current privacy-enhancing techniques are unable to scale to large data sets
• Solutions:
- Develop efficient cryptographic building blocks
• More work needs to be done on:
- engineering protocols and systems
- parallel processing techniques for cryptographic protocols
- metrics to assess efficiency
- assessing data privacy and utility when using the different building blocks
- supporting mission-oriented tradeoffs among efficiency, data privacy, and data utility
Privacy-enhancing challenges (cont.)
2. Security with privacy
• Challenge:
- Can security and privacy be reconciled?
If we want to achieve security, we must give up privacy;
if we are keen on assuring privacy, we may undermine security.
• Solutions:
- Recent advances in applied cryptography are making it possible to work on encrypted data, for example to perform analytics on encrypted data.
• More work needs to be done:
- Data privacy techniques depend heavily on the specific use of the data and on the security tasks at hand, so each combination requires dedicated work.
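"Working on encrypted data" can be illustrated with an additively homomorphic scheme such as Paillier, where multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can add values it cannot read. This is a toy sketch: the tiny primes make it completely insecure, and real deployments would use roughly 2048-bit moduli and a vetted library. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
# Toy sketch of Paillier encryption (additively homomorphic), with g = N + 1.
# The tiny primes are an insecure assumption for illustration only.
import math
import secrets

P, Q = 1009, 1013                                   # toy primes (insecure)
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                # valid because g = N + 1


def encrypt(m: int) -> int:
    """c = g^m * r^N mod N^2 with random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: decrypt(c1 * c2 mod N^2) == m1 + m2 (mod N)."""
    return (c1 * c2) % N2


# The party holding only ciphertexts can compute an encrypted total.
total = encrypt(0)
for salary in [1200, 3400, 2100]:
    total = add_encrypted(total, encrypt(salary))
print(decrypt(total))
```

Because encryption is randomized, two encryptions of the same value look different, which avoids the equality leakage of deterministic schemes at the cost of supporting fewer kinds of queries.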
Privacy-enhancing challenges (cont.)
3. Data ownership
• Challenge:
- Determining who the owner of a data item is:
the owner can be defined as the user whose information is recorded in the data item,
or as the party that created the data item by collecting information from the user.
• Solutions:
- Replace the concept of data owner with the concept of stakeholder.
• More work needs to be done:
- Technological, organizational, and legal solutions need to be investigated to manage conflicts among stakeholders.
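The stakeholder idea can be sketched as a data item carrying a decision from each stakeholder (for example, the data subject and the collecting party) for each action, with conflicts resolved by a combining rule. The sketch uses deny-overrides, one common policy-combining strategy; choosing that rule, and the stakeholder and action names, are assumptions for illustration. Real conflict management would also involve the organizational and legal measures mentioned above.

```python
# Hypothetical sketch: a data item has several stakeholders, each with a
# decision per action; conflicts are resolved with a deny-overrides rule.
# Stakeholder names, actions, and the combining rule are assumptions.
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"
    NOT_APPLICABLE = "n/a"


def combine_deny_overrides(decisions: list[Decision]) -> Decision:
    """Any DENY wins; otherwise any PERMIT wins; otherwise not applicable."""
    if Decision.DENY in decisions:
        return Decision.DENY
    if Decision.PERMIT in decisions:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


def decide(item_policies: dict[str, dict[str, Decision]], action: str) -> Decision:
    """item_policies maps stakeholder -> {action -> decision}."""
    return combine_deny_overrides(
        [rules.get(action, Decision.NOT_APPLICABLE) for rules in item_policies.values()])


policies = {
    "data_subject": {"share": Decision.DENY, "analyze": Decision.PERMIT},
    "collector": {"share": Decision.PERMIT},
}
print(decide(policies, "share"), decide(policies, "analyze"))
```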
Privacy-enhancing challenges (cont.)
4. Privacy-aware Data lifecycle framework

A comprehensive approach to privacy for big data needs to be based on a systematic data lifecycle approach.
• Data acquisition
- Need mechanisms and tools
◦ to prevent devices from acquiring data about other individuals, which is relevant when devices such as mobile phones are used.

• Data sharing
- Users need to be informed about data shared with or transferred to other parties.
- It is thus critical to devise legal guidelines on which technical mechanisms can be
based.
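On the technical side, keeping users informed about sharing could start with a per-user record of every transfer plus a notification at transfer time. The sketch below is hypothetical: the `SharingLedger` class, its fields, and the `notify` callback (standing in for e-mail or push delivery) are all assumptions, and it says nothing about the legal guidelines such a mechanism would have to follow.

```python
# Hypothetical sketch: log every transfer of a user's data to another party
# and notify the user, so users can see who received their data and why.
# The notify callback stands in for e-mail or push delivery (assumption).
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass(frozen=True)
class SharingEvent:
    user_id: str
    recipient: str
    purpose: str
    fields: tuple[str, ...]
    at: datetime


class SharingLedger:
    def __init__(self, notify: Callable[[SharingEvent], None]):
        self._events: list[SharingEvent] = []
        self._notify = notify

    def record_transfer(self, user_id: str, recipient: str, purpose: str,
                        fields: Iterable[str]) -> SharingEvent:
        """Log the transfer and inform the user it concerns."""
        event = SharingEvent(user_id, recipient, purpose, tuple(fields),
                             datetime.now(timezone.utc))
        self._events.append(event)
        self._notify(event)
        return event

    def history(self, user_id: str) -> list[SharingEvent]:
        """What the user can inspect: every party their data went to."""
        return [e for e in self._events if e.user_id == user_id]


sent: list[SharingEvent] = []
ledger = SharingLedger(sent.append)
ledger.record_transfer("u1", "ad-partner", "marketing", ["email"])
print([(e.recipient, e.purpose) for e in ledger.history("u1")])
```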
Conclusion
Focus on research challenges specific to confidentiality, trustworthiness, and
privacy in big data.
Suggest some possible solutions to the challenges.

Solutions still require multidisciplinary research drawing from many different areas, including computer science and engineering, information systems, statistics, risk models, economics, social sciences, political sciences, human factors, and psychology.
We believe that all these perspectives are needed to achieve effective solutions
to the problem of privacy and security in the era of big data and especially to
the problem of reconciling security and privacy.
Thank you!
