
“Big Data: What’s the Big Deal?”


DATA SECURITY & PRIVACY FOR BIG DATA

Cindy E. Compert CIPT/M


CTO Data Security & Privacy, IBM Security
@CCBigData

1/27/17
“A ship in port is safe; but that is not what ships are built for.
Sail out to sea and do new things.” – Grace Hopper

2 IBM Security
Agenda

• Introduction
• Mega Trends
• Security & Privacy considerations
• Architecture, Technical controls, best practices
• Wrap up

Short History Lesson

“Big Data”: Volume, Variety, Velocity
MapReduce, Hadoop, HDFS

Big Data Grows Up

Client Challenges: Megatrends

• Evolving regulatory patchwork
• Breach threats/costs, reputational risk, sanctions
• The ‘Snowden’ effect
• Maintain privacy, encourage innovation

1 Forrester Research: “Understand The State Of Data Security And Privacy: 2014 To 2015”

Digital Convergence: IoT, Analytics, Big Data, Cognitive, Cloud
Analytics that Learn

• HEALTH: Watson Oncology: choose cancer treatment therapies based on a tumor's genetic fingerprints
• EDUCATION: Bringing personalized learning to children around the world
• EDUCATION: Cognitoys Toy Dino uses cognitive-enabled learning for customized interaction

Watson Personality Insights: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/personality-insights.html

A Data Lake is a Data Scientist’s Dream!

… but data without analytics is just a liability


Security and Privacy
Considerations
Security compared to Privacy in a nutshell

• Confidentiality: Preventing access to non-public information that two parties agree to restrict. May relate to personal or business information. May not be subject to privacy laws.
• Regulated Data: Data subject to government and/or industry regulation (regulatory or 3rd-party requirements), including PI (Personal Information), healthcare information (e.g. HIPAA/HITECH), and/or financial information (e.g. FFIEC).
• Data Privacy: Controls how personal or regulated information is collected, used, and shared in accordance with policy and/or external laws/regulations.
• Data Security: The technical safeguards used to ensure confidentiality, integrity, and availability of data.
• PI (Personal Information): Any information that identifies or can reasonably be used to identify, contact, or locate an individual.

[Diagram labels: Privacy, Confidentiality, Security, Security & Privacy]

Why is Big Data different?

• More data, exponentially more risk
• Immature: less security, governance and discipline; rapidly evolving*
• New types of data, new privacy implications
  - Smart meters, health monitors, connected home, connected car
  - Linked data: linking public and private data exposes new risks
• New uses of data mapping to privacy policy

* http://www.techrepublic.com/article/cios-still-dont-care-about-hadoop-data-security/

Cool or Creepy?

http://www.zdnet.com/pictures/nine-warning-signs-that-your-technology-needs-an-upgrade/2/

EU GDPR will change the Analytics and Cognitive Landscape

• Definition of “Personal Data” now explicitly includes online identifiers, location data and biometric/genetic data
• Higher standards for privacy notices and for obtaining consent
• Easier access to personal data by a data subject
• Enhanced right to request the erasure of their personal data
• Right to transfer personal data to another organization (portability)
• Right to object to processing now explicitly includes profiling


Big Data life cycle: from raw to production

1. Search & survey
   Users: Business Users (With An Idea), Power Users, Data Analysts
   • text search
   • simple investigations
   • peek / poke
2. Exploratory analysis
   Users: Data Scientist / Data Miner, Advanced Business User, Application Developer
   • from mountain of data into a structured world with apps to provide business value
   • iterative in nature, many false starts
3. Operational
   Users: Traditional IT / Application Developer
   • creating/standing up applications, processes, systems with enterprise characteristics
   • more formal environment, SLAs, etc.

Fit-for-purpose security and privacy

Initial / exploratory use cases             Used for business decisions
Few security or privacy concerns            Protect, Secure, Encrypt
Sporadic change management                  Audit trail tracking access & changes
No data retention requirements              Preserve data for N years
Little to no regulation                     Legislated requirements
No / isolated data quality concerns         Data quality imperatives
Sources of information are “interesting”    Sources must be trusted

No difference in data governance requirements once the data is used for making operational business decisions

Privacy is the ‘Why’ and ‘What’…
Security is the ‘How’
PI, PII, PHI, NPI... What is ‘Personal’? It Depends¹

CAUTION: Your Legal, Compliance, and Privacy organization determines how to enforce privacy regulations, based on risk. IT and InfoSec should not be the arbiters.

How unique are you?

• Dr. Latanya Sweeney (Harvard, FTC Chief Technologist): a 1997 study using US Census data predicted that 87 percent of the U.S. population had a unique combination of just date of birth, gender, and ZIP code
• Try it yourself here: http://aboutmyinfo.org
• An additional study on the Personal Genome Project re-identified 84-97% of records, also using demographics plus data mining
(http://dataprivacylab.org/projects/pgp/1021-1.pdf)
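
To make the uniqueness point concrete, here is a minimal Python sketch (not part of the original deck) that measures how many records in a dataset are unique on a chosen set of quasi-identifiers. The field names `dob`, `gender`, and `zip`, the helper names, and the sample records are all made up for illustration.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifiers (the 'k' in k-anonymity)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0


# Hypothetical sample data, for illustration only.
people = [
    {"dob": "1970-01-15", "gender": "F", "zip": "02138"},
    {"dob": "1970-01-15", "gender": "F", "zip": "02138"},
    {"dob": "1982-07-04", "gender": "M", "zip": "02139"},
    {"dob": "1990-11-30", "gender": "F", "zip": "10001"},
]
QUASI_IDENTIFIERS = ["dob", "gender", "zip"]

rate = uniqueness_rate(people, QUASI_IDENTIFIERS)
k = smallest_group(people, QUASI_IDENTIFIERS)
print(f"unique on {QUASI_IDENTIFIERS}: {rate:.0%}, smallest group size k={k}")
```

A group size of k=1 means at least one person can be singled out from these three fields alone, even though no name or account number is present.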

Location Location Location

Questions to ask

1. Where is the sensitive data?
2. Who owns it?
3. How is it classified and managed?
4. How do you know who is accessing it?
5. Where is it flowing?
6. How is it shared?
7. How is it used in test environments?
8. What about 3rd parties and vendor access?
9. What is the quantifiable risk?
10. How do you prioritize discovery and classification?

5 steps to a Critical Data Protection Program

The Approach: A comprehensive method for safeguarding your Crown Jewels and protecting your brand

Step 1
• Define Crown Jewels
• Determine Data Security Objectives
• Understand Client Data Security Environment and Infrastructure

Step 2
• Define and Complete Data Discovery Process
• Perform Data Analysis and Classify

Step 3
• Establish Crown Jewels Baselines
• Assess and Score Client Data Security Processes and/or Controls
• Perform Gap Analysis and Develop Hypotheses

Step 4
• Determine Risk Remediation Plan
• Prioritize and Validate Risk Remediation Solutions
• Plan, Design, and Implement

Step 5
• Determine Crown Jewels Governance Metrics and Process
• Enable Monitoring, Communications and Response
• Establish Revalidation Criteria and Process

Where Next? Data Classification

[Figure: hazard-label analogy for data classification. Labels: Non-flammable; Non-toxic; Spontaneously Flammable When Combined with Water; Health Hazard; Explosive; Toxic]

Architecture, Technical Controls, Best Practices
Security is Security... Same Disciplines apply... BUT...

[Diagram labels:]
• Global Threat Intelligence
• Antivirus; Endpoint patching and management; Malware protection
• Incident and threat management; Firewalls; Sandboxing; Virtual patching; Network visibility
• Transaction protection; Device management; Content security
• Fraud protection; Criminal detection
• Security Intelligence: Log, flow and data analysis; Anomaly detection; Vulnerability assessment; Incident response
• Application scanning; Application security management
• Data monitoring; Data access control
• Privileged identity management; Entitlements and roles; Access management; Identity management
• Cloud
• Consulting Services | Managed Services

Big Data Technical Components

Need                                                 Technical component
Understand and navigate federated big data sources   Federated Discovery and Navigation
Manage & store huge volume of any data               Hadoop File System, Apache Spark, MapReduce
Structure and control data                           Data Warehousing, In memory, Cloud databases (Spark, Cloudant)
Manage streaming data                                Stream Computing
Analyze unstructured data                            Text Analytics Engine
Integrate and govern all data sources                Integration, Data Quality, Security, Lifecycle Management, MDM

A Hadoop Security Architecture

[Figure: Hadoop security architecture covering Dynamic Data (in use), Static Data (at rest), and masking]

http://www.hadoopsphere.com/2013/01/security-architecture-for-apache-hadoop.html

Monitoring and auditing challenges

• Many avenues to access
• Security and authentication is evolving
• Complex software stack with significant log data from each component
• Security and audit viewed in isolation from rest of data architecture

Data Security and Privacy Core Disciplines

Security Controls Core Disciplines: The ‘How’

Understand & Define
• Discover sensitive assets & who has access
• Classify assets & quantify risk
• Assess vulnerabilities

Secure & Protect
• Implement Identity & Access Management, Activity Monitoring
• Redact/encrypt/mask sensitive data in all environments
• Harden environments to reduce risk

Monitor & Audit
• Define policies and metrics
• Monitor and enforce; review policy exceptions
• Audit and report for compliance

Security Controls for Privacy

On-Premise | Hybrid | Cloud

Manage Access: Enforce separation of duties; safeguard privileged user access, applications, and devices
• Identity Governance
• Privileged Identity Management
• Mobile Data Management

Protect Data: Identify vulnerabilities; prevent attacks targeting sensitive data
• Data Encryption, Masking, Redaction
• Data and File Activity Monitoring
• Application and Mobile App Scanning

Gain Visibility: Monitor data and applications for security breaches and compliance violations
• Security Information and Event Monitoring
• Security Intelligence
• Real-time alerting and blocking
• Cloud access and risk assessment

Optimize Your Privacy and Data Security Program: Deliver a consolidated view of your security operations
• Privacy Program Management
• Security & Privacy Risk and Performance Metrics

Utilize real-time data activity monitoring for privacy, security & compliance

• Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
• Data protection compliance automation

[Diagram labels: Data Repositories (databases, warehouses, file shares, Big Data); Monitoring Appliance; Centralize compliance reporting; Real-time alerting]

Key Requirements

• Implement on premise or cloud
• Non-invasive, non-disruptive, cross-platform architecture
• Separation of duties enforcement for Database Administrator (DBA) access
• Detect or block unauthorized & suspicious activity
• 100% visibility including local admin access
• Minimal performance impact
• Should not rely on resident logs that can easily be erased by attackers, rogue insiders
• No environment changes
• Integration with broader privacy, security and compliance tools
• Granular, real-time policies
  - Who, what, when, how
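
As a rough illustration of what a granular "who, what, when, how" policy could look like, the Python sketch below evaluates access events against an ordered rule list and returns allow, alert, or block. This is not the API of any monitoring product: `AccessEvent`, `Rule`, `evaluate`, the role names, and the table name are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class AccessEvent:
    user: str            # who
    role: str            # who (class of user)
    target: str          # what (table, file, or HDFS path)
    action: str          # how (e.g. SELECT, DELETE)
    timestamp: datetime  # when


@dataclass(frozen=True)
class Rule:
    name: str
    verdict: str                                       # "block" or "alert"
    roles: FrozenSet[str] = frozenset()                # empty set means "any"
    targets: FrozenSet[str] = frozenset()
    actions: FrozenSet[str] = frozenset()
    business_hours: Optional[Tuple[int, int]] = None   # if set, rule fires only outside [start, end)


def rule_matches(rule: Rule, event: AccessEvent) -> bool:
    if rule.roles and event.role not in rule.roles:
        return False
    if rule.targets and event.target not in rule.targets:
        return False
    if rule.actions and event.action not in rule.actions:
        return False
    if rule.business_hours is not None:
        start, end = rule.business_hours
        if start <= event.timestamp.hour < end:
            return False  # inside business hours, so this rule does not fire
    return True


def evaluate(event: AccessEvent, rules) -> Tuple[str, Optional[str]]:
    """Return (verdict, rule name) for the first matching rule, else ("allow", None)."""
    for rule in rules:
        if rule_matches(rule, event):
            return rule.verdict, rule.name
    return "allow", None


SENSITIVE = frozenset({"payments.cards"})  # hypothetical sensitive table
RULES = [
    Rule("dba-reads-card-data", "block", roles=frozenset({"dba"}),
         targets=SENSITIVE, actions=frozenset({"SELECT"})),
    Rule("after-hours-sensitive-access", "alert", targets=SENSITIVE,
         business_hours=(7, 19)),
]

dba_read = AccessEvent("alice", "dba", "payments.cards", "SELECT", datetime(2017, 1, 27, 10, 0))
late_read = AccessEvent("bob", "analyst", "payments.cards", "SELECT", datetime(2017, 1, 27, 23, 0))
day_read = AccessEvent("bob", "analyst", "payments.cards", "SELECT", datetime(2017, 1, 27, 10, 0))

for ev in (dba_read, late_read, day_read):
    print(ev.user, ev.timestamp.hour, evaluate(ev, RULES))
```

Rules are checked in order, so the stricter separation-of-duties rule is listed first. A real deployment would evaluate such rules on traffic captured outside the database so that privileged users cannot erase the evidence.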

Sample Activity Monitoring Report: Privileged User Activity Report

Data Obfuscation Controls

Original Value: 4536 6382 9896 5200

Masking
• The ability to desensitize sensitive information and make it unreadable from its original form while preserving its format and referential integrity
• It is a one-way algorithm, i.e. the data cannot be unmasked
• SDM: Static Data Masking
• DDM: Dynamic Data Masking
Masked Value: 4212 5454 6565 7780

Redaction
• The process of obscuring part of a text for security purposes
• The ability to replace real data with substitute characters like (*)
Redacted Value: 4536 6382 **** ****

Tokenization
• The process of substituting a “token” which can be mapped to the original value
• The token is a non-sensitive equivalent which has no extrinsic value
• Must maintain a mapping between the tokens and the original values
Token Value: ABCD GDIC JIJG VXYZ

Encryption
• The process of encoding data in such a way that only authorized individuals can read it by decrypting the encoded data with a key
• Format Preserving Encryption (FPE) is a special form of encryption
Encrypted Value: 1@#43$%!xy1K2L4P
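
The differences between three of these controls can be sketched in a few lines of Python. This is an illustrative sketch only, not how any particular product implements them: the function names, the HMAC-based masking scheme, the in-memory `TokenVault`, and the demo key are assumptions. Encryption is omitted because the Python standard library has no symmetric cipher.

```python
import hashlib
import hmac
import secrets


def redact(value: str, keep: int = 8, fill: str = "*") -> str:
    """Redaction: keep the first `keep` digits, replace the remaining digits with `fill`."""
    out, seen = [], 0
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep else fill)
        else:
            out.append(ch)  # keep separators so the layout is preserved
    return "".join(out)


def mask(value: str, key: bytes) -> str:
    """Static masking sketch: one-way, keeps the format, deterministic for a given key.

    Determinism preserves referential integrity: the same input always masks
    to the same output. The digit derivation is slightly biased; demo only.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


class TokenVault:
    """Tokenization sketch: random tokens plus a mapping back to the original values."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        if value not in self._token_by_value:
            token = secrets.token_hex(8)
            while token in self._value_by_token:  # extremely unlikely collision
                token = secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


card = "4536 6382 9896 5200"   # the example value from the slide
key = b"demo-only-key"         # hypothetical key; real keys belong in a key manager
vault = TokenVault()

redacted = redact(card)
masked = mask(card, key)
token = vault.tokenize(card)

print(redacted)  # 4536 6382 **** ****
print(masked)
print(token, "->", vault.detokenize(token))
```

Only tokenization is reversible here, and only for whoever holds the vault. That is why the mapping itself has to be protected at least as carefully as the original data.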

Encrypt Data at Rest

Encryption can provide Safe Harbor protection from breach disclosure in many states (consult your compliance team for details).

Implement data protection for your database, Hadoop, and file system environments
• Look for high-performance encryption, access control and auditing
• Data privacy for both online and backup environments
• Unified policy and key management for centralized administration across multiple data servers
Look for transparency to users, databases, applications, storage
• No coding or changes to existing IT infrastructure
• Protect data in any storage environment
• User access to data same as before
Look for centralized administration and Separation of Duties
• Policy and key management
• Audit logs
• High availability

Identity and Access Management helps secure the digital identities for an open enterprise: Big and ‘Little’ Data

Datacenter | Web | Social | Mobile | Cloud

Threat-aware Identity and Access Management

Identity Management
• Identity Governance and Intelligence
• Identity Lifecycle Management
• Privileged Identity Control

Access Management
• Adaptive Access Control and Federation
• Application Content Protection
• Authentication and Single Sign On

Directory Services

Delivery options: On Premise Appliances | Software-as-a-Service | Cloud Managed / Hosted Services

Putting it all together: Sample Solution Architecture

[Diagram labels: Data sources (databases, files, documents); Masking and Redaction producing masked files and redacted documents; Big Data Loader; Hadoop cluster with Big Data (HDFS), Hadoop masking (MapReduce), Big Data Processing (MapReduce), and output files; Information Catalog (business policies, policies and value); Sensitive data discovery; Big Data Activity Monitoring: monitor & audit Big Data access (HDFS, Hive, HBase, MapReduce, HUE, etc.) with real-time alerting and SIEM (Security Information and Event Monitoring) integration]

    Component                           Capability
1   Information Catalogue               Define privacy policies and share
2   Sensitive Data Discovery            Discover and classify sensitive data
3   Masking and Redaction               Data masking and document redaction
4   Hadoop Activity Monitoring (HAM)    Monitor and audit Big Data (Hadoop) activity

Best Practices: Build the foundation

First, know your data

1. Understand the data source, its “trust factor”, the data context and
meaning, and how it maps to other enterprise data sources.
2. Determine whether to operationalize (and retain) specific data sources,
and in which zone to land the data, e.g. Hadoop, Data Warehouse, leave in
place, etc.
Steps to Assess and Protect:
1. Conduct a Privacy Impact Assessment and a Security Risk
Assessment.
2. Inventory and classify sensitive data.
3. Identify and match against legal, contractual, and organizational data
protection requirements with assistance from your security, privacy, and
compliance organization.
4. Identify protection standards for each classification. For example, all
credit card numbers must be encrypted in accordance with PCI DSS.
5. Identify the gaps and set up remediation plans.
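
Steps 2, 4, and 5 above can be sketched as a small Python routine: classify a sample value from each data field, look up the protection standard for that classification, and report which controls are missing. Only the "credit card numbers must be encrypted" standard comes from the slide. The other control names, the classification labels, and the inventory entries are hypothetical, and the detection logic is deliberately simplistic.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to reduce false positives when looking for card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> str:
    """Very rough classifier for a single sample value."""
    compact = re.sub(r"[ -]", "", value)
    if (compact.isascii() and compact.isdigit()
            and 13 <= len(compact) <= 19 and luhn_valid(compact)):
        return "payment_card"
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        return "personal"
    return "unclassified"


# Protection standard per classification (illustrative control names).
REQUIRED_CONTROLS = {
    "payment_card": {"encrypt", "monitor"},   # e.g. encryption per PCI DSS
    "personal": {"mask_in_test", "monitor"},
    "unclassified": set(),
}


def find_gaps(inventory):
    """inventory maps field name -> (sample value, controls already in place)."""
    gaps = {}
    for field, (sample, controls) in inventory.items():
        missing = REQUIRED_CONTROLS[classify(sample)] - set(controls)
        if missing:
            gaps[field] = sorted(missing)
    return gaps


# Hypothetical inventory; 4111 1111 1111 1111 is a widely used test card number.
inventory = {
    "orders.card_number": ("4111 1111 1111 1111", {"monitor"}),
    "crm.email": ("pat@example.com", {"mask_in_test", "monitor"}),
    "web.page_title": ("Welcome", set()),
}

gaps = find_gaps(inventory)
print(gaps)  # {'orders.card_number': ['encrypt']}
```

The resulting gap list is the input to the remediation plan. In practice the classification rules and required controls would come from your security, privacy, and compliance organization rather than from code defaults.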

Wrap Up
Key messages for sound public policy

- Enable data innovation

- Focus on risks to people

- Protect privacy through principles, not prescription

- Accommodate diversity

- Help organizations manage diverse legal systems

- Encourage organizations to demonstrate accountability

Summary: Keys to Success

1. Manage security and privacy at point of impact or as far upstream as possible.
2. Use multiple complementary approaches to secure critical data: different types of data have different protection requirements.
3. Use a holistic approach to safeguarding information no matter where it is. Include the following items:
   • Understand and document where the data exists along with the exposure risk.
   • Secure and continuously monitor access to data.
   • Safeguard both structured and unstructured data.
   • Protect sandboxes and non-production environments.
   • Demonstrate compliance to pass audits.

THANK YOU
www.ibm.com/security
FOLLOW US ON:

ibm.com/security

securityintelligence.com
xforce.ibmcloud.com

@ibmsecurity

youtube/user/ibmsecuritysolutions

© Copyright IBM Corporation 2016. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. Any
statement of direction represents IBM's current intent, is subject to change or withdrawal, and represent only goals and objectives. IBM, the IBM logo, and other IBM products and services are trademarks of the International
Business Machines Corporation, in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.
Statement of Good Security Practices: IT system security involves protecting systems and information through prevention, detection and response to improper access from within and outside your enterprise. Improper
access can result in information being altered, destroyed, misappropriated or misused or can result in damage to or misuse of your systems, including for use in attacks on others. No IT system or product should be
considered completely secure and no single product, service or security measure can be completely effective in preventing improper use or access. IBM systems, products and services are designed to be part of a lawful,
comprehensive security approach, which will necessarily involve additional operational procedures, and may require other systems, products or services to be most effective. IBM does not warrant that any systems, products
or services are immune from, or will make your enterprise immune from, the malicious or illegal conduct of any party.
Resources

• Follow me on Twitter @CCBigData
• IBM Security: http://www-03.ibm.com/security/
• IBM Data Security & Protection: http://www-03.ibm.com/software/products/en/category/SWP23
• Data Security & Privacy Best Practices Blogs: https://securityintelligence.com/author/cindy-compert
• Guardium Activity Monitoring for Hadoop info page: http://ibm.biz/BdsdhR
• IBM QRadar Security Intelligence: http://www-03.ibm.com/software/products/en/qradar-siem
• IBM Redbook: “Information Governance Principles and Practices for a Big Data Landscape”:
https://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248165.html
• Top Tips for Securing Big Data Environments:
www.ibm.com/services/forms/signup.do?source=sw-infomgt&S_PKG=500031830&S_CMP=Guardium_big_data_ebook

A recommended approach for Big Data: Activity Monitoring

1. Identify users and classes of users (“privileged” users, data scientists...): who is allowed to access sensitive data?
   • Validate with activity monitoring
2. Identify the applications, jobs, ad-hoc analysis
   • Validate with activity monitoring
3. When possible, identify, encrypt and mask sensitive data before it enters the cluster, and identify a specific directory location in the cluster for that data. Put tighter monitoring controls around that data.
4. Look at exceptions: permission exceptions, other operational errors. Use machine learning to identify patterns of suspicious activity.
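
As a starting point for step 4, the Python sketch below flags users whose number of permission exceptions is far above the norm. It uses a plain z-score baseline as a simple stand-in for the machine learning mentioned above, not a replacement for it. The event format `(user, outcome)`, the `"denied"` label, the threshold, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def flag_outliers(events, threshold=2.0):
    """Return users whose denied-access count is more than `threshold`
    standard deviations above the mean across all users seen.

    `events` is a list of (user, outcome) tuples.
    """
    denied = Counter(user for user, outcome in events if outcome == "denied")
    users = {user for user, _ in events}
    counts = {user: denied.get(user, 0) for user in users}
    if not counts:
        return []
    mu = mean(counts.values())
    sigma = pstdev(counts.values())
    if sigma == 0:
        return []  # everyone behaves the same; nothing stands out
    return sorted(u for u, c in counts.items() if (c - mu) / sigma > threshold)


# Hypothetical audit events: nine users with one denial each, one user with twenty.
events = (
    [(f"user{i}", "denied") for i in range(9)]
    + [(f"user{i}", "ok") for i in range(9)]
    + [("mallory", "denied")] * 20
)

suspects = flag_outliers(events)
print(suspects)  # ['mallory']
```

A baseline like this only works once the population is large enough for an outlier to stand out, and it ignores context such as job type or time of day. That is where the learned patterns the slide refers to would add value.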

Notices and disclaimers
• Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or
transmitted in any form without written permission from IBM.
• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM.

• Information in these presentations (including information relating to products that have not yet been announced by IBM) has been
reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM
shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY,
EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF
THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT
OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the
agreements under which they are provided.

• IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have
been previously installed. Regardless, our warranty terms apply.

• Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without
notice.

• Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are
presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual
performance, cost, savings or other results in other operating environments may vary.

• References in this document to IBM products, programs, or services do not imply that IBM intends to make such products,
programs or services available in all countries in which IBM operates or does business.

• Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not
necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither
intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

• It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal
counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s
business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or
represent or warrant that its services or products will ensure that the customer is in compliance with any law.

WORLD OF WATSON 2016


Notices and disclaimers continued
• Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of
performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be
addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such
third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.

• The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents,
copyrights, trademarks or other intellectual property right.

• IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise
Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM
ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®,
Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®,
SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®,
X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions
worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available
on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

• Notice: Clients are responsible for ensuring their own compliance with various laws and regulations, including the
European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent
legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients’
business and any actions the clients may need to take to comply with such laws and regulations. The products, services,
and other capabilities described herein are not suitable for all client situations and may have restricted availability. IBM
does not provide legal, accounting or auditing advice or represent or warrant that its services or products will ensure that
clients are in compliance with any law or regulation.
