An Oracle White Paper July 2010

Data Masking Best Practices

................. 6 Sophisticated Masking Techniques ....... 12 Customer Case Studies ........................................................................Oracle White Paper—Data Masking Best Practices Executive Overview ........................................................... 1 Introduction ............... 3 Enforcing Referential Relationships during Data Masking ............................................................................................... 13 ............................................................................................... 2 Implementing Data Masking ........... 1 The Challenges of Masking Data ............................................................................................................... 12 Conclusion ..................................... 9 Integrated Testing with Application Quality Management solutions11 Oracle’s Comprehensive Solutions for Database Security ............. 2 Comprehensive Enterprise-wide Discovery of Sensitive Data ........................................... 7 High Performance Mask Execution ...... 4 Rich and Extensible Mask Library..............................

and other uses. As the number of applications increases. Retail companies share customer point-of-sale data with market researchers to analyze customer buying patterns. Very few companies do anything to protect this data even when sharing with outsourcers and third parties. Pharmaceutical or healthcare organizations share patient data with medical researchers to assess the efficacy of clinical trials or medical treatments. development. Numerous industry studies on data privacy have concluded that almost all companies copy tens of millions of sensitive customer and consumer records to non-production environments for testing. more and more data gets shared. Almost 1 out of 4 companies responded that live data used for development or testing had been lost or stolen and 50% said they had no way of knowing if data in non-production environments had been compromised.Oracle White Paper—Data Masking Best Practices Executive Overview Enterprises need to share production data with various constituents while also protecting sensitive or personally identifiable aspects of the information. 1 . Oracle Data Masking addresses this problem by irreversibly replacing the original sensitive data with realistic-looking scrubbed data that has same type and characteristics as the original sensitive data thus enabling organizations to share this information in compliance with information security policies and government regulations. Introduction Enterprises share data from their production applications with other users for a variety of business purposes. thus further increasing the risk of a data breach. Microsoft SQLServer. Most organizations copy production data into test and development environments to allow application developers to test application upgrades. This paper describes the best practices for deploying Oracle Data Masking to protect sensitive information in Oracle and other heterogeneous databases such as IBM DB2. where sensitive data gets exposed to unauthorized parties.

an advantage of the database scripts approach would appear that they specifically address the unique privacy needs of a particular database that they were designed for. the goal of this exercise is to come up with the comprehensive list of sensitive data elements specific to the organization and discover the associated tables and columns across enterprise databases that contain the sensitive data. With a script-based approach. Assess: In this phase. They may have even been tuned by the DBA to run at their fastest Let’s look at the issues with this approach. These steps are:  Find: This phase involves identifying and cataloging sensitive or regulated data across the entire enterprise. the DBA then hands over the environment to the application testers. these scripts would have to re-written from scratch if applied to another database. the most common solution: database scripts. Assess. The auditors would find it extremely difficult to offer any recommendation on whether the masking process built into a script is secure and offers the enterprise the appropriate degree of protection. 1. Take for example. Developers can leverage the existing masking library or extend it with their own masking routines. Secure. Reusability: Because of the tight association between a script and the associated database. auditors have no transparency into the masking procedures used in the scripts. and Test (FAST). The security administrator executes the masking process to secure the sensitive data during masking trials. Oracle has developed a comprehensive 4-step approach to implementing data masking called Find. developers or DBAs in conjunction with business or security analysts identify the masking algorithms that represent the optimal techniques to replace the original sensitive data. Once the masking process has completed and has been verified. new tables and columns containing sensitive data may be added as a part of the upgrade process. Transparency: Since scripts tend to be monolithic programs.Oracle White Paper—Data Masking Best Practices The Challenges of Masking Data Organizations have tried to address these issues with custom hand-crafted solutions or repurposed existing data manipulation tools within the enterprise to solve this problem of sharing sensitive information with non-production users. Secure: This and the next steps may be iterative.   2 . Maintainability: When these enterprise applications are upgraded. Implementing Data Masking Based on Oracle Data Masking . the entire script has to be revisited and updated to accommodate new tables and columns added as a part of an application patch or an upgrade. 3. 2. There are no common capabilities in a script that can be easily leveraged across other databases. At first glance. Typically carried out by business or security analysts.

fixes the masking algorithms and re-executes the masking process. This is because sensitive data is related to specific to the government regulations and industry standards that govern how the data can used or shared. the DBA restores the database to the pre-masked state. the data elements that need to be masked in the application must be identified.Oracle White Paper—Data Masking Best Practices  Test: In the final step. A typical list of sensitive data elements may include: Person Name Maiden Name Bank Account Number Card Number (Credit or Debit Card Number) Business Address Tax Registration Number or National Tax ID Business Telephone Number Person Identification Number Business Email Address Welfare Pension Insurance Number Custom Name Unemployment Insurance Number Employee Number Government Affiliation ID User Global Identifier Party Number or Customer Number Account Name Mail Stop GPS Location Student Exam Hall Ticket Number Club Membership ID Library Card Number Military Service ID Social Insurance Number Pension ID Number Article Number Civil Identifier Number Credit Card Number Social Security Number Trade Union Membership Number Oracle Data Masking provides several easy-to-use mechanisms for isolating the sensitive data elements. 3 . Comprehensive Enterprise-wide Discovery of Sensitive Data To begin the process of masking data. The first step that any organization must take is to determine what is sensitive. the first step is for the security administrator to publish what constitutes sensitive data and get agreement from the company’s compliance or risk officers. If the masking routines need to be tweaked further. the production users execute application processes to test whether the resulting masked data can be turned over to the other non-production users. Thus.

the relationship between the primary key and the foreign key columns. which allows efficient storage of application data without have to duplicate data. have published their application data model as a part of their product documentation or the support knowledge base. built into Enterprise Manager. including the ability to query sample rows from the tables. data is stored in tables related by certain key columns. called primary key columns. Peoplesoft and Siebel. For example. data masking users can easily associate the relevant tables and columns to the mask formats to create the mask definition. in a database or across databases. such as NNN-NN-NNNN for social security numbers or 16 or 15 digit sequences beginning with 3. Software vendors or service providers can generate these pre-defined templates and make them available to enterprises to enable them to import these templates into the Data Masking rapidly and thus. This is a dynamic living catalog managed by security administrators that needs to be refreshed as business rules and government regulations change as well as when applications are upgraded and patched and new data elements containing sensitive data are now discovered. accelerate the data masking implementation process. an EMPLOYEE_ID generated from a human capital management (HCM) application may be used in sales force automation (SFA) application tables using foreign key columns to keep track of sales reps and their accounts. the Data Masking a can assist enterprise users rapidly construct the mask definition – the pre-requisite to mask the sensitive data. Application Masking Templates: Oracle Data Masking supports the concept of application masking templates. this is not a static list. Enforcing Referential Relationships during Data Masking In today’s relational databases (RDBMS). which are XML representations of the mask definition. enterprises can now develop a comprehensive data privacy catalog that captures the sensitive data elements that exist across enterprise databases. Ad-hoc search: Oracle Data Masking has a robust search mechanism that allows users to search the database quickly based on ad hoc search patterns to identify tables and columns that represent sources of sensitive data.numbers. Oracle provides the Oracle Data Finder tool during data masking implementation to search across enterprises based on data patterns.Oracle White Paper—Data Masking Best Practices  Data Model driven: Typical enterprise applications. 4 . To be clear. By leveraging the published data models. 4 or 5 for credit card . such as E-Business Suite. When deploying a masking solution. With all the database management capabilities.   For deeper searches. business users are often concerned with referential integrity. Using the combination of schema and data patterns and augmenting them with published application meta data models.

thereby applying the same masking rules as applied to the database-enforced foreign key columns. the Oracle Data Masking discovers all the related foreign key relationships in the database and enforces the same mask format to the related foreign key columns. This guarantees that the relationships between the various applications tables are preserved while ensuring that privacyrelated elements are masked. In applications where referential integrity is enforced in the database. Oracle Data Masking allows these relationships to be registered as related columns in the mask definition. 5 . This means that when a business user chooses to mask a key column such as EMPLOYEE_ID.Oracle White Paper—Data Masking Best Practices CUSTOMERS EMPLOYEES    EMPLOYEE_ID FIRST_NAME LAST_NAME    CUSTOMER_ID SALES_REP_ID COMPANY_NAME Database enforced Application enforced SHIPMENTS    SHIPMENT_ID SHIPPING_CLERK_ID CARRIER Figure 1:The Importance of Referential Integrity Oracle Data Masking automatically identifies referential integrity as a part of the mask definition creation.

Oracle White Paper—Data Masking Best Practices Figure 2: Automatic enforcement of referential Integrity Rich and Extensible Mask Library Oracle Data Masking provides a centralized library of out-of-the-box mask formats for common types of sensitive data. By leveraging the Format Library in Oracle Data Masking. enterprises can apply data privacy rules to sensitive data across enterprise-wide databases from a single source and thus. ensure consistent compliance with regulations. Enterprises can also extend this library with their own mask formats to meet their specific data privacy and application requirements. national insurance number for UK). phone numbers. national identifiers (social security number for US. such as credit card numbers. 6 .

Sophisticated Masking Techniques Data masking is in general a trade-off between security and reproducibility. zip values need to be consistent after masking.g. written in PL/SQL. For example. These techniques ensure that applications continue to operate without errors after masking. Masking technique where data in sensitive columns is replaced with a single fixed value is 100% in terms of security and 0% in terms of reproducibility.g.Oracle White Paper—Data Masking Best Practices Figure 3: Rich and extensible Mask Format Library Oracle Data Masking also provides mask primitives. Recognizing that the real-world masking needs require a high degree of flexibility. e. generating a unique email address from fictitious first and last names to allow business applications to send test notifications to fictitious email addresses. which serve as building blocks to allow the creation of nearly unlimited custom mask formats ranging from numeric. Oracle Data Masking allows security administrators to create user-defined-masks. When considering various masking techniques. Compound masking: this technique ensures that a set of related columns is masked as a group to ensure that the masked data across the related columns retain the same relationship. let administrators create unique mask formats for sensitive data. Oracle Data Masking provides a variety of sophisticated masking techniques to meet application requirements while ensuring data privacy. For example. city. applying different national identifier masks based on country of origin. e. it is important to consider this trade-off in mind when selecting the masking algorithms. alphabetic or date/time based. A test database that is identical to the production database is 100% in terms of reproducibility and 0% in terms of security because of the fact that it exposes the original data.  Condition-based masking: this technique makes it possible to apply different mask formats to the same data set depending on the rows that match the conditions. state.  7 . These user-defined masks.

for example. the inverse function f-1(y1) should not produce x1 to ensure the security of the original sensitive data. enterprises can use the deterministic masking functions provided by Oracle Data Masking to consistently generate the same replacement mask value for any type of sensitive data element.Oracle White Paper—Data Masking Best Practices Deterministic Masking Deterministic masking is an important masking technique that enterprises must consider when masking key data that is referenced across multiple applications.. three applications: a human capital management application. In test systems.e. e. a deterministic function f(x) where f(x1) will always produce y1 for a given value x1. such as employee expense data provided by credit card companies. which may be EMPLOYEE IDs in the sales data warehouse. Now. it is important that the function f(x) not be reversible. To address this requirement. For example. Take. Deterministic masking techniques can be used with mathematical entries. Testers may find it disruptive if the underlying data used for testing is changed by production refreshes and they could no longer locate certain types of employees or customer records that were examples for specific test cases. In order for the deterministic masking to be applied successfully. which may also be EMPLOYEE IDs. i. type and characteristics as the original value. It is vital that deterministic masking techniques used produce the replacement masked value consistently and yet in a manner that the original data cannot be derived from the masked value.g. e. Thus. enterprises pre-load the flat file containing data using tools such as SQL*Loader. then mask the sensitive columns using deterministic masking provided by Oracle Data Masking and then extract the masked data back into flat file. as well as with text entries. 8 . to generate names. the employee credit card numbers have been obfuscated and can no longer be matched against the data in the flat files containing the employee’s real credit card number. social security numbers or credit card numbers. the application will be able to process the flat files correctly just as they would have been in Production systems. a customer relationship management application and a sales data warehouse.g. organizations may require that names always get masked to the same set of masked names to ensure consistency of data across runs. To ensure that data relationships are preserved across systems even as privacy-related elements are removed. in the customer relationship management application and sales representative IDs. deterministic masking techniques ensure that data gets masked consistently across the various systems.g. In production environments. One way to think of these deterministic masking techniques is as a function that is applied on the original value to generate a unique value consistently that has the same format. e. There are some key fields such as EMPLOYEE ID referenced in all three applications and needs to be masked in the corresponding test systems: a employee identifier for each employee in the human resources management application. Deterministic masking becomes extremely critical when testing data feeds coming from external systems. the feed containing real credit card numbers are processed by the accounts payable application containing employee’s matching credit card information and are used to reconcile employee expenses. customer service representative identifiers. into standard tables.

These are temporary tables that are created as a part of the masking process. the Oracle Data Masking builds a work list of the tables and columns chosen for masking. which are used by the database to capture changes made to files. For large tables. Because rows in a table are usually scattered all over the disk.Oracle White Paper—Data Masking Best Practices High Performance Mask Execution Now that the mask definition is complete. This enhanced efficiency makes the masked table available for users in a fraction of the time spent by an update-driven masking process. followed by the dependent tables containing foreign keys. the Oracle Data Masking can now execute the masking process to replace all the sensitive data. Once the mask work list is ready. The bulk mechanism used by Oracle Data Masking lays down the new rows for the masked table in rapid succession on the disk. With the cloned (non-production) database now ready for masking. Oracle Data Masking automatically invokes SQL parallelism to further speed up the masking process. Using the NOLOGGING option. Further. constraints. Other performance enhancements include using the NOLOGGING option when recreating the table with the masked data. the Oracle Data Masking generates mapping tables for all the sensitive fields and their corresponding masked values. Oracle Enterprise Manager can create a test database from an existing backup. The clone database capability also provides the option to create a clone image. Typical database operations such as row inserts or updates generate redo logs. Other tables that are not required to be masked are not touched. These redo logs are completely unnecessary in a data masking operation since the non-production database is not running in a production environment. Compare this with the typical data masking process. the tables with the primary keys get masked first. Oracle Data Masking rapidly recreates the masked replacement table based on original tables and the mapping tables and restores all the related database elements. which usually involves performing table row updates. Typically. Clone Live Database: Oracle Enterprise Manager can clone a live production data into any nonproduction environment within a few clicks. Using a highly efficient data bulk mechanism. requiring continuous availability and recoverability. Oracle Enterprise Manager offers several options to clone the production database:   Recover from backup: Using the Oracle Managed Backups functionality. the update process is extremely inefficient because the storage systems attempts to locate rows on data file stored on extremely large disks. which will be dropped once all data has been masked successfully. such as indexes. the tables selected for masking are processed in the optimal order to ensure that only one pass is made at any time even if there are multiple columns from that table selected for masking. grants and triggers identical to the original table. which can then be used for other cloning operations. the Oracle Data Masking bypasses the logging mechanism to further accelerate the masking process efficiently and rapidly. 9 .

7G of memory. Some of these include:   Flashback: Administrators can optionally configure Oracle databases to enable flashback to a premasked state if they encounter problems with the masked data. Oracle Data Masking is also integrated with Oracle Provisioning and Patch Automation in Oracle Enterprise Manager to clone-and-mask via a single workflow.Oracle White Paper—Data Masking Best Practices In internal tests run on a single-core Pentium 4 (Northwood) [D1] system with 5. Oracle Data Masking generates DBA-friendly PL/SQL that allows DBAs to tailor the masking process to their needs. Oracle Data Masking can handle significant volumes of sensitive data effortlessly both in terms of the number of sensitive columns as well as tables with large numbers of rows. PL/SQL: Unlike other solutions. Optimized for Oracle databases Oracle Data Masking leverages key capabilities in Oracle databases to enhance the overall manageability of the masking solution. The secure high performance nature of Oracle Data Masking combined with the end-to-end workflow ensures that enterprise can provision test systems from production rapidly instead of days or weeks that it would with separate manual processes. This PL/SQL script can also be easily integrated into any cloning process. the following performance results with reported. 10 . Criteria Column scalability Row scalability Baseline 215 columns 100 million rows 100 tables of 60G 6 columns Metric 20 minutes 1 hour 20 minutes Figure 4: Oracle Data Masking Performance scalability tests As these results clearly indicate.

Oracle White Paper—Data Masking Best Practices Support for heterogeneous databases Oracle Data Masking supports masking of sensitive data in heterogeneous databases such as IBM DB2 and Microsoft SQLServer through the use of Oracle Database Gateways. Thorough testing can help you identify application quality and performance issues prior to deployment. Figure 5: Data masking support for heterogeneous databases Integrated Testing with Application Quality Management solutions The final step of the masking process is to test that the application is performing successfully after the masking process has completed. but it is also one of the most critical to the project’s success. Oracle Enterprise Manager’s Application Quality Management (AQM) solutions provide high quality testing for all tiers of the application stack. 11 . Testing is one of the most challenging and time consuming parts of successfully deploying an application. Oracle Enterprise Manager’s AQM solutions provide a unique combination of test capabilities which enable you to:  Test infrastructure changes: Real Application Testing is designed and optimized for testing database tier infrastructure changes using real application workloads captured in production to validate database performance in your test environment.

the organization was able to use the role-based separation of duties to allow the HR analysts to define the security policies for masking sensitive data.Oracle White Paper—Data Masking Best Practices  Test application changes: Application Testing Suite helps you ensure application quality and performance with complete end-to-end application testing solutions that allow you to automate functional & regression testing. By implementing Oracle Data Masking. data classification. saving time and money. Oracle’s Comprehensive Solutions for Database Security Oracle provides a comprehensive portfolio of security solutions to ensure data privacy. Upon deploying the Oracle 12 . Customer Case Studies Customers have had a variety of business needs which drove their decision to adopt the Oracle Data Masking for their sensitive enterprise data. transparent data encryption. and enable regulatory compliance. protect against insider threats. development and reporting did not meet the established standards for privacy and confidentiality. the service provider deployed the mask definitions for their Oracle eBusiness Suite HR application and thereby rapidly brought the internal non-productions systems into compliance. their IT infrastructure was also growing thus placing an increased burden on their DBAs. The need for data masking can come from internal compliance requirements. and data masking. The DBAs then automated the implementation of these masking policies when provisioning new test or development environments. the organization quickly identified the Oracle Data Masking as ideally suited to their business needs based on the fact that it was integrated with their day-to-day systems management operations provided by Oracle Enterprise Manager. Due to the stringent requirement to create production copies available for testing within rapid time-frames. execute load tests and manage the test process. As the company was growing and offering new services. the telecommunications company was able to allow business users to ensure compliance of their non-production environments while eliminating another manual task for the DBAs through automation. Their database administrators (DBAs) had developed custom scripts to mask sensitive data in the test and development environments of their human resources (HR) application. the company evaluated the Oracle Data Masking among other commercial solutions. In the case of this UKbased government organization. In joint consultations with their IT service provider. Within a few weeks. auditing. Thus. This Middle East-based real estate company found that their data masking scripts were running for several hours and were slowing down as data volumes increased. With Oracle's powerful privileged user and multifactor access control. There are organizations that have internally developed data masking solutions that have discovered that custom scripts ultimately have their limits and are not able to scale up as enterprise data sets increase in volume. These benefits of using Oracle Data Masking were realized by a major global telecommunications products company that implemented the above methdology. the internal audit and compliance team had identified that the nonproduction copies of human resource management systems used for testing. monitoring. customers can deploy reliable data security solutions that do not require any changes to existing applications.

Oracle Data Masking is designed and optimized for today’s high volume enterprise applications running on Oracle databases.  13 .Oracle White Paper—Data Masking Best Practices Data Masking. Oracle Data Masking accelerates sensitive data identification and executes the masking process with a simple easy-to-use web interface that puts the power of masking in the hands of business users and administrators. organizations have able to ensure that non-production databases have remained compliant with IT security policies while enabling developers to conduct production-class testing. organizations have been able to reduce the burden on DBAs who previously had to maintain manuallydeveloped masking scripts. Conclusion Staying compliant with policy and government regulations while sharing production data with nonproduction users has become a critical business imperative for all enterprises. Organizations that have implemented Oracle Data Masking to protect sensitive data in test and development environment have realized significant benefits in the following areas:  Reducing Risk through Compliance: By protecting sensitive information when sharing production data with developers and testers. they discovered that they were able to accelerate the masking time from 6 hours using their old scripts to 6 minutes using the Oracle Data Masking. an improvement of 60x in performance. Increasing Productivity through Automation: By automating the masking process. Leveraging the power of Oracle Enterprise Manger to manage all enterprise databases and systems.

Other names may be trademarks of their respective owners. .com Oracle and Java are registered trademarks of Oracle and/or its affiliates. CA 94065 U. This document is provided for information purposes only and the contents hereof are subject to change without notice. without our prior written permission.650. This document is not warranted to be error-free.506. Ltd. Opteron. electronic or mechanical. Inc. 0110 Copyright © 2010. UNIX is a registered trademark licensed through X/Open Company. All rights reserved. including implied warranties and conditions of merchantability or fitness for a particular purpose. for any purpose. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. AMD. nor subject to any other warranties or conditions. and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices.7200 oracle. This document may not be reproduced or transmitted in any form or by any means.S. Athreya Contributing Authors: Oracle Corporation World Headquarters 500 Oracle Parkway Redwood Shores.Data Masking Best Practices July 2010 Author: Jagan R.506. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. Worldwide Inquiries: Phone: +1.A. the AMD logo. whether expressed orally or implied in law.650. Oracle and/or its affiliates. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International.7000 Fax: +1.

Sign up to vote on this title
UsefulNot useful