
…helping protect

sensitive data

 What is Data Masking
 Why Data Masking
 DM Transformation - Informatica
 Different Masking Rules
 Key Masking
 Random Masking
 Inbuilt Masking Rules
 Substitution Masking
 Procedure followed – DM requirement in DICoE
 Challenges faced in AASI Data Masking

Transformation of sensitive information into de-identified, realistic-looking data

 Data remains relevant and meaningful

 Preserves the original characteristics of data
 Preserves referential integrity

 Enterprises need production data in non-production environments for
purposes such as
 Development
 Test
 Data Analysis and training
 Organizations take immense measures to secure private data in
production environments. As a result, non-production environments become
an attractive target for malicious users.
 This creates a need to use production data in test environments in such a
way that the sensitive data is masked yet realistic.
 The Informatica PowerCenter Data Masking option protects sensitive
information by masking it while maintaining the original nature of the data
and preserving referential integrity.
The prerequisite for the Data Masking transformation is Informatica 8.5.1. In AMP,
the DM server components are installed on Informatica 8.6.1.

 The Data Masking feature is used by adding one new transformation, the
Data Masking transformation, to the mapping.

 The DM transformation masks the source data based on the masking
rules that we configure for each field identified as holding sensitive data.

 Masking rules can be configured to provide

 Non-Deterministic Randomization
 Deterministic and Repeatable masking
 Blurring – adding variance value to the original data
 Substitution of original data with fictitious data

Different Masking Rules

Key Masking:
 Produces deterministic data.
 Maintains referential integrity through the use of a seed value.
 The DM transformation requires a seed value to return deterministic data. It
creates a default seed value, which can be edited. The default seed
value is a random number between 1 and 1,000.
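
To make the role of the seed concrete, here is a minimal Python sketch of deterministic masking. It is not the algorithm the DM transformation uses; the keyed hash (HMAC), the function name key_mask_number and the sample tables are assumptions chosen for illustration. The same source value with the same seed always yields the same masked value, so a key masked in a parent table still matches the key masked in a child table.

```python
import hashlib
import hmac


def key_mask_number(value: int, seed: int, digits: int = 9) -> int:
    """Map value to a masked number with at most `digits` digits.

    Illustrative only: an HMAC keyed by the seed stands in for the
    (undocumented here) key masking algorithm.
    """
    key = str(seed).encode()
    digest = hmac.new(key, str(value).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (10 ** digits)


# Hypothetical parent and child tables sharing a customer id.
customers = [1001, 1002, 1003]
orders = [(1, 1001), (2, 1003), (3, 1001)]  # (order_id, customer_id)

seed = 190  # same seed for both tables
masked_customers = [key_mask_number(c, seed) for c in customers]
masked_orders = [(oid, key_mask_number(cid, seed)) for oid, cid in orders]
```

Because both tables are masked with the same seed, every masked customer id in the orders still points at a masked customer row.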

Key Masking types

String Masking – Key masking for strings can be configured to generate
repeatable outputs. We can specify the following for string key masking:
 Mask Format – the mask format characters are A, N, D, X, + and R
 Characters to be masked in the source string
 Replacement characters
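
The sketch below shows how a mask format could drive repeatable string masking. The slides do not define the format characters, so the meanings used here are assumptions: A = alphabetic, D = digit, N = alphanumeric, X = any character, + = leave unmasked, R = remaining characters (only as the last character). The function key_mask_string, the replacement character sets and the HMAC-based choice of replacement are illustrative, not the DM transformation's implementation.

```python
import hashlib
import hmac
import string

# Assumed meaning of each mask format character (not confirmed by the slides).
ACCEPTS = {
    "A": str.isalpha,        # alphabetic characters
    "D": str.isdigit,        # digits
    "N": str.isalnum,        # alphanumeric characters
    "X": lambda ch: True,    # any character
}
# Hypothetical replacement character sets for each format character.
REPLACEMENTS = {
    "A": string.ascii_uppercase,
    "D": string.digits,
    "N": string.ascii_uppercase + string.digits,
    "X": string.ascii_uppercase + string.digits,
}


def key_mask_string(value: str, mask_format: str, seed: int) -> str:
    """Return a repeatable masked copy of value that follows mask_format."""
    # A trailing R is treated as "any character" for the rest of the string.
    if mask_format.endswith("R"):
        extra = max(0, len(value) - (len(mask_format) - 1))
        codes = mask_format[:-1] + "X" * extra
    else:
        codes = mask_format
    if len(codes) < len(value):
        raise ValueError("mask format is shorter than the input value")

    key = str(seed).encode()
    masked = []
    for pos, (ch, code) in enumerate(zip(value, codes)):
        if code == "+":  # no masking at this position
            masked.append(ch)
            continue
        if code not in ACCEPTS:
            raise ValueError(f"unknown mask format character: {code!r}")
        if not ACCEPTS[code](ch):
            raise ValueError(f"{ch!r} at position {pos} does not fit format {code!r}")
        # The seed, the whole value and the position decide the replacement.
        digest = hmac.new(key, f"{value}|{pos}".encode(), hashlib.sha256).digest()
        charset = REPLACEMENTS[code]
        masked.append(charset[digest[0] % len(charset)])
    return "".join(masked)
```

For example, key_mask_string("AB1234", "AADDDD", 42) returns two letters followed by four digits and returns the same string on every call with seed 42, while a source value such as "A@" under the format "AA" is rejected.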

Numeric Masking – A field in the source file or table can be configured for
numeric key masking to generate repeatable outputs.

Date Masking – This masking rule can be used if a particular date column
needs to be masked in such a way that it maintains referential integrity.
Random Masking:
 Generates non-deterministic data.
 The Data Masking transformation returns different values when the
same source value occurs in different rows.

Random Masking Types

Numeric Masking
Rules that can be applied for numeric random masking:
Range – defines the range of the masked value
Blurring – generates masked values that are within a fixed or percentage variance of
the source data.
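
The range and blurring rules can be sketched in Python as follows. The function names and the low/high parameters are assumptions for illustration; the sketch only shows the arithmetic and does not reproduce the DM transformation's random number generation.

```python
import random


def mask_in_range(low: float, high: float, rng: random.Random) -> float:
    """Range rule: the masked value is any number between low and high."""
    return rng.uniform(low, high)


def blur_fixed(value: float, low: float, high: float, rng: random.Random) -> float:
    """Fixed blurring: masked value lies in [value - low, value + high]."""
    return value + rng.uniform(-low, high)


def blur_percent(value: float, low_pct: float, high_pct: float,
                 rng: random.Random) -> float:
    """Percentage blurring: masked value lies within
    -low_pct percent to +high_pct percent of the source value."""
    return value * (1 + rng.uniform(-low_pct, high_pct) / 100)


rng = random.Random()  # unseeded, so repeated runs give different values
salary = 50000.0
print(mask_in_range(30000, 90000, rng))   # somewhere in 30000..90000
print(blur_fixed(salary, 1000, 1000, rng))   # within 1000 of the source
print(blur_percent(salary, 10, 10, rng))     # within 10 percent of the source
```

Unlike key masking, no seed ties the output to the input, so the same salary in two rows generally gets two different masked values.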

String Masking
Uses rules similar to string key masking. In addition, there is an option to specify
the range of the string length.

Date Random Masking
Masking rules that can be applied
 Range - upper/lower bound of the masked date value
The default date time format is MM/DD/YYYY HH24:MI:SS.
 Blur – masks the date based on a variance applied to a unit of the date.
The blur unit can be year, month, day or hour. The default is year.
DM applies the variance to the selected blur unit, and random numbers are
substituted for the other units.

For example, to restrict the masked date to a date within two years of
the source date, select year as the unit. Enter two as the low and high bounds.
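
A rough Python sketch of the two-year example follows. It simplifies the behaviour described above: instead of varying the year and substituting random numbers for the other units separately, it picks a random instant inside the allowed window. The function names are assumptions, and the format string is the Python equivalent of MM/DD/YYYY HH24:MI:SS.

```python
import random
from datetime import datetime, timedelta

DEFAULT_FORMAT = "%m/%d/%Y %H:%M:%S"  # MM/DD/YYYY HH24:MI:SS


def shift_years(d: datetime, years: int) -> datetime:
    """Move a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def blur_date_by_year(source: datetime, low: int, high: int,
                      rng: random.Random) -> datetime:
    """Return a random date between source - low years and source + high years."""
    start = shift_years(source, -low)
    end = shift_years(source, high)
    span_seconds = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span_seconds))


source = datetime.strptime("06/15/2010 09:30:00", DEFAULT_FORMAT)
masked = blur_date_by_year(source, low=2, high=2, rng=random.Random())
print(masked.strftime(DEFAULT_FORMAT))  # a date between mid-2008 and mid-2012
```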

Inbuilt Masking Rules
 Inbuilt masking rules that can be applied
 Credit card
 URL / IP address
 Phone
 Email address

Masking Social Security Number

A list of valid SSNs is stored on the Informatica server at
<Installation Directory>\infa_shared\SrcFiles\highgroup.txt. The DM transformation
accesses the highgroup.txt file and generates a masked SSN that is not in the list.

Substitution Masking:

Substituting data with lookup transformation

Apart from the masking algorithms that are available, we can also substitute
original data with unreal information from dictionary files. The default dictionary files
are available in the following path: server\infa_shared\LkpFiles

Example: FirstNames.dic
This file contains an SNo column and a FirstNames column. In the mapping, we can
generate a random number using the DM transformation and pass it as input to a lookup,
which matches it against SNo and returns the first name from the lookup file. If the dic
file has 100 names in it, the random number range can be specified as 1 to 100.
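
The lookup flow can be sketched in Python as below. The dictionary contents, the comma-separated layout and the function names are made up for illustration; the real FirstNames.dic may be laid out differently. The random number range is taken from the number of rows in the dictionary, as in the 1 to 100 example.

```python
import csv
import io
import random

# Hypothetical stand-in for a dictionary file with SNo and FirstNames columns.
DIC_TEXT = """SNo,FirstNames
1,Alice
2,Bruno
3,Chitra
4,Dmitri
5,Elena
"""


def load_dictionary(text: str) -> dict:
    """Build the lookup: serial number -> replacement first name."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["SNo"]): row["FirstNames"] for row in reader}


def substitute_first_name(lookup: dict, rng: random.Random) -> str:
    """Generate a random SNo within the dictionary range and look it up."""
    sno = rng.randint(1, len(lookup))  # range 1..N, both ends included
    return lookup[sno]


lookup = load_dictionary(DIC_TEXT)
rng = random.Random()
for real_name in ["John", "Priya", "Wei"]:
    print(real_name, "->", substitute_first_name(lookup, rng))
```

This assumes the SNo values run from 1 to N without gaps; with gaps, the random number would have to be drawn from the actual SNo values instead.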

 Identify sensitive fields
 Document the DM requirement in a proper format; ideally it should have
table/file name, attribute/field name, DM required (Y/N), PK/FK relation,
Rule Type and Description
 If the requirement has common fields to be masked across files/tables,
creating a mapplet with the “to be masked” fields is helpful.
 Coding/Testing

Challenges faced in AASI Data Masking

 In the AASI DM requirement, the source and target were MF files. To ensure
that our DM mappings had no impact on the fields that did not require
masking, we kept only the “fields to be masked” in the Data Map and declared
all others as Filler with binary data type.
 The mask format defined for an attribute must be in sync with the actual characters
fed from the source, as in the case of String Masking. For example, if the mask format is
defined as alphabetic (A) but the source value contains special characters
(@, $ etc.), the error “Invalid input mask format” is raised.
 SSN Masking accepts only a valid TAX_ID as input, for example XXX-XX-XXXX.
So if we plan to use inbuilt SSN masking, we need to decide whether
to transform the input source value to the proper format first or
simply use numeric/string key masking.
 Maintaining Data Quality – the DM transformation produces masked output based on
inbuilt algorithms, so even if NULL or 0 values are passed as input, DM generates a
masked output value. However, downstream teams using the masked
data may need to check for NULL or 0 values in the source. To maintain
data quality in such cases, we may have to
apply a transformation that retains the source value when it is 0 or NULL.
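
Two of the challenges above, reformatting SSN input and retaining NULL or 0 values, can be sketched in Python as follows. In the mapping itself this logic would sit in a transformation before or around the DM transformation; the function names and the mask_fn parameter are hypothetical.

```python
import re
from typing import Callable, Optional


def mask_preserving_empty(value, mask_fn: Callable):
    """Pass NULL (None) and 0 through unchanged; mask everything else."""
    if value is None or value == 0:
        return value
    return mask_fn(value)


def normalize_ssn(raw: str) -> Optional[str]:
    """Reformat a nine-digit value as XXX-XX-XXXX, or return None if it
    does not contain exactly nine digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        return None
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# Usage with a placeholder masking function (adds 1 for illustration only).
placeholder_mask = lambda v: v + 1
print(mask_preserving_empty(None, placeholder_mask))
print(mask_preserving_empty(0, placeholder_mask))
print(mask_preserving_empty(41, placeholder_mask))
print(normalize_ssn("123456789"))
```

Values that normalize_ssn cannot reformat would be candidates for numeric or string key masking instead of the inbuilt SSN rule.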