In this new paradigm, where enterprises are expected to do more with smaller or flat IT budgets,
outsourcing has emerged as the key strategy for IT
application development and testing. However, the rising incidence of data theft and the need to comply with various
data privacy regulations on the movement of confidential data off premises are forcing enterprises to refrain from
outsourcing applications that deal with Personally Identifiable Information (PII) and other sensitive data. The terms data
obfuscation, data privacy and data masking are used interchangeably to describe the ability to de-identify data for use in external
environments. Implementing a comprehensive data obfuscation solution can mitigate the risks associated with outsourcing applications that
handle sensitive data.
Existing approach to address data privacy concerns
Enterprises tend to protect sensitive data by means of physical separation coupled with network isolation. However, with
most outsourcing vendors using cost-effective offshore locations for IT application development, testing and support, this
approach is not practical. To overcome this issue, enterprises have attempted to use custom SQL scripts to mask sensitive
data. Custom SQL scripts have their own limitations when used in an integrated environment. They are database specific and
require considerable maintenance effort. Preserving referential integrity across data sources is a nightmare. The diversity of
data sources, database technologies and platforms in an enterprise makes custom SQL scripts an inadequate, tactical solution
for data obfuscation, suited only to simple stand-alone applications.
Need of the hour
Given that sensitive data in an enterprise resides not only in databases but also in files and message queues, the need of
the hour is for a comprehensive data obfuscation solution. Support for
non-repeatable data masking techniques, for masking of sensitive data in multiple data sources such as databases, files and
message queues, and for heterogeneous platforms and technologies is a must.
The solution must ensure that referential integrity of data is preserved across data sources in order to facilitate an integrated
environment. The solution must also integrate with upstream and downstream applications to facilitate end-to-end testing.
A typical enterprise has data distributed across legacy, modern and open source databases. Thus, support for obfuscation of
data residing in databases such as Oracle, DB2, SQL Server, Sybase and MySQL forms a minimum criterion for selecting
a data privacy solution. In addition, these databases run on Windows, Unix, Linux and mainframes. To add to the
complexity, fixed-format files, variable-format files and XML files also store sensitive data. The data privacy solution must support
all of these data sources, technologies and platforms.
Choice of Data Masking techniques
The employee names in the Employee table may need to be shuffled, while credit card numbers must have only their last
four digits visible, with the remaining twelve digits masked with asterisks (*). The patient code needs to be hashed. The
data privacy solution must therefore provide a choice of masking techniques.
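
The three techniques named above (shuffling, partial masking and hashing) can be illustrated with a minimal Python sketch. The function names, the salt value and the sample inputs are hypothetical and chosen only for illustration; a real solution would manage salts and seeds securely and would read from the actual data sources.

```python
import hashlib
import random


def shuffle_column(values, seed=None):
    """Return the same values reassigned to rows in a random order."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def mask_card_number(card_number, visible=4, mask_char="*"):
    """Keep the last `visible` digits and replace the other digits with `mask_char`."""
    digits = [c for c in card_number if c.isdigit()]  # ignore spaces and dashes
    hidden = max(len(digits) - visible, 0)
    return mask_char * hidden + "".join(digits[hidden:])


def hash_code(code, salt="demo-salt"):
    """One-way SHA-256 hash of a code, truncated to 12 hex characters.

    The salt here is a placeholder (assumption); it is not a secure default.
    """
    return hashlib.sha256((salt + code).encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    print(shuffle_column(["Asha", "Brian", "Chen"], seed=7))
    print(mask_card_number("4111111111111111"))
    print(hash_code("PAT-1001"))
```

Shuffling keeps the column's real values but breaks their link to the rows, partial masking keeps the format recognisable for testers, and hashing is one-way. Which technique suits which column remains a decision for the data owner.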
Preservation of Data Integrity
Masked values must be consistent across databases
as well as files and message queues in order to preserve data integrity: a given original value must be replaced with
the same masked value wherever it occurs. Masking a primary key should result in the corresponding foreign
keys also being masked with the same value in order to preserve referential integrity. The data privacy solution must enable
preservation of referential integrity within and across data sources.
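
One common way to obtain this consistency is deterministic masking, in which the masked value is computed from the original value and a secret key, so the same input always produces the same output regardless of where it is stored. The sketch below uses a keyed hash (HMAC-SHA-256) for this purpose. That choice is an assumption made for illustration and not a technique the text prescribes, and the key, table and column names are hypothetical.

```python
import hashlib
import hmac

# Assumption: the key is held by the data owner and never shipped with masked data.
SECRET_KEY = b"hypothetical-masking-key"


def mask_key(value, key=SECRET_KEY, length=10):
    """Deterministically map a key value to a masked token of `length` hex characters."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_rows(rows, key_columns):
    """Return copies of the rows with the named key columns masked."""
    return [
        {col: mask_key(val) if col in key_columns else val for col, val in row.items()}
        for row in rows
    ]


# Hypothetical data: a database table and a flat-file extract that share customer_id.
customers = [
    {"customer_id": 101, "segment": "retail"},
    {"customer_id": 102, "segment": "corporate"},
]
orders_file = [
    {"order_id": "A1", "customer_id": 101},
    {"order_id": "A2", "customer_id": 102},
    {"order_id": "A3", "customer_id": 101},
]

masked_customers = mask_rows(customers, {"customer_id"})
masked_orders = mask_rows(orders_file, {"customer_id"})

if __name__ == "__main__":
    for row in masked_customers + masked_orders:
        print(row)
```

Because both sources are masked with the same function and key, every masked foreign key in the file still matches a masked primary key in the table. Truncating the digest shortens the token but raises the chance of two different keys colliding, so the length would need to be chosen with the real key space in mind.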