You are on page 1of 10

This feature lets you hide part of the data from normal users.

For example, an application may


display a Social Security Number (SSN) and mask out all but the last four digits of the number.
Instead of displaying 123-45-6789, it would display XXX-XX-6789. If a privileged user is
viewing the information, the masking is not applied. This is not encryption. The data is not
masked on the storage device; it is masked or obfuscated on output based on whether the caller is
normal or privileged. This is not a total security solution, but it can be deployed as part of your
security strategy.

Dynamic data masking is useful in a couple of scenarios. The first is probably obvious: for
regulatory compliance. Whether the regulation is governmental or dictated by an industrial body,
protecting the display of sensitive data is often demanded or considered a best practice. Another
scenario where this feature could be leveraged is internally with developers. It is not uncommon
for developers to request real data for testing and development purposes. Often, when supplying
data for this situation, you need to scrub sensitive data, whether through scrambling or actual
removal. Scrubbed data can affect the effectiveness of testing by skewing query results. By
applying dynamic data masking in this scenario, the data can be protected while providing a
more realistic test bed. Often, static data masking, where data is stored in a masked format
instead of only being masked upon output, is used in this situation, but this method introduces
other problems, such as breaking data integrity.

In this article, I'll provide an example of how to set up dynamic data masking in SQL Server
2016 CTP2 and in Azure SQL database V12. An appealing characteristic of dynamic data testing
is that it can be applied by adding to a table's definition without having to rewrite application
code.

Setting up dynamic data masking

This article is based on the publicly available SQL Server 2016 CTP 2.2 release. The first step in
dynamic data masking is to create a database like the one shown in Figure 1.

Create a table for testing

Fig
ure 1. An example database for testing data masking.
Insert a few rows for testing

Fig
ure 2. Add rows of data.

Verify that the data is there

Fig
ure 3. Check to make sure the data was inputted.

Because no masking rules are defined, all data is visible.

Create a test user and test permissions to the table


Fig
ure 4. The employee table without masking.

Because no masking has been defined, the TestUser login sees everything.

Masking is part of the table definition in SQL Server 2016

Fig
ure 5. The employee table with a mask for HomePhone.

The Employee table definition was modified to add a customized mask to the HomePhone
column (Figure 5). The function used in this case is the partial function. The partial(0,"XXX-
XXX-",4) means to grab zero leftmost characters from the column to start with, use the string
defined, and then display the last four characters of the column. There is nothing inherently
meaningful about the Xs or the hyphens. The mask string was crafted to be similar to the data
already in that column. Let's see what happens when the WorkPhone column is masked with a
slightly different string (Figure 6).
Fig
ure 6. The employee table with masks for HomePhone and WorkPhone.

The definition of the mask for WorkPhone is similar to the definition of the mask for
HomePhone except for the characters in the string within the partial function.

Now that it is shown that there are mask definitions on each of the phone columns, can a query
be run against the table referencing a masked column as the search criteria? Let's find out.

Fig
ure 7. Search query on masked data.

Since the mask is applied on display of the data, the ability to search on a masked column is not
affected.

Let's add a mask to SSN.

Fig
ure 8. Employee table with masks for HomePhone, WorkPhone and SSN

There are also a few predefined functions to add a mask to a column. For example, there is a
predefined mask for email. Let's take a look at it.
Fig
ure 9. Employee table with masks for HomePhone, WorkPhone, SSN and workemail.

Notice that the predefined mask for email address displays the first character of the column, but
the rest of the mask is XXX@XXXX.com to obfuscate the entire domain name (Figure 9).

Does a mask cause a sort to not work properly? Let's find out.

Fig
ure 10. Sort query on employee table.

It is sorted properly, but the first character of the mask is presented (Figure 10). What if we took
that away and applied a custom mask to the email address?
Fig
ure 11. The employee table sorted with the custom mask.

As shown above, the data is still sorted properly. Also note that I dropped the mask from the
column and added a customized mask using the partial function. Again, the mask is being
deployed upon display.

At this point, I've shown that a masked column can be used in a where clause for searching for
specific records and that correct sorting is occurring. If a developer wanted to get at the
unmasked data, could this be done (Figure 12)?

Fig
ure 12. Unmasking the data.

I needed to provide additional permissions to TestUser on the dbo schema, but the SELECT
INTO copied the unmasked data into another table. The unmasked data is now in that table.

Dynamic data masking in Azure SQL database v12 (preview)


In Azure SQL database v12, the portal is used to set masking for columns in a GUI. Here, I'll use
the Azure preview portal to demonstrate dynamic data masking setup. I assume you know how to
create and access an Azure SQL database from SQL Server Management Studio (SSMS). Note
that if you are using a "Downlevel Client" (TDS 7.3 and below) for accessing the data, you will
want to use a modified connection string from servername.database.windows.net to
servername.secure.windows.net.

The first step is to create a table in the Azure SQL database using the same syntax used in the
SQL Server 2016 example (Figure 1). The syntax is shown again in Figure 13.

Fig
ure 13. Syntax for creating the employee table.

I'll also insert the same data used in Figure 1 into the Azure SQL Database table. Use the same
INSERT statements from Figure 2 and check to see that the data is correct, as shown in Figure
14.

Fig
ure 14. Verifying data in Azure SQL Database.

Defining a mask

From the Azure preview portal, select Browse ALL / SQL databases / <databasename> /
Dynamic Data Masking (preview). This will display a summary of any masking rules already
defined. From this point, select Add Mask. Select the HomePhone choice in the Column
dropdown. Choose Custom String from the Masking Field Format dropdown. Use 0 for Exposed
Prefix, XXX-XXX- for the Padding String, and 4 for the Exposed Suffix. Click Save on the Add
Masking Rule blade and then Save again on the Dynamic Data Masking (preview) blade. The
portal will display a message to indicate the changes are saved and ask you to acknowledge by
clicking OK.

Rick Heiges asks:

Which of the two methods of implementing dynamic


data masking do you plan to use?
Join the Discussion

Fig
ure 15. Create and add the HomePhone masking rule in Azure SQL Database.

Let's look at the employee table to see if it worked. Note that it might take a minute or two for
the masking rule to be applied fully.
Fig
ure 16. The employee table with the masking rule for HomePhone applied.

The masking worked in the same manner in Azure SQL database as it did in SQL Server 2016
CTP2. At this point, it is clear that other masking rules can be defined as I did earlier but through
the GUI. Currently, it appears that the GUI in the portal must be used to define masking rules
because when I used the DDL to add a mask, the syntax appeared to be accepted, but the output
was unchanged.

Some common masking rules are built in to Azure SQL database. One of these is for Social
Security Number. Let's add it to the employee table's SSN column, as shown below.

Fig
ure 17. Use the built-in masking rule for social security numbers.

The results are below in Figure 18.

Fig
ure 18. The employee table with the social security number masking rule applied.
Although the data is masked, it is not necessarily in the format that may have been desired. This
mask would work fine for a SSN column that was wider and had the dashes included in the data
instead of inferred. Figure 19 shows the masking rule for using the Number (random number
range) option for the EmpID for the employee table.

Figure 19. The


masking rule for Number (random number range) for the employee table.

Obviously, a random number mask for EmpID is not realistic, but it shows the built-in mask
feature.

Also, note that there is a field on the dynamic data masking blade for privileged logins. These are
logins that do not apply masking rules when displaying the data. A login would need to be
created and another connection to the Azure SQL database established to demonstrate this. This
is somewhat different from how this feature was presented earlier in the article where a user
(without a login) could easily be created and impersonated for the demonstration.

You might also like