
Implementing Unified Data Protection and Governance with Microsoft Purview

Day 1 of 4

<Presenter Name>
<Date>
Data-driven transformations yield significant benefits
54% increase in revenue performance
44% faster time to market
62% improvement in customer satisfaction
54% increased profit results

Source: How to lead a data-driven digital transformation, by Harvard Business Review


Today’s data realities

What data do I have?

Is it trustworthy?

Can people access the data needed to make the right decisions?

How can I enable faster business insights?

What’s my compliance exposure?


Data governance is becoming increasingly interdisciplinary

DISCOVERY
What data do I have?
Where did the data originate?
Can I trust it?

COMPLIANCE
What’s my exposure to risk?
Is my usage compliant?
How do I control access & use?
What is required by regulation X?

Chief Data Officer


Reimagine data governance with Microsoft Purview
Unified | Hybrid | Open
Unified Data Governance with Microsoft Purview
Status legend: Generally Available | Preview

Data Producers and Consumers | Data Officers

Data Catalog: Enable effortless data discovery
Data Sharing: Share data within and between organizations
Data Estate Insights: Assess data estate health
Data Policy: Govern access to data
Data Map: Automate and manage metadata at scale

Data sources: On-prem, Cloud, and SaaS Applications, including SQL Server, Azure Synapse Analytics, Azure SQL, and Power BI
Status legend: Generally Available | In Preview

Microsoft Purview Data Map

Automated scanning of hybrid sources

Multi-cloud scanning for AWS S3

Data Classification

Apache Atlas API support

Microsoft Purview Data Catalog

Search and Browse

Business Glossary

Microsoft Purview Data Lineage

Microsoft Purview Data Estate Insights

Features: Data Stewardship report

Assets report

Glossary report

Classification and Labelling Reports

Asset-level drill down by sensitivity


Trends, drill downs and ability to take action

Export asset list into CSV file for offline tracking with data owners

Microsoft Purview Data Policy

Author and enforce data access policies for subscriptions, resource groups, Azure Blob Storage, Azure Data Lake Storage Gen2, and SQL DevOps roles

Microsoft Purview Data Sharing

In-place sharing for Azure Blob Storage and Azure Data Lake Storage Gen2
• Microsoft Purview is designed to address the issues raised in the previous slides and to help enterprises get the most value from their existing information assets

• Microsoft Purview provides a cloud-based service into which you can register data sources

• The primary purpose of registering a data source is to discover and understand it and how it is used

• Users can contribute to the catalog by tagging, documenting, and annotating data sources that have already
been registered
Loading Data in the Data Map
• Sourcing data
• Mapping data
• Scanning data
• Classification
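Classification, the last step above, assigns labels such as "email address" to columns based on their content. The sketch below is a hypothetical, simplified model of a regex-based classification rule with a minimum match threshold; the rule name, the pattern, and the 60% threshold are illustrative assumptions, not Purview's built-in classifier logic.

```python
import re
from dataclasses import dataclass


@dataclass
class ClassificationRule:
    """Hypothetical regex-based rule: a column receives the classification when
    at least `min_match_ratio` of its non-empty sampled values match the pattern."""
    name: str
    pattern: str
    min_match_ratio: float = 0.6  # illustrative threshold (assumption)

    def matches(self, values: list[str]) -> bool:
        non_empty = [v for v in values if v]
        if not non_empty:
            return False
        regex = re.compile(self.pattern)
        hits = sum(1 for v in non_empty if regex.fullmatch(v))
        return hits / len(non_empty) >= self.min_match_ratio


def classify_columns(columns: dict[str, list[str]],
                     rules: list[ClassificationRule]) -> dict[str, list[str]]:
    """Return, for each column, the names of the rules whose threshold it meets."""
    return {col: [r.name for r in rules if r.matches(vals)]
            for col, vals in columns.items()}


email_rule = ClassificationRule("Email Address (example)",
                                r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
sample = {
    "contact": ["ana@contoso.com", "li@fabrikam.org", "n/a"],  # 2 of 3 match
    "city": ["Seattle", "Milan", "Oslo"],                      # 0 of 3 match
}
print(classify_columns(sample, [email_rule]))
```

A threshold rather than an all-or-nothing match lets a column with a few dirty values (such as "n/a") still be classified.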

Browse and Search Information


A credential is authentication information that Microsoft Purview can use to authenticate to your registered data sources.

A credential object can be created for various types of authentication scenarios, such as Basic Authentication, which requires a username and password.

Credentials capture the specific information required to authenticate, based on the chosen authentication method.

Credentials use secrets in your existing Azure Key Vault instances to retrieve sensitive authentication information during the credential creation process.
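The idea that a credential stores only a reference to a Key Vault secret, never the secret itself, can be sketched as follows. The `Credential` class, its field names, and the `fetch_secret` callback are hypothetical; in a real deployment Microsoft Purview performs the lookup against your Azure Key Vault.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential record: it names the Key Vault secret to use
    but never holds the secret value itself."""
    name: str
    auth_type: str                        # e.g. "SQL authentication"
    key_vault: str                        # name of the Key Vault connection
    secret_name: str
    secret_version: Optional[str] = None  # None stands for "latest version"
    username: Optional[str] = None        # non-secret part, if the method has one


def resolve_secret(cred: Credential,
                   fetch_secret: Callable[[str, str, Optional[str]], str]) -> str:
    """Retrieve the sensitive value only when needed, via a caller-supplied lookup."""
    return fetch_secret(cred.key_vault, cred.secret_name, cred.secret_version)


# In-memory stand-in for a vault, for illustration only.
fake_vault = {("contoso-kv", "sql-scan-password", None): "s3cr3t"}

cred = Credential("sql-scan-cred", "SQL authentication", "contoso-kv",
                  "sql-scan-password", username="purview_scanner")
print(resolve_secret(cred, lambda kv, name, ver: fake_vault[(kv, name, ver)]))
```

Keeping only the secret name and version in the credential means rotating the password in Key Vault does not require editing the credential.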
Credential types supported in Microsoft Purview
The following credential types are supported in Microsoft Purview:

Basic authentication: add the password as a secret in Key Vault

Service Principal: add the service principal key as a secret in Key Vault

SQL authentication: add the password as a secret in Key Vault

Account Key: add the account key as a secret in Key Vault

Role ARN: for an Amazon S3 data source, add your role ARN in AWS
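The list above pairs each credential type with the sensitive value it needs and where that value is kept. A minimal lookup-table sketch, assuming the type labels used on this slide (the dictionary and function names are hypothetical):

```python
# Hypothetical lookup: credential type -> (sensitive value, where it is kept).
CREDENTIAL_SECRETS = {
    "Basic authentication": ("password", "stored as an Azure Key Vault secret"),
    "Service Principal": ("service principal key", "stored as an Azure Key Vault secret"),
    "SQL authentication": ("password", "stored as an Azure Key Vault secret"),
    "Account Key": ("account key", "stored as an Azure Key Vault secret"),
    "Role ARN": ("role ARN", "configured in AWS"),
}


def describe_credential(cred_type: str) -> str:
    """Explain what must be prepared before creating a credential of this type."""
    try:
        value, location = CREDENTIAL_SECRETS[cred_type]
    except KeyError:
        raise ValueError(f"Unsupported credential type: {cred_type!r}") from None
    return f"{cred_type}: {value} {location}"


for cred_type in CREDENTIAL_SECRETS:
    print(describe_credential(cred_type))
```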
Using Purview managed identity to set up scans

You can grant the Microsoft Purview system-assigned managed identity access to scan the following data sources:

• Azure Blob Storage
• Azure Data Lake Storage Gen1
• Azure Data Lake Storage Gen2
• Azure SQL Database
• Azure SQL Database Managed Instance
• Azure Synapse Workspace
• Azure Synapse dedicated SQL pools
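One way to read the list above is as a decision rule for choosing a scan's authentication method. The sketch below is limited to the source types listed on this slide and uses a hypothetical `choose_auth` helper; in practice you must also grant the managed identity the appropriate role or permission on each source.

```python
# Source types (from this slide) that can be scanned with the Purview
# system-assigned managed identity.
MANAGED_IDENTITY_SOURCES = frozenset({
    "Azure Blob Storage",
    "Azure Data Lake Storage Gen1",
    "Azure Data Lake Storage Gen2",
    "Azure SQL Database",
    "Azure SQL Database Managed Instance",
    "Azure Synapse Workspace",
    "Azure Synapse dedicated SQL pools",
})


def choose_auth(source_type: str, has_credential: bool) -> str:
    """Hypothetical rule: prefer the managed identity (no secret to store or rotate);
    otherwise fall back to a stored, Key Vault-backed credential."""
    if source_type in MANAGED_IDENTITY_SOURCES:
        return "managed identity"
    if has_credential:
        return "stored credential"
    raise ValueError(f"No usable authentication method for {source_type!r}")


print(choose_auth("Azure Data Lake Storage Gen2", False))  # managed identity
print(choose_auth("SQL Server", True))                     # stored credential
```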
What is a scan rule set?

• A scan rule set is a container for grouping a set of scan rules together so that you can easily associate them with
a scan

• For example, you might create a default scan rule set for each of your data source types, and then use these scan
rule sets by default for all scans within your company

• You might also want users with the right permissions to create other scan rule sets with different configurations
based on business need.
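The relationship described above, a scan rule set grouping scan rules with one default set per data source type, can be modeled as below. `ScanRuleSet`, its fields, and `rule_set_for` are hypothetical names for illustration; the file types and classification names are examples, not Purview's actual defaults.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScanRuleSet:
    """Hypothetical container that groups scan rules so they can be attached to a scan."""
    name: str
    source_type: str
    file_types: set[str] = field(default_factory=set)
    classification_rules: set[str] = field(default_factory=set)


# One company-wide default rule set per data source type.
DEFAULTS = {
    "Azure Data Lake Storage Gen2": ScanRuleSet(
        "adls-default", "Azure Data Lake Storage Gen2",
        {"csv", "parquet", "json"}, {"Email Address", "Credit Card Number"}),
}


def rule_set_for(source_type: str, custom: Optional[ScanRuleSet] = None) -> ScanRuleSet:
    """Use a custom rule set when one is given for this source type, else the default."""
    if custom is not None:
        if custom.source_type != source_type:
            raise ValueError("Rule set does not match the data source type")
        return custom
    return DEFAULTS[source_type]


finance = ScanRuleSet("finance-parquet-only", "Azure Data Lake Storage Gen2",
                      {"parquet"}, {"Credit Card Number"})
print(rule_set_for("Azure Data Lake Storage Gen2").name)           # adls-default
print(rule_set_for("Azure Data Lake Storage Gen2", finance).name)  # finance-parquet-only
```

Checking the source type when a custom set is supplied mirrors the idea that a rule set is defined for a particular kind of source.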
Store your credential in your Azure Key Vault instance and use the right secret name and version

Verify this by following the steps below:

• Navigate to your Key Vault
• Select Settings > Secrets
• Select the secret you're using to authenticate against your data source for scans
• Select the version that you intend to use and verify that the password or account key is correct by selecting Show Secret Value
• Navigate to the key vault's Access policies section
• Verify that the Purview managed identity appears in the Current access policies section with at least Get and List permissions on Secrets
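The access-policy check above can also be scripted. This sketch assumes you have exported the vault's access policies as JSON (for example with `az keyvault show --name <vault-name> --query properties.accessPolicies`) and know the object ID of the Purview account's managed identity; the helper name and sample IDs are hypothetical. It applies only to vaults that use the access-policy permission model, not Azure RBAC.

```python
import json

REQUIRED_SECRET_PERMISSIONS = {"get", "list"}


def purview_can_read_secrets(access_policies: list[dict], purview_object_id: str) -> bool:
    """Return True if the policies for the given object ID grant at least
    Get and List on secrets (permission names compared case-insensitively)."""
    granted: set[str] = set()
    for policy in access_policies:
        if policy.get("objectId", "").lower() != purview_object_id.lower():
            continue
        secrets = policy.get("permissions", {}).get("secrets") or []
        granted |= {p.lower() for p in secrets}
    return REQUIRED_SECRET_PERMISSIONS <= granted


# Sample export with placeholder IDs (illustrative only).
exported = json.loads("""
[
  {"objectId": "11111111-aaaa-bbbb-cccc-000000000001",
   "permissions": {"secrets": ["Get", "List"], "keys": []}},
  {"objectId": "22222222-aaaa-bbbb-cccc-000000000002",
   "permissions": {"secrets": ["get"]}}
]
""")
print(purview_can_read_secrets(exported, "11111111-aaaa-bbbb-cccc-000000000001"))  # True
print(purview_can_read_secrets(exported, "22222222-aaaa-bbbb-cccc-000000000002"))  # False
```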
• At-scale data processing systems typically store a single table in storage as multiple files. In the Microsoft Purview data catalog, this concept is represented by using resource sets

• A resource set is a single object in the catalog that represents many assets in storage

• For instance, suppose a Spark cluster has persisted a DataFrame into an Azure Data Lake Storage (ADLS) Gen2 data source. In Spark, the table looks like a single logical resource, but on disk there are likely thousands of Parquet files, each of which represents a partition of the total DataFrame's contents.
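To make the resource-set idea concrete, here is a hypothetical sketch of how many partition files could collapse into one logical asset: variable path parts (GUIDs, then digit runs) are replaced with placeholders, and paths that share the resulting pattern are grouped. This is a simplified illustration, not Purview's actual pattern-matching rules; the regexes and file names are assumptions.

```python
import re
from collections import defaultdict

GUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")
DIGITS = re.compile(r"\d+")


def to_pattern(path: str) -> str:
    """Replace GUIDs first, then any remaining digit runs, with placeholders."""
    return DIGITS.sub("{N}", GUID.sub("{GUID}", path))


def group_resource_sets(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths that normalize to the same pattern."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        groups[to_pattern(p)].append(p)
    return dict(groups)


files = [
    "sales/part-00000-3f2b1c9e-1a2b-4c3d-8e9f-0123456789ab.snappy.parquet",
    "sales/part-00001-3f2b1c9e-1a2b-4c3d-8e9f-0123456789ab.snappy.parquet",
    "sales/part-00002-3f2b1c9e-1a2b-4c3d-8e9f-0123456789ab.snappy.parquet",
    "reference/countries.csv",
]
for pattern, members in group_resource_sets(files).items():
    print(pattern, len(members))
```

The three Spark part files share the pattern `sales/part-{N}-{GUID}.snappy.parquet` and would appear as one catalog object, while the standalone CSV stays an individual asset.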
How does Microsoft Purview scan resource sets?

When Microsoft Purview detects resources that it thinks are part of a resource set, it switches from a full scan to a sample scan

A sample scan opens only a subset of the files that Microsoft Purview thinks are in the resource set. For each file it does open, it uses its schema and runs its classifiers
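The switch from a full scan to a sample scan can be sketched as follows: only a subset of a detected resource set's files is opened, and the classifiers run on each opened file's columns. The sample size, the `read_columns` callback, and the classifier signature are hypothetical; the slide does not state how many files Purview samples.

```python
import random
from typing import Callable

Columns = dict[str, list[str]]


def sample_scan(files: list[str],
                read_columns: Callable[[str], Columns],
                classify: Callable[[Columns], dict[str, set[str]]],
                sample_size: int = 3,
                seed: int = 0) -> dict[str, set[str]]:
    """Open only a sample of a resource set's files and merge per-column classifications."""
    rng = random.Random(seed)
    chosen = files if len(files) <= sample_size else rng.sample(files, sample_size)
    merged: dict[str, set[str]] = {}
    for path in chosen:
        for column, labels in classify(read_columns(path)).items():
            merged.setdefault(column, set()).update(labels)
    return merged


opened: list[str] = []


def fake_read(path: str) -> Columns:
    """Stand-in reader that records which files were opened."""
    opened.append(path)
    return {"email": ["a@contoso.com"], "id": ["1"]}


def fake_classify(cols: Columns) -> dict[str, set[str]]:
    return {c: ({"Email Address"} if c == "email" else set()) for c in cols}


partitions = [f"part-{i:05d}.parquet" for i in range(1000)]
result = sample_scan(partitions, fake_read, fake_classify)
print(len(opened))  # 3
print(result)
```

Only 3 of the 1,000 partitions are read, yet the merged result still labels the `email` column for the whole resource set.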
When Microsoft Purview matches a group of assets into a resource set, it attempts to extract the most useful information to use as a display name in the catalog. Some examples of the default naming convention:

Example 1
• Qualified name: https://myblob.blob.core.windows.net/sample-data/name-of-spark-output/{SparkPartitions}
• Display name: "name of spark output"

Example 2
• Qualified name: https://myblob.blob.core.windows.net/my-partitioned-data/{Year}-{Month}-{Day}/{N}-{N}-{N}-{N}/{GUID}
• Display name: "my partitioned data"

Example 3
• Qualified name: https://myblob.blob.core.windows.net/sample-data/data{N}.csv
• Display name: "data"
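The three examples above are consistent with a simple heuristic: walk the path from the end, skip segments that consist only of placeholders, drop the file extension, and turn hyphens into spaces. The function below is a hypothetical approximation that reproduces these examples; it is not Purview's documented naming algorithm.

```python
import re
from urllib.parse import urlparse

PLACEHOLDER = re.compile(r"\{[^}]*\}")


def display_name(qualified_name: str) -> str:
    """Derive a readable name from a resource set's qualified name (heuristic sketch)."""
    segments = [s for s in urlparse(qualified_name).path.split("/") if s]
    for segment in reversed(segments):
        stem = segment.split(".", 1)[0]               # drop extension(s), e.g. ".csv"
        stem = PLACEHOLDER.sub("", stem).strip("-_ ")  # remove {Year}, {N}, {GUID}, ...
        if stem:                                       # segment had a literal part
            return re.sub(r"[-_]+", " ", stem)
    return ""


base = "https://myblob.blob.core.windows.net"
print(display_name(base + "/sample-data/name-of-spark-output/{SparkPartitions}"))
print(display_name(base + "/my-partitioned-data/{Year}-{Month}-{Day}/{N}-{N}-{N}-{N}/{GUID}"))
print(display_name(base + "/sample-data/data{N}.csv"))
```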
Additional features of Search

Bulk edit search results


If you're looking to make changes to multiple assets returned by search, Microsoft Purview
lets you modify glossary terms, classifications, and contacts in bulk.

Browse the data catalog


The Microsoft Purview data catalog offers a browse experience that enables users to explore what
data is available to them either by collection or through traversing the hierarchy of each data source
in the catalog
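Browsing by traversing a data source's hierarchy amounts to walking a tree. The nested dictionary below is a hypothetical stand-in for a collection containing a storage account, containers, folders, and assets; all names are invented for illustration.

```python
# Hypothetical hierarchy: nested dicts are containers, None marks a leaf asset.
catalog: dict = {
    "Finance collection": {
        "contosodatalake (ADLS Gen2)": {
            "raw": {"invoices_2023.csv": None, "invoices_2024.csv": None},
            "curated": {"sales (resource set)": None},
        },
    },
}


def browse(node: dict, depth: int = 0) -> list[str]:
    """Return an indented, depth-first listing of the hierarchy."""
    lines = []
    for name, child in node.items():
        lines.append("  " * depth + name)
        if isinstance(child, dict):
            lines.extend(browse(child, depth + 1))
    return lines


print("\n".join(browse(catalog)))
```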

Search query syntax


A keyword can match a classification, a glossary term, an asset description, or an asset name.
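Keyword search and bulk edit can be illustrated together: a keyword is matched against an asset's name, description, classifications, and glossary terms, and a glossary term is then applied to every hit at once. The `Asset` class and both functions are hypothetical; they do not reproduce Purview's ranking or query operators.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    description: str = ""
    classifications: set[str] = field(default_factory=set)
    glossary_terms: set[str] = field(default_factory=set)


def search(assets: list[Asset], keyword: str) -> list[Asset]:
    """Case-insensitive substring match over the fields listed on the slide."""
    k = keyword.lower()

    def hit(a: Asset) -> bool:
        fields = [a.name, a.description, *a.classifications, *a.glossary_terms]
        return any(k in f.lower() for f in fields)

    return [a for a in assets if hit(a)]


def bulk_add_term(assets: list[Asset], term: str) -> int:
    """Apply one glossary term to every asset in a result set; return how many changed."""
    changed = 0
    for a in assets:
        if term not in a.glossary_terms:
            a.glossary_terms.add(term)
            changed += 1
    return changed


assets = [
    Asset("customers", "Customer master data", {"Email Address"}),
    Asset("orders", "Sales orders"),
    Asset("leads", "Marketing contacts", {"Email Address"}),
]
hits = search(assets, "email")
print([a.name for a in hits])                # ['customers', 'leads']
print(bulk_add_term(hits, "Customer PII"))   # 2
```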
How to access the lab environment

Sign in using the Microsoft Account you used to register for this event.

• Go to tf.labsonline.it (use an HTML5-compliant browser such as Edge, Chrome, Firefox, or other)
• Click the Register for a Lab button
• Enter the key <Lab Key> to register
• Click the Sign-in with Microsoft button and start executing the labs

© Copyright Microsoft Corporation. All rights reserved.


Thank You!
