Day 1 of 4
<Presenter Name>
<Date>
Data-driven transformations yield significant benefits
• 54% increase in revenue performance
• 44% faster time to market
• 62% improvement in customer satisfaction
• 54% increased profit results
Is it trustworthy?
DISCOVERY
• What data do I have?
• Where did the data originate?
• Can I trust it?
COMPLIANCE
• What’s my exposure to risk?
• Is my usage compliant?
• How do I control access & use?
• What is required by regulation X?
[Diagram labels: Microsoft Purview; Azure Synapse Analytics; On-prem; SQL Server; SaaS Applications; Data Producers and Consumers; Data Officers]
Generally Available
• Microsoft Purview Data Map
• Data Classification
• Business Glossary
• Assets report
• Glossary report
• Export asset list into CSV file for offline tracking with data owners
In Preview
• Author and enforce data access policies for subscriptions, resource groups, Azure Blob Storage, Azure Data Lake (Gen2) and SQL DevOps roles
• In-place sharing for Azure Blob Storage and Azure Data Lake (Gen2)
• Microsoft Purview is designed to address the issues mentioned in the previous sections and to help enterprises
get the most value from their existing information assets
• Microsoft Purview provides a cloud-based service into which you can register data sources
• Discovering and understanding data sources and their use is the primary purpose of registering the sources
• Users can contribute to the catalog by tagging, documenting, and annotating data sources that have already
been registered
Loading Data in the Data Map
• Sourcing data
• Mapping data
• Scanning data
• Classification
A credential object can be created for various types of authentication scenarios, such as Basic
Authentication requiring username/password.
A credential captures the specific information required to authenticate, based on the chosen authentication
method.
Credentials use your existing Azure Key Vault secrets to retrieve sensitive authentication information during
the credential creation process.
Credential types supported in Microsoft Purview
• Service Principal: You add the service principal key as a secret in Key Vault
• Account Key: You add the account key as a secret in Key Vault
• Role ARN: For an Amazon S3 data source, add your role ARN in AWS
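To make the relationship between a credential and Key Vault concrete, here is a minimal Python sketch. It is not the Purview API: the class `CredentialReference`, its field names, and the `kind` labels are hypothetical and exist only for illustration. The one thing it relies on is the standard Key Vault secret identifier layout (`https://<vault>.vault.azure.net/secrets/<name>/<version>`). The point it shows is that a credential holds a reference to a secret, never the secret value itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CredentialReference:
    """Hypothetical model of a credential: it points at a Key Vault secret."""
    name: str
    kind: str                      # illustrative label, e.g. "ServicePrincipal" or "AccountKey"
    key_vault_name: str
    secret_name: str
    secret_version: Optional[str] = None   # None means "use the latest version"

    def secret_identifier(self) -> str:
        # Standard Key Vault secret identifier; the version suffix is optional.
        base = f"https://{self.key_vault_name}.vault.azure.net/secrets/{self.secret_name}"
        return f"{base}/{self.secret_version}" if self.secret_version else base


# Illustrative values only; none of these names come from the slides.
blob_credential = CredentialReference(
    name="blob-account-key",
    kind="AccountKey",
    key_vault_name="contoso-kv",
    secret_name="storage-account-key",
    secret_version="v1",
)
```

Pinning `secret_version` makes scans reproducible, while leaving it unset follows whatever version is current in the vault; which is preferable depends on how secrets are rotated.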
Using the Microsoft Purview managed identity to set up scans
You can grant the Microsoft Purview system-assigned managed identity access so that it can scan
different data sources
• A scan rule set is a container for grouping a set of scan rules together so that you can easily associate them with
a scan
• For example, you might create a default scan rule set for each of your data source types, and then use these scan
rule sets by default for all scans within your company
• You might also want users with the right permissions to create other scan rule sets with different configurations
based on business need.
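The idea of a default scan rule set per data source type, with optional custom overrides, can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Purview's object model: `ScanRuleSet`, `DEFAULT_RULE_SETS`, `rule_set_for_scan`, and the source type and rule names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScanRuleSet:
    """Hypothetical container grouping scan rules for one data source type."""
    name: str
    source_type: str
    file_types: Tuple[str, ...] = ()
    classification_rules: Tuple[str, ...] = ()


# One default rule set per data source type (names are made up for the sketch).
DEFAULT_RULE_SETS = {
    "AzureBlobStorage": ScanRuleSet(
        "default-blob", "AzureBlobStorage", ("csv", "parquet"), ("EMAIL", "CREDIT_CARD")
    ),
    "AzureSqlDatabase": ScanRuleSet(
        "default-sql", "AzureSqlDatabase", (), ("EMAIL",)
    ),
}


def rule_set_for_scan(source_type: str, custom: Optional[ScanRuleSet] = None) -> ScanRuleSet:
    """Use the custom rule set if one is supplied, otherwise the company default."""
    if custom is None:
        return DEFAULT_RULE_SETS[source_type]
    if custom.source_type != source_type:
        raise ValueError("rule set was defined for a different data source type")
    return custom
```

A team with a specific business need would pass its own `ScanRuleSet` as `custom`; every other scan falls back to the default for its source type.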
Store your credential in your Azure Key Vault instance and use the right secret name and version
• A resource set is a single object in the catalog that represents many assets in storage
• For instance, suppose a Spark cluster has persisted a DataFrame into an Azure Data Lake Storage (ADLS) Gen2 data
source. In Spark the table looks like a single logical resource, but on disk there are likely thousands of Parquet
files, each of which represents a partition of the total DataFrame's contents.
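As a rough illustration of how many partition files can collapse into one logical object, the Python sketch below groups paths by a normalized pattern. This is an assumption-laden toy, not Purview's actual matching logic: the helper names `pattern_of` and `group_into_resource_sets` are hypothetical, and the only rules applied are replacing GUIDs with `{GUID}` and digit runs with `{N}`.

```python
import re
from collections import defaultdict

# Toy normalization rules (assumptions for this sketch only).
GUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
NUMBER = re.compile(r"\d+")


def pattern_of(path: str) -> str:
    """Replace GUIDs and digit runs so sibling partition files share one pattern."""
    return NUMBER.sub("{N}", GUID.sub("{GUID}", path))


def group_into_resource_sets(paths):
    """Map each normalized pattern to the list of physical files behind it."""
    groups = defaultdict(list)
    for path in paths:
        groups[pattern_of(path)].append(path)
    return dict(groups)


example_paths = [
    "sales/part-00000.parquet",
    "sales/part-00001.parquet",
    "sales/part-00002.parquet",
    "readme.txt",
]
resource_sets = group_into_resource_sets(example_paths)
```

Here the three Parquet partitions end up under the single key `sales/part-{N}.parquet`, while `readme.txt` stays on its own.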
How does Microsoft Purview scan resource sets?
When Microsoft Purview detects resources that it thinks are part of a resource set, it switches from a full scan
to a sample scan
A sample scan opens only a subset of the files that it thinks are in the resource set
For each file it does open, it uses its schema and runs its classifiers
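The difference between a full scan and a sample scan can be sketched as follows. The slides do not say how Purview picks which files to open, so the evenly spaced selection, the function name `choose_files_to_open`, and the `sample_size` parameter are all assumptions made purely for illustration.

```python
def choose_files_to_open(files, part_of_resource_set, sample_size=3):
    """Return every file for a full scan, or a small evenly spaced subset for a sample scan."""
    if not part_of_resource_set or len(files) <= sample_size:
        return list(files)
    step = len(files) / sample_size
    # Pick sample_size files spread across the listing (assumed strategy).
    return [files[int(i * step)] for i in range(sample_size)]


partition_files = [f"sales/part-{i:05d}.parquet" for i in range(1000)]
opened_for_sample = choose_files_to_open(partition_files, part_of_resource_set=True)
opened_for_full = choose_files_to_open(partition_files, part_of_resource_set=False)
```

Only the files returned by the sample branch would have their schema read and classifiers run, which is what keeps scanning a thousand-file resource set inexpensive.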
When Microsoft Purview matches a group of assets into a resource set, it attempts to extract the most useful
information to use as a display name in the catalog. Some examples of the default naming convention:
Example 1
• Qualified name: https://myblob.blob.core.windows.net/sample-data/name-of-spark-output/{SparkPartitions}
• Display name: "name of spark output"
Example 2
• Qualified name: https://myblob.blob.core.windows.net/my-partitioned-data/{Year}-{Month}-{Day}/{N}-{N}-{N}-{N}/{GUID}
• Display name: "my partitioned data"
Example 3
• Qualified name: https://myblob.blob.core.windows.net/sample-data/data{N}.csv
• Display name: "data"
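The three examples above are consistent with a simple rule: take the last path segment that still contains literal text once the `{...}` placeholders are removed, drop any file extension, and turn hyphens into spaces. The Python sketch below implements that inferred rule. It reproduces the three examples but should not be read as Purview's actual naming algorithm; the function name `display_name` is hypothetical.

```python
import re
from urllib.parse import urlsplit

PLACEHOLDER = re.compile(r"\{[^{}]+\}")   # matches {N}, {GUID}, {Year}, ...


def display_name(qualified_name: str) -> str:
    """Derive a display name from a resource set qualified name (inferred rule)."""
    segments = [s for s in urlsplit(qualified_name).path.split("/") if s]
    for segment in reversed(segments):
        literal = PLACEHOLDER.sub("", segment)
        # Skip segments made only of placeholders and separators.
        if not re.search(r"[A-Za-z0-9]", literal):
            continue
        # Drop a trailing file extension such as ".csv" if present.
        stem = literal.rsplit(".", 1)[0] if "." in literal else literal
        return re.sub(r"[-_]+", " ", stem).strip()
    return ""
```

For instance, `display_name("https://myblob.blob.core.windows.net/sample-data/data{N}.csv")` skips nothing, strips the placeholder and extension, and yields `data`, matching Example 3.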
Additional features of Search