REST API (PyApacheAtlas): Prerequisites
🤔 Prerequisites
An Azure account with an active subscription.
An Azure Purview account.
🔨 Tools
Azure Data Studio (Download and Install)
Python 3.8.10 (Download and Install)
📢 Introduction
While Purview Studio is the default method of interfacing with Azure Purview, the
underlying platform can also be accessed via a set of APIs. This opens up a variety
of scenarios, including:
🎯 Objectives
Understand the high-level Apache Atlas concepts.
Generate an access token.
Read data from the Azure Purview platform.
Table of Contents
1. Apache Atlas
2. Register an Application
3. Generate a Client Secret
4. Provide Service Principal Access to Azure Purview
5. Get an Access Token
6. Read data from Azure Purview
1. Apache Atlas
🗺️ What is Apache Atlas?
"Apache Atlas provides open metadata management and governance capabilities for
organizations to build a catalog of their data assets, classify and govern these assets
and provide collaboration capabilities around these data assets for data scientists,
analysts and the data governance team."
Source: Apache.org
Azure Purview's data catalog is largely based on Apache Atlas and therefore shares
much of its API surface area, which allows users to programmatically perform CRUD
(Create, Read, Update, Delete) operations over Azure Purview assets.
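The CRUD mapping can be sketched as follows. This is a minimal illustration that assumes the standard Apache Atlas v2 entity routes. In those routes, a single `POST /entity` call creates or updates an entity, and `GET`/`DELETE` act on `/entity/guid/{guid}`. The paths are relative to your account's Atlas base URL. `entity_request` is a hypothetical helper and is not part of any library.

```python
from typing import Optional, Tuple

# Apache Atlas v2 entity routes, relative to the Atlas base URL.
# Atlas exposes one create-or-update endpoint, so CREATE and UPDATE share it.
_ENTITY_ROUTES = {
    "CREATE": ("POST", "/entity"),
    "READ": ("GET", "/entity/guid/{guid}"),
    "UPDATE": ("POST", "/entity"),
    "DELETE": ("DELETE", "/entity/guid/{guid}"),
}


def entity_request(operation: str, guid: Optional[str] = None) -> Tuple[str, str]:
    """Return the (HTTP method, relative path) for a CRUD operation on an entity."""
    try:
        method, template = _ENTITY_ROUTES[operation.upper()]
    except KeyError:
        raise ValueError(f"Unknown CRUD operation: {operation!r}") from None
    if "{guid}" in template:
        # READ and DELETE address a single entity, so a GUID is mandatory.
        if not guid:
            raise ValueError(f"{operation.upper()} requires an entity GUID")
        return method, template.format(guid=guid)
    return method, template


if __name__ == "__main__":
    for op in ("create", "read", "update", "delete"):
        print(op, entity_request(op, guid="example-guid"))
```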
Atlas Endpoints
As can be seen in the Apache Atlas Swagger, Atlas has a variety of REST endpoints
that handle different aspects of the catalog (e.g. types, entities, glossary, relationships, and lineage).
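Here is a sketch of how these endpoint families hang off a single base URL. It assumes the `https://{account}.purview.azure.com/catalog/api/atlas/v2` base used by current Purview accounts. Older accounts used `https://{account}.catalog.purview.azure.com/api/atlas/v2` instead. `atlas_url` and the account name `contoso-purview` are hypothetical, and no request is sent.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# ASSUMPTION: Atlas v2 base URL of a Purview account; older accounts used
# https://{account}.catalog.purview.azure.com/api/atlas/v2 instead.
BASE_TEMPLATE = "https://{account}.purview.azure.com/catalog/api/atlas/v2"


def atlas_url(account: str, path: str, params: Optional[Dict[str, str]] = None) -> str:
    """Join an Atlas v2 path (e.g. '/types/typedefs') onto the account's base URL."""
    url = BASE_TEMPLATE.format(account=account) + "/" + path.lstrip("/")
    if params:
        # Query parameters are URL-encoded in insertion order.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    account = "contoso-purview"  # hypothetical account name
    print(atlas_url(account, "/types/typedef/name/azure_sql_table"))  # a type definition
    print(atlas_url(account, "/entity/uniqueAttribute/type/azure_sql_table",
                    {"attr:qualifiedName": "mssql://sqlsvr.database.windows.net/sqldb/SalesLT/Address"}))
    print(atlas_url(account, "/glossary"))  # glossaries
    print(atlas_url(account, "/lineage/example-guid", {"direction": "BOTH", "depth": "3"}))
```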
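Types
Defines the structure of a kind of metadata object, i.e. its name and attributes (for example, azure_sql_table).
Entity
An instance of a type. For example, a table in an Azure SQL Database:
name: Address
qualifiedName: mssql://sqlsvr.database.windows.net/sqldb/SalesLT/Address
status: ACTIVE
typeName: azure_sql_table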
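The entity example above can be expressed in the JSON shape that Atlas uses for entities. In that shape, `typeName` and `status` sit at the top level, while `name` and `qualifiedName` are attributes. `parse_qualified_name` is a hypothetical helper. It assumes the `mssql://server/database/schema/table` layout visible in the example, and other asset types use different qualifiedName formats.

```python
import json
from typing import Dict
from urllib.parse import urlparse

# The example entity in the Atlas entity JSON shape: typeName and status are
# top-level fields, while name and qualifiedName are attributes of the type.
address_entity = {
    "typeName": "azure_sql_table",
    "status": "ACTIVE",
    "attributes": {
        "name": "Address",
        "qualifiedName": "mssql://sqlsvr.database.windows.net/sqldb/SalesLT/Address",
    },
}


def parse_qualified_name(qualified_name: str) -> Dict[str, str]:
    """Hypothetical helper: split an mssql:// qualifiedName into its parts.

    Assumes the server/database/schema/table layout of the example above and
    raises ValueError if the path does not have exactly three segments.
    """
    parsed = urlparse(qualified_name)
    database, schema, table = parsed.path.strip("/").split("/")
    return {"server": parsed.netloc, "database": database, "schema": schema, "table": table}


if __name__ == "__main__":
    print(json.dumps(address_entity, indent=2))
    print(parse_qualified_name(address_entity["attributes"]["qualifiedName"]))
```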
Glossary
A hierarchical set of business terms that represents your business domain.
Relationship
An object that defines the relationship between objects. For example, a relationship
of type AtlasGlossarySemanticAssignment describes the relationship between
an AtlasGlossaryTerm and an asset (e.g. azure_datalake_gen2_path).
JSON Code Snippet: AtlasGlossarySemanticAssignment (Relationship)
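As a sketch of the relationship described above, the payload below links a glossary term to an asset. It assumes the Atlas relationship JSON shape, with the term as `end1` and the asset as `end2`. The GUIDs are placeholders, and `semantic_assignment` is a hypothetical helper. A payload like this could be sent to the Atlas `POST /relationship` endpoint.

```python
import json
from typing import Any, Dict


def semantic_assignment(term_guid: str, asset_guid: str, asset_type: str) -> Dict[str, Any]:
    """Build an AtlasGlossarySemanticAssignment relationship payload."""
    return {
        "typeName": "AtlasGlossarySemanticAssignment",
        # end1 is the glossary term; end2 is the asset the term is assigned to.
        "end1": {"guid": term_guid, "typeName": "AtlasGlossaryTerm"},
        "end2": {"guid": asset_guid, "typeName": asset_type},
    }


if __name__ == "__main__":
    payload = semantic_assignment("term-guid", "asset-guid", "azure_datalake_gen2_path")
    print(json.dumps(payload, indent=2))
```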
Lineage
Returns lineage information about an entity (e.g. Azure Data Factory Copy Activity).
Lineage details where data originated from, where it moved, and where it was
processed.
JSON Code Snippet: Azure Data Factory Copy Activity (Lineage)
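To illustrate how lineage can be consumed, the sketch below walks a lineage response upstream from an entity. It assumes the shape of Atlas's `AtlasLineageInfo`, which is returned by `GET /lineage/{guid}`. That shape lists entities in `guidEntityMap` and edges in `relations` as `fromEntityId`/`toEntityId` pairs. The sample response, its GUIDs, and its type names are hypothetical and trimmed to the fields used.

```python
from collections import deque
from typing import Any, Dict, List

# Hypothetical, trimmed response in the shape of AtlasLineageInfo:
# file -> copy activity -> table.
sample_lineage = {
    "baseEntityGuid": "table-guid",
    "guidEntityMap": {
        "file-guid": {"typeName": "azure_datalake_gen2_path", "displayText": "address.csv"},
        "copy-guid": {"typeName": "adf_copy_activity", "displayText": "Copy Sales"},
        "table-guid": {"typeName": "azure_sql_table", "displayText": "Address"},
    },
    "relations": [
        {"fromEntityId": "file-guid", "toEntityId": "copy-guid"},
        {"fromEntityId": "copy-guid", "toEntityId": "table-guid"},
    ],
}


def upstream_of(lineage: Dict[str, Any], guid: str) -> List[str]:
    """Return display names of every entity upstream of `guid`, nearest first."""
    # Index each entity's direct inputs (the 'from' side of each edge).
    parents: Dict[str, List[str]] = {}
    for rel in lineage["relations"]:
        parents.setdefault(rel["toEntityId"], []).append(rel["fromEntityId"])
    # Breadth-first search so closer sources are listed before distant ones.
    seen = {guid}
    queue = deque([guid])
    result: List[str] = []
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                result.append(lineage["guidEntityMap"][parent]["displayText"])
                queue.append(parent)
    return result


if __name__ == "__main__":
    print(upstream_of(sample_lineage, sample_lineage["baseEntityGuid"]))
    # → ['Copy Sales', 'address.csv']
```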
Note: While Azure Purview is built on Apache Atlas, there are certain areas, such as
Discovery (which is responsible for search), where Azure Purview has deviated from
Atlas and implemented a custom search API.
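Here is a sketch of a Discovery search call, which is built but not sent. The URL path, the `api-version` value, and the `keywords`/`limit`/`filter` body fields are assumptions based on the Purview search API. Check the Purview Discovery REST reference for the values your account supports. `build_search_request` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

# ASSUMPTIONS: account endpoint form and api-version; verify against the
# Purview Discovery REST reference before use.
SEARCH_URL = "https://{account}.purview.azure.com/catalog/api/search/query?api-version={version}"


def build_search_request(account: str, token: str, keywords: str, limit: int = 10,
                         api_version: str = "2021-05-01-preview",
                         search_filter: Optional[Dict[str, Any]] = None) -> Request:
    """Build (but do not send) a POST request for a keyword search."""
    body: Dict[str, Any] = {"keywords": keywords, "limit": limit}
    if search_filter:
        body["filter"] = search_filter  # passed through unchanged
    return Request(
        SEARCH_URL.format(account=account, version=api_version),
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("contoso-purview", "<access-token>", "Address")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` with a real access token.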
2. Register an Application
To invoke the REST API, we must first register an application (i.e. a service principal)
that will act as the identity that the Azure Purview platform recognizes and is
configured to trust.
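Property      Example Value
Name          purview-spn
Make a note of the following values from the application's Overview page; they are
needed later to request an access token:
- Application (client) ID
- Directory (tenant) ID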
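Rather than hard-coding these IDs, one option is to read them from environment variables. This sketch assumes the `AZURE_TENANT_ID` and `AZURE_CLIENT_ID` names, which azure-identity's `EnvironmentCredential` also reads. It validates that each value is a GUID. `load_app_registration` is a hypothetical helper.

```python
import os
import uuid
from typing import NamedTuple


class AppRegistration(NamedTuple):
    tenant_id: str
    client_id: str


def load_app_registration() -> AppRegistration:
    """Read the tenant and client IDs from the environment and validate them."""
    values = {}
    for field, var in (("tenant_id", "AZURE_TENANT_ID"), ("client_id", "AZURE_CLIENT_ID")):
        raw = os.environ.get(var, "").strip()
        try:
            # uuid.UUID rejects anything that is not a well-formed GUID;
            # str() normalizes it to lowercase, hyphenated form.
            values[field] = str(uuid.UUID(raw))
        except ValueError:
            raise ValueError(f"{var} must be set to a GUID, got {raw!r}") from None
    return AppRegistration(**values)
```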
3. Generate a Client Secret
1. Navigate to Certificates & secrets and click New client secret.
2. Provide the following values and then click Add.
Property      Example Value
Description   purview-api
Expires       In 1 year
3. Copy the client secret Value and store it securely; the portal only displays it once.
A client secret is a secret string that the application uses to prove its identity
when requesting a token. It can also be referred to as an application password.
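Because the client secret acts as a password, it should not end up in notebooks, logs, or printed output. This is a minimal sketch that assumes the secret is supplied through the `AZURE_CLIENT_SECRET` environment variable. `ClientSecret` is a hypothetical wrapper that masks the value when the object is printed or logged.

```python
import os


class ClientSecret:
    """Holds the client secret and masks it in repr() and str()."""

    def __init__(self, value: str) -> None:
        if not value:
            raise ValueError("client secret is empty")
        self._value = value

    def reveal(self) -> str:
        """Return the raw secret; call this only when building the token request."""
        return self._value

    def __repr__(self) -> str:
        return "ClientSecret('********')"

    __str__ = __repr__


def load_client_secret(var: str = "AZURE_CLIENT_SECRET") -> ClientSecret:
    """Read the secret from the environment; raises ValueError if it is unset."""
    return ClientSecret(os.environ.get(var, ""))
```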
4. Provide Service Principal Access to Azure Purview
1. Navigate to Purview Studio > Data map > Collections > YOUR_ROOT_COLLECTION, and
then click Add data curators.
2. Search for the name of the Service Principal (e.g. purview-spn), select the
Service Principal from the search results, and then click OK.
5. Get an Access Token
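The access token is obtained with the OAuth 2.0 client credentials flow, using the IDs and secret from the previous steps. The sketch below assumes the Azure AD v1 token endpoint with `resource=https://purview.azure.net`. The v2 endpoint (`/oauth2/v2.0/token` with `scope=https://purview.azure.net/.default`) is an alternative. The function names are hypothetical, and only `get_access_token` makes a network call.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: Azure AD v1 token endpoint plus the Purview resource URI.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/token"
PURVIEW_RESOURCE = "https://purview.azure.net"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": PURVIEW_RESOURCE,
    }
    return Request(
        TOKEN_URL.format(tenant_id=tenant_id),
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def get_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the token request and return the access_token field of the JSON reply."""
    with urlopen(build_token_request(tenant_id, client_id, client_secret)) as resp:
        return json.loads(resp.read().decode("utf-8"))["access_token"]


def auth_headers(token: str) -> Dict[str, str]:
    """Headers to attach to each subsequent Purview REST call."""
    return {"Authorization": f"Bearer {token}"}
```

With PyApacheAtlas, the same exchange is wrapped by `ServicePrincipalAuthentication(tenant_id=..., client_id=..., client_secret=...)`. That object is then passed to `PurviewClient(account_name=..., authentication=...)`.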
🎓 Knowledge Check
1. The Azure Purview API is largely based on which open-source project?
A) Apache Maven
B) Apache Spark
C) Apache Atlas
2. The Azure Purview API only works with Python.
A) True
B) False
3. The Azure Purview API can be used to create custom lineage between data
processes and data assets.
A) True
B) False
🎉 Summary
In this module, you learned how to get started with the Azure Purview REST API. To
learn more about the Azure Purview REST API, check out the Azure Purview
documentation.