
REST API [PyApacheAtlas]

🤔 Prerequisites
• An Azure account with an active subscription.
• An Azure Purview account.

🔨 Tools
• Azure Data Studio (Download and Install)
• Python 3.8.10 (Download and Install)

📢 Introduction
While Purview Studio is the default method of interfacing with Azure Purview, the
underlying platform can be accessed via a set of APIs. This opens up a variety of
scenarios, including:

• Working with Azure Purview assets programmatically (e.g. bulk create/read/update/delete).
• Adding support for other data sources beyond those supported out of the box.
• Extending the lineage functionality to other ETL processes.
• Embedding Azure Purview asset data within custom user experiences.
• Triggering Azure Purview scans to run off the back of a custom event.

The primary focus of this module is the catalog, which is based on the open-source
Apache Atlas project. Read below for more details on Apache Atlas and how it
relates to Azure Purview.

🎯 Objectives
• Understand the high-level Apache Atlas concepts.
• Generate an access token.
• Read data from the Azure Purview platform.

Table of Contents
1. Apache Atlas
2. Register an Application
3. Generate a Client Secret
4. Provide Service Principal Access to Azure Purview
5. Get an Access Token
6. Read data from Azure Purview

1. Apache Atlas
🗺️ What is Apache Atlas?

"Apache Atlas provides open metadata management and governance capabilities for
organizations to build a catalog of their data assets, classify and govern these assets
and provide collaboration capabilities around these data assets for data scientists,
analysts and the data governance team."

Source: Apache.org

Azure Purview's data catalog is largely based on Apache Atlas, and therefore shares
much of the same surface area that allows users to programmatically perform CRUD
(CREATE/READ/UPDATE/DELETE) operations over Azure Purview assets.

Atlas Endpoints

As can be seen in the Apache Atlas Swagger, Atlas has a variety of REST endpoints
that handle different aspects of the catalog (e.g. types, entities, glossary, etc.).

Types

A definition (or blueprint) of how a particular type of metadata object can be
created. This is similar to the concept of a class in object-oriented programming. For
example: The type definition for an azure_sql_table is of category ENTITY and
contains attributes such as guid, qualifiedName, description, etc.
JSON Code Snippet: Azure SQL Table (Type)

Entity

An instance of an entity type (e.g. azure_sql_table). For example: An instance of
an azure_sql_table contains the following example values:

• name: Address
• qualifiedName: mssql://sqlsvr.database.windows.net/sqldb/SalesLT/Address
• status: ACTIVE
• typeName: azure_sql_table

JSON Code Snippet: Azure SQL Table (Entity)
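
To make the type versus entity distinction concrete, here is a minimal Python sketch.
It is not the actual Purview payload: the dictionaries only borrow the general shape
of Atlas JSON (attributeDefs on a type, attributes on an entity) and are heavily
simplified. The helper undeclared_attributes is a hypothetical name introduced purely
for illustration.

```python
import json

# Illustrative only: a pared-down type definition (the "blueprint").
# The real azure_sql_table definition has many more attributes, and some
# of them are inherited from parent types rather than declared directly.
type_def = {
    "category": "ENTITY",
    "name": "azure_sql_table",
    "attributeDefs": [
        {"name": "qualifiedName", "typeName": "string"},
        {"name": "name", "typeName": "string"},
        {"name": "description", "typeName": "string"},
    ],
}

# One entity (an "instance") of that type, using the example values above.
entity = {
    "typeName": "azure_sql_table",
    "status": "ACTIVE",
    "attributes": {
        "name": "Address",
        "qualifiedName": "mssql://sqlsvr.database.windows.net/sqldb/SalesLT/Address",
    },
}


def undeclared_attributes(entity, type_def):
    """Return attribute names set on the entity that the type does not declare."""
    declared = {attr["name"] for attr in type_def["attributeDefs"]}
    return sorted(set(entity["attributes"]) - declared)


if __name__ == "__main__":
    print(json.dumps(entity, indent=2))
    print(undeclared_attributes(entity, type_def))
```

The entity points back to its blueprint through typeName, much as an object points
back to its class.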


Glossary

A hierarchical set of business terms that represents your business domain. For
example:

• Term Name: Focus Time
• Term Definition: Uninterrupted time blocks of two hours or more with no meetings.

JSON Code Snippet: Focus Time (Glossary Term)


Relationship

An object that defines the relationship between objects. For example: A relationship
of type AtlasGlossarySemanticAssignment describes the relationship between
an AtlasGlossaryTerm and an asset (e.g. azure_datalake_gen2_path).
JSON Code Snippet: AtlasGlossarySemanticAssignment (Relationship)

Lineage

Lineage information about an entity (e.g. Azure Data Factory Copy Activity). Lineage
details where data originated, where it moved, and where it was processed.
JSON Code Snippet: Azure Data Factory Copy Activity (Lineage)

Note: While Azure Purview is based on Apache Atlas, there are certain areas, such as
Discovery (which is responsible for search), where Azure Purview has deviated and
implemented a custom search API.
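
The endpoint areas described in this section can be sketched as a small lookup of
Atlas v2 paths. This is a rough illustration under two assumptions: the paths are
taken from the open-source Apache Atlas REST API, and the Purview catalog base URL is
assumed to have the form https://<account>.purview.azure.com/catalog, which may differ
for your account. ATLAS_PATHS and atlas_url are hypothetical names.

```python
# Assumption: Atlas v2 REST paths, one per area of the catalog discussed above.
ATLAS_PATHS = {
    "types": "/api/atlas/v2/types/typedefs",
    "entity": "/api/atlas/v2/entity/guid/{guid}",
    "glossary": "/api/atlas/v2/glossary",
    "relationship": "/api/atlas/v2/relationship/guid/{guid}",
    "lineage": "/api/atlas/v2/lineage/{guid}",
}


def atlas_url(account_name, area, **params):
    """Build a URL for one catalog area (hypothetical helper).

    The base URL format is an assumption; check your Purview account's
    properties for the actual endpoint.
    """
    base = f"https://{account_name}.purview.azure.com/catalog"
    return base + ATLAS_PATHS[area].format(**params)


if __name__ == "__main__":
    print(atlas_url("my-purview-account", "types"))
    print(atlas_url("my-purview-account", "lineage", guid="<entity-guid>"))
```

Search is deliberately absent from the lookup, since the note above points out that
Purview handles it with its own API rather than an Atlas path.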

2. Register an Application
To invoke the REST API, we must first register an application (i.e. service principal)
that will act as the identity that the Azure Purview platform recognizes and is
configured to trust.

💡 Did you know?

An Azure service principal is an identity created for use with applications, hosted
services, and automated tools to access Azure resources.

1. Sign in to the Azure portal, navigate to Azure Active Directory > App registrations,
and click New registration.
2. Provide the application a name, select an account type, and click Register.

Property                 Example Value
Name                     purview-spn
Account Type             Accounts in this organizational directory only - Single tenant
Redirect URI (optional)  Leave blank
3. Copy the following values for later use.

o Application (client) ID
o Directory (tenant) ID
3. Generate a Client Secret
1. Navigate to Certificates & secrets and click New client secret.

2. Provide a Description, set the expiration to In 1 year, and then click Add.


Property     Example Value
Description  purview-api
Expires      In 1 year

3. Copy the client secret value for later use.

💡 Did you know?

A client secret is a secret string that the application uses to prove its identity
when requesting a token. It is also referred to as an application password.
4. Provide Service Principal Access to Azure Purview
1. Navigate to Purview Studio > Data map > Collections > YOUR_ROOT_COLLECTION, and
then click Add data curators.

2. Search for the name of the Service Principal (e.g. purview-spn), select the
Service Principal from the search results, and then click OK.
5. Get an Access Token
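With the tenant ID, client ID, and client secret copied in the earlier steps, a token
can be requested using the OAuth 2.0 client credentials flow. The following is a
minimal sketch using only the Python standard library, not a definitive
implementation. The token endpoint (Azure AD v1) and the resource value
https://purview.azure.net are assumptions that may need adjusting for your
environment, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Prepare (but do not send) a client-credentials token request."""
    # Assumption: Azure AD v1 token endpoint for the given tenant.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumption: resource identifier for Azure Purview.
        "resource": "https://purview.azure.net",
    }
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def get_access_token(tenant_id, client_id, client_secret):
    """Send the request and return the bearer token from the JSON response."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


if __name__ == "__main__":
    # Replace the placeholders with the values copied in sections 2 and 3.
    token = get_access_token("YOUR_TENANT_ID", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
    print(token[:20] + "...")
```

Keep the client secret out of source control, for example by reading it from an
environment variable instead of hard-coding it.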

6. Read data from Azure Purview
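Once an access token is available, it is passed as a bearer token on each request. As
a hedged sketch, the example below reads the type definitions from the catalog and
lists the entity type names. It assumes a catalog endpoint of the form
https://<account>.purview.azure.com/catalog and the Atlas v2 typedefs path; the
function names are hypothetical, and the response handling assumes the standard Atlas
typedefs shape with an entityDefs list.

```python
import json
import urllib.request


def build_typedefs_request(account_name, access_token):
    """Prepare (but do not send) a GET request for all type definitions."""
    # Assumption: Purview catalog endpoint plus the Atlas v2 typedefs path.
    url = (
        f"https://{account_name}.purview.azure.com"
        "/catalog/api/atlas/v2/types/typedefs"
    )
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers)


def entity_type_names(typedefs):
    """Extract sorted entity type names from a typedefs response body."""
    return sorted(t["name"] for t in typedefs.get("entityDefs", []))


def read_entity_type_names(account_name, access_token):
    """Send the request and return the entity type names."""
    request = build_typedefs_request(account_name, access_token)
    with urllib.request.urlopen(request) as response:
        return entity_type_names(json.load(response))


if __name__ == "__main__":
    # Replace the placeholders with your account name and a valid token.
    for name in read_entity_type_names("YOUR_PURVIEW_ACCOUNT", "YOUR_ACCESS_TOKEN"):
        print(name)
```

If the service principal was not granted access in section 4, a request like this
would be expected to fail with an authorization error rather than return data.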

🎓 Knowledge Check
1. The Azure Purview API is largely based on which open source project?

A ) Apache Maven
B ) Apache Spark
C ) Apache Atlas

2. The Azure Purview API only works with Python.

A ) True
B ) False

3. The Azure Purview API can be used to create custom lineage between data
processes and data assets.

A ) True
B ) False

🎉 Summary
In this module, you learned how to get started with the Azure Purview REST API. To
learn more about the Azure Purview REST API, check out the Azure Purview
documentation.
