
Monitoring Databricks jobs through calls to the REST API

Georgia Deaconu · Towards Data Science · Oct 10, 2022 · 4 min read

Monitoring jobs that run in a Databricks production environment requires not only setting up alerts in case of failure, but also being able to easily extract statistics about job run times, failure rates, most frequent failure causes, and other user-defined KPIs.
The Databricks workspace UI provides a fairly easy and intuitive way of visualizing the run history of individual jobs. The matrix view, for instance, allows for a quick overview of recent failures and shows a rough comparison of run times between the different runs.



The job runs, matrix view (Image by the Author)

What about computing statistics about failure rates or comparing average run times
between different jobs? This is where things become less straightforward.

The job runs tab in the Workflows panel shows the list of all the jobs that have run
in the last 60 days in your Databricks workspace. But this list cannot be exported
directly from the UI, at least at the time of writing.

The Job runs tab in the Workflows panel shows the jobs that ran in the last 60 days in your workspace (Image by the Author)

Luckily, the same information (and some extra details) can be extracted through
calls to the Databricks jobs list API. The data is retrieved in JSON format and can
easily be transformed into a DataFrame, from which statistics and comparisons can
be derived.

In this post, I will show how to connect to the Databricks REST API from a Jupyter
Notebook running in your Databricks workspace, extract the desired information,
and perform some basic monitoring and analysis.

1. Generate a Databricks Personal Access Token



To connect to the Databricks API, you will first need to authenticate, in the same way you are asked to when connecting through the UI. In my case, I will use a Databricks personal access token generated through a call to the Databricks Token API, in order to avoid storing connection information in my notebook.

First, we need to configure the call to the Token API, by providing the request URL,
the request body, and its headers. In the example below, I am using Databricks
secrets to extract the Tenant ID and build the API URL for a Databricks workspace
hosted by Microsoft Azure. The resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
represents the Azure programmatic ID for Databricks, while the Application ID and
Password are extracted again from the Databricks secrets.

TOKEN_API_REQUEST_BODY = {
    'grant_type': 'client_credentials',
    # Azure programmatic ID for Databricks
    'resource': '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d',
    'client_id': dbutils.secrets.get("ds-keyvault-secrets", "ApplicationID"),
    'client_secret': dbutils.secrets.get("ds-keyvault-secrets", "ApplicationPassword")
}

# The tenant ID is read from Databricks secrets and used to build the standard
# Azure AD token endpoint URL; the secret name "TenantID" is an assumption, as
# the original line was truncated in the source
TOKEN_API_REQUEST_URL = ('https://login.microsoftonline.com/'
                         + dbutils.secrets.get("ds-keyvault-secrets", "TenantID")
                         + '/oauth2/token')

TOKEN_API_REQUEST_HEADERS = {'Content-Type': 'application/x-www-form-urlencoded'}


It is good practice to use Databricks secrets to store this type of sensitive information and avoid entering credentials directly into a notebook. Otherwise, all the calls to dbutils.secrets can be replaced with the explicit values in the code above.

After this setup, we can simply call the Token API using Python’s requests library
and generate the token.

import requests

# The Azure AD token endpoint expects a form-encoded POST request with the
# body and headers configured above
response = requests.post(TOKEN_API_REQUEST_URL,
                         headers=TOKEN_API_REQUEST_HEADERS,
                         data=TOKEN_API_REQUEST_BODY)

if response.status_code == 200:
    DBRKS_BEARER_TOKEN = response.json()['access_token']



2. Call the Databricks jobs API


Now that we have our personal access token, we can configure the call to the
Databricks jobs API. We need to provide the URL for the Databricks instance, the
targeted endpoint (in this case jobs/runs/list, to extract the list of job runs), and the API
version (2.1 is currently the most recent). We use the previously generated token as
the bearer token in the header for the API call.

dbk_host = 'https://adb-xxxxxxxxxxxxxxxxxxxxxxxxx.azuredatabricks.net'

api_version = '/api/2.1'
api_command = '/jobs/runs/list'

url = f"{dbk_host}{api_version}{api_command}"

# Use the previously generated token as the bearer token
DBRKS_REQ_HEADERS = {'Authorization': 'Bearer ' + DBRKS_BEARER_TOKEN}

response = requests.get(url=url, headers=DBRKS_REQ_HEADERS, params={'limit': 25, 'offset': 0})


By default, the returned response is limited to a maximum of 25 runs, starting from the provided offset. I created a loop to extract the full list based on the has_more attribute of the returned response.
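
A minimal sketch of such a loop, reusing the url and DBRKS_REQ_HEADERS defined above, could look like this:

all_runs = []
offset = 0

while True:
    # Request one page of runs, starting from the current offset
    response = requests.get(url=url, headers=DBRKS_REQ_HEADERS,
                            params={'limit': 25, 'offset': offset})
    response.raise_for_status()
    payload = response.json()
    all_runs.extend(payload.get('runs', []))

    # has_more indicates whether another page of results is available
    if not payload.get('has_more', False):
        break
    offset += 25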

3. Extract and analyze the data


The list of job runs is returned by the API call as a list of JSON objects, and I used Pandas json_normalize to convert this list into a DataFrame. This operation converts the data to the following format:


Job run information retrieved through the API call (Image by the Author)
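
As a rough illustration of this step, assuming the runs collected by the loop above are stored in all_runs:

import pandas as pd

# Flatten the nested JSON run objects into a tabular DataFrame;
# nested fields become dot-separated columns such as state.result_state
runs_df = pd.json_normalize(all_runs)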

To include task and cluster details in the response you can set the expand_tasks
parameter to True in the request params as stated in the API documentation.
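
For example, the request above could be adjusted like this:

# expand_tasks adds task- and cluster-level details to each returned run
response = requests.get(url=url, headers=DBRKS_REQ_HEADERS,
                        params={'limit': 25, 'offset': 0, 'expand_tasks': 'true'})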

Starting from this information, we can perform some monitoring and analysis. For instance, I used the state.result_state field to compute the percentage of failed runs in the last 60 days:

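One possible version of this computation, as a sketch on the runs_df DataFrame built earlier:

# result_state is FAILED for runs that ended in failure
failed_runs_pct = 100 * (runs_df['state.result_state'] == 'FAILED').mean()
print(f"Percentage of failed runs: {failed_runs_pct:.1f}%")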

Many useful statistics can be easily extracted, such as the number of failed jobs each day across all scheduled Databricks jobs. We can also get a quick overview of the error messages logged by the clusters for the failed jobs by looking at the state.state_message column.
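
For instance, a quick sketch to surface the most frequent messages:

# Most frequent error messages among failed runs
failed = runs_df[runs_df['state.result_state'] == 'FAILED']
print(failed['state.state_message'].value_counts().head(10))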

Because we have access to each run's start and end time, we can compute the job run time, easily visualize trends, and detect potential problems early on.

Job run time as a function of run date (Image by the Author)
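
A minimal sketch of this computation, assuming start_time and end_time are the epoch-millisecond columns returned by the API:

# Convert epoch milliseconds to timestamps and compute run time in minutes
runs_df['run_date'] = pd.to_datetime(runs_df['start_time'], unit='ms')
runs_df['run_time_min'] = (runs_df['end_time'] - runs_df['start_time']) / 60_000

# Plot run time against run date to spot trends for a given job
runs_df.plot(x='run_date', y='run_time_min', style='.')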

Once we have access to the data in this easy-to-exploit format, the monitoring KPIs we choose to compute can depend on the type of application. The code computing these KPIs can be stored in a notebook that is scheduled to run regularly and send out monitoring reports.

Conclusion
This post presents some examples of Databricks jobs monitoring that can be
implemented based on information extracted through the Databricks REST API.
This method can provide an overall view of all the jobs that are active in your
Databricks workspace in a format that can easily be used to perform investigations
or analysis.

