Assumption: You are familiar with developing Spark applications using an IDE.

You cannot run a single Scala or Java file as a job on a cluster the way you can run a single Python file. To run Scala or Java code, you must first build it into a JAR. You also cannot run a JAR as a job on an existing all-purpose cluster as you can with Python; to run a JAR, you must use a job cluster instead. Finally, you must deploy the JAR to your Databricks workspace before you can run that deployed JAR on a job cluster in that workspace.
https://ganeshchandrasekaran.com/databricks-dbx-cli-deploy-the-spark-jar-using-yaml-3c21d88c1115 1/7
2/14/23, 2:08 PM Databricks dbx CLI: deploy the spark JAR using YAML | by Ganesh Chandrasekaran | Medium
Step 3: Configure dbx in your existing project’s root folder.
Example:
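The exact command was shown as a screenshot in the original post; a minimal sketch of this configuration step (assuming the default dbx environment name) is:

```shell
# Run from the project's root folder. This initializes dbx for the
# project, creating .dbx/project.json with the environment mapping.
dbx configure
```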
Step 4: Create a new conf folder under the project’s root folder.
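The folder can be created from the project root like so (the deployment file name `deployment.yml` is the dbx convention; the YAML shown later in this post goes into it):

```shell
# Create the conf folder and an empty deployment file for dbx to read.
mkdir -p conf
touch conf/deployment.yml
```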
Step 5: Get the list of spark_version values available in your Databricks workspace/shard.
This will return a JSON document with all available Spark versions. If you have the ‘jq’ utility installed, you can use it to extract just the version keys.
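The commands themselves were screenshots in the original post; a sketch using the legacy Databricks CLI (assuming it is installed and authenticated against your workspace) is:

```shell
# Returns a JSON document listing every available Spark runtime version.
databricks clusters spark-versions

# With jq installed, extract just the version keys (e.g. "10.3.x-scala2.12"):
databricks clusters spark-versions | jq -r '.versions[].key'
```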
```yaml
environments:
  default:
    strict_path_adjustment_policy: true
    jobs:
      - name: "your_job_name"
        new_cluster:
          cluster_name: "" # this should be blank
          spark_version: "10.3.x-scala2.12" # replace this
          node_type_id: "Standard_DS3_v2" # replace this
          num_workers: 1
        libraries:
          - jar: "file://target/scala-2.12/your_file_name.jar"
        spark_jar_task:
          main_class_name: "com.myorg.myproject.your_object_name"
        email_notifications:
          on_start: ["user@email.com"]
          on_success: ["user@email.com"]
          on_failure: ["user@email.com"]
        # Cron syntax reference:
        # http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html
        schedule:
          quartz_cron_expression: "00 25 04 * * ?"
          timezone_id: "UTC"
          pause_status: "PAUSED" # or "SCHEDULED"
        tags:
          createdby: "optional value"
          createdon: "optional value"
        permissions:
          access_control_list:
            - user_name: "owneremail@email.com"
              permission_level: "IS_OWNER"
            - group_name: "optional_your_group_name"
              permission_level: "CAN_VIEW"
            - user_name: "optional_another_user@email.com"
              permission_level: "CAN_MANAGE"
            - user_name: "optional_third_user@email.com"
              permission_level: "CAN_MANAGE_RUN"
```
Note 2: The schedule uses the Quartz cron format, which differs slightly from the traditional cron format.
Note 3: The JAR file location is relative to the project’s root folder.
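For an sbt-based Scala 2.12 project (an assumption; your build tool may differ), the JAR referenced by that relative path is produced like so:

```shell
# Build the artifact; for Scala 2.12 projects the JAR lands under
# target/scala-2.12/<project-name>_2.12-<version>.jar,
# which matches the "file://target/scala-2.12/..." path in the YAML.
sbt package
```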
The --no-rebuild flag is important: it tells dbx deploy to skip rebuilding the artifact and use the JAR you already built.
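The deploy and launch commands were shown as screenshots in the original post; a minimal sketch, assuming the job name from the YAML above, is:

```shell
# Deploy the job definition and the already-built JAR, skipping the build step.
dbx deploy --no-rebuild

# Launch the deployed job on a job cluster.
dbx launch --job your_job_name
```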