
2/14/23, 2:08 PM Databricks dbx CLI: deploy the spark JAR using YAML | by Ganesh Chandrasekaran | Medium


Ganesh Chandrasekaran · Apr 27, 2022 · 3 min read

Databricks dbx CLI: deploy the Spark JAR using YAML

dbx simplifies the job launch and deployment process across multiple
environments. It also lets you use your favorite IDE to develop your application.

Assumption: You are familiar with developing Spark applications using IDE.

You cannot run a single Scala or Java file as a job on a cluster as you can with a
single Python file. To run Scala or Java code, you must first build it into a JAR.

You cannot run a JAR as a job on an existing all-purpose cluster as you can with
Python. To run a JAR, you must use a job cluster instead.

You must first deploy the JAR to your Databricks workspace before you can run
that deployed JAR on a job cluster in that workspace.

Step 1: Install the Databricks CLI

Step 2: Install the Databricks dbx CLI
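Both tools are published on PyPI, so a typical install is a pair of pip commands. A minimal sketch, assuming Python 3 and pip are already on your PATH:

```shell
# Install the classic Databricks CLI (Step 1) and the dbx extension (Step 2).
pip install databricks-cli   # provides the `databricks` command
pip install dbx              # provides the `dbx` command

# Quick sanity check that both entry points resolve
databricks --version
dbx --version
```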

https://ganeshchandrasekaran.com/databricks-dbx-cli-deploy-the-spark-jar-using-yaml-3c21d88c1115

Step 3: dbx can be configured onto your existing project’s root folder.

Note 1: Databricks profile names are case-sensitive.

Example:

/read_events> dbx configure --profile=<databricks profile name>
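dbx looks the named profile up in `~/.databrickscfg`, the file that `databricks configure` writes. The sketch below uses a hypothetical `DEV` profile with placeholder host/token values to show the file format and why the case-sensitivity note matters:

```shell
# Sketch: a throwaway .databrickscfg in the format dbx expects.
# Profile names are the [bracketed] section headers and match case-sensitively.
cfg="$(mktemp -d)/.databrickscfg"
cat > "$cfg" <<'EOF'
[DEV]
host = https://adb-1234567890123456.7.azuredatabricks.net
token = dapiXXXXXXXXXXXXXXXX
EOF

# "[DEV]" exists, so `dbx configure --profile=DEV` would find it...
grep -qx '\[DEV\]' "$cfg" && echo "profile DEV found"
# ...while a lowercase lookup fails, illustrating Note 1
grep -qx '\[dev\]' "$cfg" || echo "profile dev NOT found"
```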

Step 4: Create a new conf folder under the project’s root folder.

/read_events> mkdir conf

[Screenshot: the conf folder — deployment.yaml file location]

Step 5: Get the list of available spark_version values for your Databricks workspace/shard


> databricks clusters spark-versions

This returns a JSON document listing all supported Spark versions. If you have
the jq utility installed, use the following command to extract just the keys:

> databricks clusters spark-versions | jq '.versions[] | .key'

Step 6: Get the list of available node_type_ids

> databricks clusters list-node-types

or

> databricks clusters list-node-types | jq '.node_types[] | .node_type_id'
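If jq is not installed, the same fields can be pulled out with POSIX tools. The here-document below only mimics the shape of the CLI's JSON output, with illustrative version keys; the identical grep/sed pair works for `"node_type_id"` in the list-node-types output:

```shell
# Extract the `key` of each entry without jq.
# The here-doc stands in for `databricks clusters spark-versions` output.
cat <<'EOF' | grep -o '"key": *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/'
{"versions": [
  {"key": "10.3.x-scala2.12", "name": "10.3"},
  {"key": "9.1.x-scala2.12", "name": "9.1 LTS"}]}
EOF
# prints:
# 10.3.x-scala2.12
# 9.1.x-scala2.12
```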

Step 7: Create a new deployment.yaml file under the conf folder

Sample contents to be copied into the deployment.yaml file.

Replace the values of spark_version and node_type_id with your choices.

environments:
  default:
    strict_path_adjustment_policy: true
    jobs:
      - name: "your_job_name"
        new_cluster:
          cluster_name: "" # this should be blank
          spark_version: "10.3.x-scala2.12" # replace this
          node_type_id: "Standard_DS3_v2" # replace this
          num_workers: 1
        libraries:
          - jar: "file://target/scala-2.12/your_file_name.jar"
        spark_jar_task:
          main_class_name: "com.myorg.myproject.your_object_name"
        email_notifications:
          on_start: ["user@email.com"]
          on_success: ["user@email.com"]
          on_failure: ["user@email.com"]
        # http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html
        schedule:
          quartz_cron_expression: "00 25 04 * * ?"
          timezone_id: "UTC"
          pause_status: "PAUSED" #SCHEDULED
        tags:
          createdby: "optional value"
          createdon: "optional value"
        permissions:
          access_control_list:
            - user_name: "owneremail@email.com"
              permission_level: "IS_OWNER"
            - group_name: "optional_your_group_name"
              permission_level: "CAN_VIEW"
            - user_name: "optional_another_user@email.com"
              permission_level: "CAN_MANAGE"
            - user_name: "optional_third_user@email.com"
              permission_level: "CAN_MANAGE_RUN"

Note 2: This uses the Quartz cron format, which is slightly different from the
traditional cron format.
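Quartz expressions carry six (optionally seven) fields and begin with a seconds field, whereas traditional cron has five fields beginning with minutes. Splitting the expression from the sample config makes the layout visible:

```shell
# Quartz field order: seconds minutes hours day-of-month month day-of-week [year]
set -f                  # keep '*' and '?' literal during word splitting
expr="00 25 04 * * ?"   # the quartz_cron_expression from the sample config
set -- $expr
echo "runs daily at $3:$2:$1 (i.e. 04:25:00 in the configured timezone)"
# the equivalent traditional 5-field cron would be: 25 4 * * *
```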

Note 3: The JAR file location is relative to the project’s root folder.
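Assuming an sbt-style layout (where `sbt package` drops the JAR under `target/scala-2.12/`), a quick sketch of how the `file://` path in deployment.yaml resolves — and why dbx should be run from the project root; the jar name is the same placeholder as in the sample config:

```shell
# Sketch: recreate the expected layout in a temp dir standing in for the project root.
root="$(mktemp -d)"
mkdir -p "$root/target/scala-2.12"
touch "$root/target/scala-2.12/your_file_name.jar"

# "file://target/scala-2.12/your_file_name.jar" is resolved against the
# current directory, so run dbx from the project root:
cd "$root"
test -f "target/scala-2.12/your_file_name.jar" && echo "jar found relative to root"
```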


[Screenshot: JAR file location]

Step 8: Run dbx deploy to create a job in your workspace.


dbx deploy --environment=default --no-rebuild

The --no-rebuild flag is important: it tells dbx to skip rebuilding the project
and deploy the JAR you have already built.

Step 9: [Optional] Trigger the job manually via CLI.

dbx launch --environment=default --job=your_job_name

Read the dbx documentation for more details.
