Assumption: You are familiar with developing Spark applications using an IDE.

You cannot run a single Scala or Java file as a job on a cluster the way you can run a single Python file. To run Scala or Java code, you must first build it into a JAR. You also cannot run a JAR as a job on an existing all-purpose cluster as you can with Python; to run a JAR, you must use a job cluster instead. Finally, you must deploy the JAR to your Databricks workspace before you can run that deployed JAR on a job cluster in that workspace.
https://ganeshchandrasekaran.com/databricks-dbx-cli-deploy-the-spark-jar-using-yaml-3c21d88c1115 1/7
2/14/23, 2:08 PM Databricks dbx CLI: deploy the spark JAR using YAML | by Ganesh Chandrasekaran | Medium
Step 3: Configure dbx in your existing project’s root folder.
Example:
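The exact command was shown as a screenshot in the original post; a minimal sketch of this configuration step (assuming the default dbx environment name) is:

```shell
# Run from the project's root folder. This initializes dbx for the
# project, creating .dbx/project.json with the environment mapping.
dbx configure
```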
Step 4: Create a new conf folder under the project’s root folder.
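The folder can be created from the project root like so (the deployment file name `deployment.yml` is the dbx convention; the YAML shown later in this post goes into it):

```shell
# Create the conf folder and an empty deployment file for dbx to read.
mkdir -p conf
touch conf/deployment.yml
```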
Step 5: Get the list of spark_version values available in your Databricks workspace/shard.
This will return a JSON document with all available Spark versions. If you have the ‘jq’ utility installed, you can use it to extract just the version keys.
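The commands themselves were screenshots in the original post; a sketch using the legacy Databricks CLI (assuming it is installed and authenticated against your workspace) is:

```shell
# Returns a JSON document listing every available Spark runtime version.
databricks clusters spark-versions

# With jq installed, extract just the version keys (e.g. "10.3.x-scala2.12"):
databricks clusters spark-versions | jq -r '.versions[].key'
```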
```yaml
environments:
  default:
    strict_path_adjustment_policy: true
    jobs:
      - name: "your_job_name"
        new_cluster:
          cluster_name: "" # this should be blank
          spark_version: "10.3.x-scala2.12" # replace this
          node_type_id: "Standard_DS3_v2" # replace this
          num_workers: 1
        libraries:
          - jar: "file://target/scala-2.12/your_file_name.jar"
        spark_jar_task:
          main_class_name: "com.myorg.myproject.your_object_name"
        email_notifications:
          on_start: ["user@email.com"]
          on_success: ["user@email.com"]
          on_failure: ["user@email.com"]
        # Cron syntax reference:
        # http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html
        schedule:
          quartz_cron_expression: "00 25 04 * * ?"
          timezone_id: "UTC"
          pause_status: "PAUSED" # or "SCHEDULED"
        tags:
          createdby: "optional value"
          createdon: "optional value"
        permissions:
          access_control_list:
            - user_name: "owneremail@email.com"
              permission_level: "IS_OWNER"
            - group_name: "optional_your_group_name"
              permission_level: "CAN_VIEW"
            - user_name: "optional_another_user@email.com"
              permission_level: "CAN_MANAGE"
            - user_name: "optional_third_user@email.com"
              permission_level: "CAN_MANAGE_RUN"
```
Note 2: The schedule uses the Quartz cron format, which differs slightly from the traditional cron format.
Note 3: The JAR file location is relative to the project’s root folder.
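For an sbt-based Scala 2.12 project (an assumption; your build tool may differ), the JAR referenced by that relative path is produced like so:

```shell
# Build the artifact; for Scala 2.12 projects the JAR lands under
# target/scala-2.12/<project-name>_2.12-<version>.jar,
# which matches the "file://target/scala-2.12/..." path in the YAML.
sbt package
```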
The --no-rebuild flag is important: it tells dbx deploy to skip rebuilding the artifact and use the JAR you already built.
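The deploy and launch commands were shown as screenshots in the original post; a minimal sketch, assuming the job name from the YAML above, is:

```shell
# Deploy the job definition and the already-built JAR, skipping the build step.
dbx deploy --no-rebuild

# Launch the deployed job on a job cluster.
dbx launch --job your_job_name
```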