
DBT – Data Build Tool

Developer’s Guide
Table of Contents

DBT Cloud UI Access
GIT Repository Setup
  1. New Repository creation
    1.1. Service API
    1.2. GITHUB Cisco Portal
  2. SSH Key Setup
DBT Cloud UI Setup
  1. Access to DBT Projects
  2. Configure Snowflake for DBT Projects
  3. DBT Environment Variables Setup
  4. dbt_project.yml setup
  5. packages.yml setup
  6. sources.yml setup
Develop DBT Models
  1. {{config()}} Jinja function
  2. Sources
    2.1. Sources from .yml file
    2.2. ref() function
    2.3. this
  3. Multi-Column unique key in Incremental Models
  4. Usage of dummy updates in pre/post hooks
  5. Common Table Expression (CTE)
  6. Tags
Macros using jinja
Control Table Setup
  DBT Statements
Type Cast Column types
COMMIT and CHECK IN Code to GIT
DBT Docs
DBT Jobs
CONTROL M Setup
  1. Steps to Install Python3.7
  2. Steps to Activate Python3.7 in virtual environment
  3. Command to execute DBT job from Control-M
  4. Entry of DBT job information in "_JOBS" table
References

DBT Cloud UI Access


To get DBT Cloud IdP-initiated SSO access, the user must connect through VPN and request to
be added as a member of the Cisco group below.
 dbt_edw_dev_all_users

You also need to be added as a member of the project group.

To get access, please send an email requesting access, with a brief justification, to the
mailer below.
 dbt-platform-support
 Please add Alex Garbarini (algarbar@cisco.com) in CC.

GIT Repository Setup

1. New Repository creation


The GIT repository is created by the DBT Platform team when the application
is onboarded. It can be created in two ways:
- Service API call
- GITHUB Cisco portal

1.1. Service API


A new repository is created by passing the required parameters to the POST API call.
Link for Post API Call: https://dbt-services.cisco.com/docs#/
1.2. GITHUB Cisco Portal
The repository needs to be created by the owner of the application; alternatively, contact
the "dbt-platform-support" team.
GitHub creation URL - https://www-github.cisco.com/new

Note: The setup screenshots below refer to PE-EDW but are equally applicable to CDAO.

2. SSH Key Setup


The SSH key is created as part of onboarding the project. The key needs to be updated in
both the DBT and GIT environments.
To get the deploy key, open the URL
https://cloud.cisco.getdbt.com/#/accounts/1/settings/projects/ and click on the
repository.

Ex: Projects of Asset View

Ex: Below is the screenshot for the deploy key of the Asset View project, with project number 25
and repository id 34.

Ex: Deploy key setup for the Asset View project in the GIT repository.

Note: The deploy keys must match in the DBT UI and GIT. This key is used for all merging
operations.

DBT Cloud UI Setup


Developers can log in to the DBT Cloud UI through Cisco Single Sign-On (SSO).
URL for Cloud UI: https://cloud.cisco.getdbt.com/enterprise-login/cisco/

1. Access to DBT Projects


Developers need to request access to their required DBT project; they are only
allowed to operate within that project.
Please check the onboarding process for getting the required access: DBT Service APIs
Onboarding (sharepoint.com).
Once access is granted, the user can see the projects in the Cloud UI.
For Ex: Dev, Stage and Prod projects of Asset View.
2. Configure Snowflake for DBT Projects

For each project, the developer needs to configure the Snowflake database connection and
connect to the Snowflake account using SSO on the Profile tab.

Ex: Snowflake database details in dev environment

Ex: Connect to Snowflake account using Single Sign-On

3. DBT Environment Variables Setup


DBT environment variables can be configured in the Environments section. These
variables can be referenced in the actual code.
The DBT Admin team adds a generic username and a read-only GitHub PAT to each dbt
project that is onboarded. Please reach out to the DBT support team to get these
variables added to the project.

4. dbt_project.yml setup
This .yml file holds all the configurations that are required for a dbt project.
We can define environment variables in dbt_project.yml that are referenced as
global variables by any model.

For Ex: Defining a global environment variable to use the TS3 environment. This variable
can be referenced by all models under the dbt project.
Ex: Configuring models globally as materialized = table and transient = false in
dbt_project.yml. These settings become the defaults for all models under the dbt project.
These configurations can be overridden in an individual model using the config() Jinja macro.
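A minimal sketch of such a dbt_project.yml; the project name asset_view and the variable name env are illustrative assumptions, not taken from the actual project:

```yaml
name: 'asset_view'          # hypothetical project name
version: '1.0.0'
config-version: 2
profile: 'default'

# Global variable, referenced in any model as {{ var('env') }}
vars:
  env: 'TS3'

# Project-wide defaults; a model can override these via {{ config() }}
models:
  asset_view:
    +materialized: table
    +transient: false
```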

5. packages.yml setup
The DBT Platform team maintains all the source tables (SS, BR and BR_VIEWS) that
are available in Snowflake. Use the package below to download the source table
definitions. The source tables are downloaded to the dbt_packages folder.

Ex: Download EDW_SOURCES from GIT location
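A packages.yml along these lines would pull the shared sources package when dbt deps runs; the Git path shown is a placeholder, not the real EDW_SOURCES location:

```yaml
packages:
  # Placeholder path -- substitute the actual EDW_SOURCES repository location
  - git: "git@www-github.cisco.com:ORG/edw_sources.git"
    revision: master
```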


To test the uniqueness of columns in a model, we can download the dbt_utils
package via packages.yml. These tests are added in the schema.yml file.

Ex: To test uniqueness for combination of columns in schema.yml
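With dbt_utils installed, a multi-column uniqueness test in schema.yml could be sketched as follows (model and column names are hypothetical; this assumes packages.yml also lists the dbt-labs/dbt_utils package):

```yaml
version: 2

models:
  - name: order_lines              # hypothetical model
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - line_id
```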

6. sources.yml setup
Source tables required for a dbt project can be

- referenced from the GIT location maintained by the DBT support team (refer to the
packages.yml section), or
- maintained as local .yml files.

In the models, source tables are referenced using the {{ source() }} Jinja function.
Ex 1: Referencing database sources.yml files that are maintained as a local copy.

Ex 2: Work/interim tables that need to be sourced in the models but are not part of any dbt
model creation can be added to the sources.yml files.
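A sketch of a local sources.yml; the source, database and table names here are assumptions for illustration:

```yaml
version: 2

sources:
  - name: edw_br                   # hypothetical source name
    database: EDW_SERVICE_BR_DB    # hypothetical database
    schema: BR
    tables:
      - name: order_fact
      - name: customer_dim
```

A model would then select from it with, for example, select * from {{ source('edw_br', 'order_fact') }}.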

Develop DBT Models


Using the DBT IDE, users can develop, compile, execute and debug dbt models on the web.
A model is created in the models directory and is defined as a .sql file. These .sql files run
against the Snowflake database directly.
A model is a SELECT statement; when executed, it loads the data into a target table with the same name as the model.
1. {{config ()}} Jinja function
The config() function allows overriding the default properties set up in the dbt_project.yml
file.

Ex 1: Below is the config() setup to change the model's materialized property from
table to incremental, and the schema from WI to W.

Ex 2: Below is the config() setup to configure the database and schema details.

Note: Configurations set up in a model apply to that model only.
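A sketch of such a config() block, with illustrative database and source names:

```sql
{{
    config(
        materialized = 'incremental',
        database = 'EDW_SERVICE_BR_DB',   -- hypothetical database
        schema = 'W'                      -- overrides the default WI schema
    )
}}

select *
from {{ source('edw_br', 'order_fact') }}   -- hypothetical source
```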

2. Sources
In a model, we can source tables from
- Actual sources from .yml file.
- ref () function
- this

2.1. Sources from .yml file


Ex: Sourcing the tables from .yml files in the select query.

The lineage graph for above sources


2.2. ref () function
The ref() function is used to reference one model within another model.
Based on the usage of this function, dbt determines the proper order to
run all the models.
Using references between models automatically builds the dependency
graph.

Ex: Using ref () function in the model
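For instance, a downstream model can reference staging models this way (all model and column names are hypothetical):

```sql
-- fct_orders.sql: dbt infers that stg_orders and stg_customers must run first
select
    o.order_id,
    o.amount,
    c.customer_name
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c
  on o.customer_id = c.customer_id
```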

Lineage graph for above reference models

2.3. this
this is a relation representing the current model. It is the same as
ref('current_model'), but using this avoids circular dependencies in the model.

Ex 1: Usage of {{this }} in hooks.


Ex 2: Usage of {{this}} in select query of a model.

Ex 3: Using {{this}} in a merge DML operation in post hook
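A sketch combining these usages in one incremental model; the model, columns and hook logic are illustrative assumptions:

```sql
{{ config(
    materialized = 'incremental',
    post_hook = "delete from {{ this }} where order_status = 'CANCELLED'"
) }}

select order_id, order_status, updated_at
from {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- {{ this }} points at the target table itself, avoiding a circular ref()
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```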

3. Multi-Column unique key in Incremental Models


In incremental materialization, if the target table already exists, the materialization
executes a MERGE statement.
Merge using a multi-column unique key can only be implemented by selecting the
development (HEAD) DBT version in the Dev environment settings.

Note: check whether this is enabled in version 1.1.

When choosing the version, dbt Cloud offers the last three minor versions; pick the
latest, e.g. given 1.1, 1.2, 1.3 and 2.1, choose 2.1 (the latest major version).


The screen below shows the selection of the HEAD version in the Develop environment settings.

Ex: Configuring multi-column uniqueness in incremental materialization

The merge_update_columns configuration takes a list of columns that need to be
updated as part of the merge.
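Under those version constraints, a multi-column incremental config might be sketched as follows (model, source and column names are hypothetical):

```sql
{{ config(
    materialized = 'incremental',
    unique_key = ['order_id', 'line_id'],
    merge_update_columns = ['order_status', 'amount', 'updated_at']
) }}

select order_id, line_id, order_status, amount, updated_at
from {{ source('edw_br', 'order_lines') }}    -- hypothetical source
{% if is_incremental() %}
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```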

For more information on merges in incremental materialization, please check the dbt
docs link Merges | dbt Docs (getdbt.com).

4. Usage of dummy updates in pre/post hooks


Sometimes, even after using ref() and source tables properly in the models, the
lineage of a project will not include all the dependency models.

In such scenarios, a dummy update of the dependency model can be performed in the
pre/post hook of the model where the dependency needs to be enforced.

Ex: Dummy updates of a dependency model.
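One way to sketch such a dummy update: a no-op UPDATE in a pre hook whose only purpose is the ref() call, which makes dbt add the dependency edge (model and column names are hypothetical):

```sql
{{ config(
    -- The WHERE 1 = 2 clause updates nothing; the ref() alone creates the
    -- lineage edge, so dependency_model runs before this model
    pre_hook = "update {{ ref('dependency_model') }} set load_flag = load_flag where 1 = 2"
) }}

select *
from {{ source('edw_br', 'order_fact') }}   -- hypothetical source
```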


5. Common Table Expression (CTE)
A CTE is a temporary data set that can be used in the select query of a model. CTEs
help to simplify the complexity of queries.

Since a CTE is a temporary data set, we cannot use time travel on CTEs.

Ex: Defining a CTE with temporary data set and usage of that CTE in select query of a
model
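A sketch of a CTE feeding a model's final select (names are illustrative):

```sql
with recent_orders as (

    -- temporary result set, visible only within this query
    select order_id, customer_id, amount
    from {{ ref('stg_orders') }}
    where order_date >= dateadd(day, -7, current_date)

)

select customer_id, sum(amount) as weekly_amount
from recent_orders
group by customer_id
```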

6. Tags
Tags can be defined in dbt_project.yml as well as in a model's config. Tags are used to run
the models that have been tagged.
To run the dependent data-lineage models defined under one tag, use the
command dbt run --select tag:my_tag

Ex: Run models under one tag
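For example, a model can be tagged in its config (the tag name here is illustrative):

```sql
{{ config(tags = ['daily_load']) }}
```

Running dbt run --select tag:daily_load then executes every model carrying that tag.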


Macros using jinja
Macros are like functions in other programming languages. In DBT, macros are written in Jinja.
The same code snippet can be referenced across multiple models using macros.
Macros are defined as .sql files.

Below are a few examples of macros in use:


Ex 1: Macro for a group by function.

Reference of the above macro in a model; '1' represents the number of columns you
want to group by.
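A sketch of such a group-by macro; the body is an assumption based on the description above, not the project's actual macro:

```sql
-- macros/group_by.sql
{% macro group_by(n) %}
  group by
  {% for i in range(1, n + 1) %}
    {{ i }}{% if not loop.last %},{% endif %}
  {% endfor %}
{% endmacro %}
```

A model's select could then end with {{ group_by(1) }} to group by its first column.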

Ex 2: Macro edw_recomp_br_view.sql for BR_VIEW refresh procedure call

Existing Informatica code:


CALL $$EDW_ADMIN_BR_DB.$$BR.EDW_RECOMP_BR_VIEW_WITH_MODE ('$
$EDW_SERVICE_BR_DB','$$EDW_BMD','MT_TABLE','PRE');

DBT Macro:

The call procedure takes the input parameters (database name, schema, table name and
stage (pre/post)).

The above macro can be referenced in the model as a pre/post hook as shown below.
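A sketch of what such a macro and hook could look like; sourcing the admin database from an environment variable named DBT_EDW_ADMIN_BR_DB is an assumption, not the confirmed setup:

```sql
-- macros/edw_recomp_br_view.sql
{% macro edw_recomp_br_view(database, schema, table_name, stage) %}
    call {{ env_var('DBT_EDW_ADMIN_BR_DB') }}.BR.EDW_RECOMP_BR_VIEW_WITH_MODE(
        '{{ database }}', '{{ schema }}', '{{ table_name }}', '{{ stage }}')
{% endmacro %}
```

In a model it could be wired in as, e.g., pre_hook = "{{ edw_recomp_br_view('EDW_SERVICE_BR_DB', 'EDW_BMD', 'MT_TABLE', 'PRE') }}".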

Ex 3: Macro generate_schema_name.sql for Schema override.


This macro overrides the default schema name used in the Snowflake configuration
details of the DBT projects.

The default schema used is 'WI'. If you want to use a different schema, such as 'W',
within the same database, the custom schema macro must be overridden with the
code below; otherwise dbt would generate the schema name 'WI_W' (default schema
plus custom schema) instead of just 'W'.

Custom code:
Modified code:
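The standard dbt override that achieves this behavior looks like the following; it returns the custom schema name as-is instead of prefixing it with the default schema:

```sql
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}            {# no custom schema: keep the default, e.g. WI #}
    {%- else -%}
        {{ custom_schema_name | trim }} {# custom schema: use W, not WI_W #}
    {%- endif -%}
{%- endmacro %}
```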

Setting the schema name in the config of the model will then override the default
schema of the DBT project.

Control Table Setup


In DBT, there is no centralized table like JCT; each application team can
come up with their own control table.
In the existing Informatica setup, we have various methods to capture
the last extract date/ID for incremental processing; a few methods we usually follow are given
below:

 Informatica ETL code writing to the job control table columns
DW_JOB_STREAMS.LAST_EXTRACT_DATE / LAST_EXTRACT_ID.
 Informatica ETL capturing the current timestamp and writing it to the parameter files
PDOPARM/PDOPARM_INCR.parm.
DBT Statements
Call statements that read data from the control table execute the SQL query in
Snowflake and return the results to the Jinja context.

The returned result, which is a matrix, can be read and assigned to a parameter as
shown below.

The below code shows how to refer to the parameter in a model.

Update the control table for the next run in a post hook of the same model or a pre hook of
any subsequent model.
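A sketch of the flow described above: a statement block queries the control table, the resulting matrix is read into a parameter, and a post hook advances the control table (the control table, job name and columns are hypothetical):

```sql
{% call statement('get_last_extract', fetch_result=True) %}
    select last_extract_date
    from ctl_db.w.job_control            -- hypothetical control table
    where job_name = 'ASSET_VIEW_LOAD'
{% endcall %}

{# load_result returns the matrix; row 0, column 0 holds the date #}
{%- set last_extract_date = load_result('get_last_extract')['data'][0][0] -%}

{{ config(
    post_hook = "update ctl_db.w.job_control
                 set last_extract_date = current_timestamp
                 where job_name = 'ASSET_VIEW_LOAD'"
) }}

select *
from {{ source('edw_br', 'order_fact') }}   -- hypothetical source
where updated_at > '{{ last_extract_date }}'
```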

Type Cast Column types


For incremental models, dbt alters the column data types of the target tables based on the
source tables' column types. To prevent altering the table, we can type cast the column types in
the model.

Please see the example below for typecasting the column datatypes in the model select query.
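A sketch of explicit casts in a model's select, using Snowflake types (the columns and source are illustrative):

```sql
select
    cast(order_id     as number(38, 0))  as order_id,
    cast(order_status as varchar(50))    as order_status,
    cast(amount       as number(18, 2))  as amount,
    cast(updated_at   as timestamp_ntz)  as updated_at
from {{ source('edw_br', 'order_fact') }}   -- hypothetical source
```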
The DBT team provided another solution for incremental models with optional data type
changes. Here is a video that walks you through the workflow:
https://www.loom.com/share/9f71e6ae396f488fb82a094247730bec

Code: https://www-github.cisco.com/CLOUD-ARCH/platform_reference/blob/master/macros/
frozen_dtypes_incremental.sql

COMMIT and CHECK IN Code to GIT

After you finish developing the models, commit and check in the code to GIT so
that your repository has the latest code.
Developers working in parallel on the same code can encounter merge-commit conflicts.
These conflicts can be identified using the conflict markers in the code. The user needs to
manually fix the conflicts and commit the changes.

For Example:

Once the code is committed successfully, the user can raise a pull request for the changes to
branches such as stage/prod/master.
The changes are reviewed by the reviewer/owner of the code, who then merges them to
stage/prod/master respectively.
DBT Docs

To generate documentation in the IDE for the models in the project, execute the
command dbt docs generate.
It documents the project models, sources.yml, databases, model code, dependencies,
referenced-by relationships, nodes, columns and each model's compiled SQL.

Ex: View Docs is enabled after generating dbt docs. Please see the screenshot
below.

Ex: Docs generated for a model


DBT Jobs

 Select the hamburger icon in the DBT IDE and click on Jobs to create a DBT job.
 Specify a meaningful name for the job and select the environment for the project.
 Different environments can link to different code branches and Snowflake accounts.
 Add commands to execute a model; multiple commands can be added for a job.

Environment setup:

 Select the hamburger icon in the DBT IDE and click on Environments to create and configure
an environment for the project.
 Multiple environments can be created for a project; by default an environment points
to the master branch.
 To point an environment to a branch other than master, tick the "CUSTOM BRANCH"
checkbox and specify a branch created for the project.
CONTROL M Setup

1) Identify a Unix box and sudo account to build/place the DBT polling scripts in order to invoke a
DBT job from Control-M.
2) Place the DBT polling scripts on the Unix box under the sudo account.
3) Python 3.7 or a higher version is required to successfully execute the scripts. In case of a
lower version, please upgrade and activate Python 3.7 in a virtual environment.
4) requirements.txt lists all the libraries required to successfully execute the DBT polling
scripts.
5) Modify the properties.ini and snowflake.ini files as per your individual project details and
Snowflake connections respectively.
6) queries.py lists all the tables used to hold the DBT job-level details and track the DBT
execution log.
7) Please make sure to add an entry for the DBT job in the "_JOBS" table to execute the job
successfully through Control-M.
1. Steps to Install Python3.7
Log in to the Unix server and connect to sudo. Execute the commands below one by one,
in sequence, to install Python 3.7.9 on the server. Replace "YOUR_SUDO_NAME" with the
sudo account used for your project.

 mkdir -p /users/YOUR_SUDO_NAME/data/software/python3.7
 cd /users/YOUR_SUDO_NAME/data/software/python3.7
 wget https://www.python.org/ftp/python/3.7.9/Python-3.7.9.tgz
 tar -zxvf Python-3.7.9.tgz
 cd Python-3.7.9
 ./configure --prefix /users/YOUR_SUDO_NAME/data/software/python3.7 --with-ssl
 make
 make install
 /users/YOUR_SUDO_NAME/data/software/python3.7/bin/pip3 install --upgrade pip
 /users/YOUR_SUDO_NAME/data/software/python3.7/bin/pip3 install --upgrade pip-tools
 /users/YOUR_SUDO_NAME/data/software/python3.7/bin/pip3 install -r requirements.txt

2. Steps to Activate Python3.7 in virtual environment


 /users/YOUR_SUDO_NAME/data/software/python3.7/bin/python3.7 -m venv venv-dbt
 source ./venv-dbt/bin/activate
Note: venv-dbt is the name of the virtual environment for a project. Please name it
according to your project or convenience. Modify run_job_script.sh to change the
name of the virtual environment you created during setup for your project.

3. Command to execute DBT job from Control-M


Command: cd /users/YOUR_SUDO_NAME/DBT; ./run_job_script.sh -j DBT_JOB_NAME -p 30

Command using variables: cd %%AAV_DBT_WORKDIR; %%AAV_DBT_SCRIPT -j %%AAV_DBT_JOB_NAME -p 30

AAV_DBT_WORKDIR - /users/YOUR_SUDO_NAME/DBT
AAV_DBT_SCRIPT - ./run_job_script.sh
DBT_JOB_NAME – JOB_CREATED_IN_DBT

Sample Control-M Job:


4. Entry of DBT job information in “_JOBS” table
All the job-level information required for the _JOBS table entry is available in the URL
when you navigate to the specific job. The screenshot below depicts the
information for Sample_Job in DBT.
References

DBT Service APIs Onboarding: DBT Service APIs Onboarding (sharepoint.com)

Cloud Data Transformations: CDAO DBT (Data Transformation) (sharepoint.com)

Rerunning the jobs from the point of failure: Airflow and dbt Cloud FAQs | dbt Docs
(getdbt.com)

Control M DBT jobs: https://app.vidcast.io/share/82644eb1-591a-4f2b-a453-fce2f53bb593


Meeting Recording for Control-M DBT:

Meeting URL: Webex Enterprise Site - Replay Recorded Meeting


Password: iGhk868U
DBT Polling scripts to call a DBT Job in Control-M: Browse Cloud-CGW / dbt-polling-example -
Bitbucket Engineering - SJC1 (cisco.com)
