
Databricks Certified Data Engineer Associate Exam Resource

QUICK REVISION NOTES
By: Certification Champs

Visit here for full Udemy Course


AGENDA

Introduction
Delta Lake
Databricks Architecture
Magic commands
Clusters
Types of Clusters
Views in Databricks
Managed vs Unmanaged Tables
Medallion Architecture

Introduction
These notes are part of our Udemy course. Visit here to access the course.
We suggest reading these notes twice, 1-2 hours before the exam.
If you have any doubts, you may post your queries in the Q&A section of
the course.
Features of Delta Lake

ACID compliant
Unifies batch and streaming data
Open Source
Stores data in Parquet format
Time travel

Databricks Architecture
Databricks Architecture consists of two parts:

CONTROL PLANE: Controlled by Databricks
DATA PLANE: Controlled by Customer

DATABRICKS CLOUD ACCOUNT: Web Application, Repositories and Notebooks, Job Scheduling, Cluster Management
CUSTOMER CLOUD ACCOUNT: Data

ONLY DATA IS STORED IN THE CUSTOMER CLOUD ACCOUNT AND IS CONTROLLED BY THE
CUSTOMER, WHILE THE REST, SUCH AS NOTEBOOKS, JOB SCHEDULING AND CLUSTER
MANAGEMENT, IS STORED IN THE DATABRICKS CLOUD ACCOUNT AND IS CONTROLLED BY
DATABRICKS

Magic commands in Notebooks!
Language-specific magic commands supported in Databricks:

%python %scala %r %sql

Changing the default language of a notebook adds
%{previous_default_language} at the start of each existing cell of the notebook.
For example, changing the default language of a notebook from Python to Scala adds
%python at the start of each existing cell of the notebook.

Other magic commands include %md, used for Markdown, and
%run, used to run a notebook from another notebook.
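
The effect of changing the default language can be sketched in plain Python. This is only a toy model of the behavior described above, not Databricks code: the function name change_default_language and the idea of a notebook as a list of cell strings are assumptions made for illustration. Leaving cells that already start with a magic command untouched is also an assumption of this sketch.

```python
def change_default_language(cells, previous_default):
    """Return the cells as they would read after the default language changes.

    Cells without a leading magic command were written in the previous
    default language, so they get a %<previous_default> prefix.
    """
    updated = []
    for cell in cells:
        if cell.lstrip().startswith("%"):
            # Already carries a magic command (assumed to be left as-is).
            updated.append(cell)
        else:
            updated.append(f"%{previous_default}\n{cell}")
    return updated


# Hypothetical notebook whose default language was Python.
notebook = ["print('hello')", "%sql\nSELECT 1", "x = 41 + 1"]
converted = change_default_language(notebook, "python")

for cell in converted:
    print(cell)
    print("---")
```

After the switch, the two Python cells start with %python while the %sql cell is unchanged, so every cell still runs in the language it was written in.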
Clusters in Databricks

CLUSTERS CAN BE CREATED BY SELECTING COMPUTE FROM THE LEFT SIDE NAVIGATION BAR

THE COMPUTE PAGE CAN ALSO BE USED TO UPDATE, START, STOP OR TERMINATE A CLUSTER

Types of Clusters
All-purpose Cluster
An all-purpose cluster can be created using the UI, CLI or REST API
It can be restarted or terminated as needed
It can be shared between multiple users
Used for running interactive jobs

Job Cluster
A job cluster cannot be created manually. It is automatically created when you submit a job
It is terminated automatically when the job ends and CANNOT be restarted
It CANNOT be shared between users
Used for running automated jobs
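
To make the job cluster idea concrete, the sketch below builds a job definition in which the cluster is declared inside the job itself instead of being created beforehand. The field names (tasks, task_key, notebook_task, new_cluster, existing_cluster_id, spark_version, node_type_id, num_workers) are believed to match the Databricks Jobs API, but verify them against the current API reference. The job name, notebook path, runtime version and node type are placeholders, the helper uses_job_cluster is hypothetical, and nothing is sent to any workspace.

```python
import json

# Hypothetical job definition; all values are placeholders.
job_definition = {
    "name": "nightly-sales-load",
    "tasks": [
        {
            "task_key": "load_sales",
            "notebook_task": {"notebook_path": "/Repos/demo/load_sales"},
            # new_cluster describes a job cluster: it is created for the run
            # and terminated when the run ends.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
}


def uses_job_cluster(task):
    """True if the task declares its own cluster instead of reusing one."""
    return "new_cluster" in task and "existing_cluster_id" not in task


payload = json.dumps(job_definition, indent=2)
print(payload)
print(uses_job_cluster(job_definition["tasks"][0]))
```

A task that should run on an all-purpose cluster would carry an existing_cluster_id instead of a new_cluster block.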



Noteworthy things about Views!

CREATE VIEW view_name
Persisted through multiple sessions

CREATE TEMPORARY VIEW view_name
Tied to a session, CANNOT be accessed when the session ends

CREATE GLOBAL TEMPORARY VIEW view_name
Stored in a temporary database named global_temp
Accessed by using global_temp.view_name

VIEWS DON'T HAVE PHYSICAL EXISTENCE
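
The difference between a persisted view and a session-scoped temporary view can be tried out locally. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: SQLite is not Databricks, a SQLite connection only plays the role of a session here, and global temporary views have no SQLite equivalent, so they are not shown. Table and view names are made up for illustration.

```python
import os
import sqlite3
import tempfile

# SQLite stands in for a Spark session here (assumption, for illustration only).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session_1 = sqlite3.connect(db_path)
session_1.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
session_1.execute("INSERT INTO sales VALUES ('EU', 10), ('US', 20)")
# Regular view: only the definition is stored; it holds no data of its own.
session_1.execute("CREATE VIEW v_sales AS SELECT * FROM sales")
# Temporary view: visible only to this connection ("session").
session_1.execute(
    "CREATE TEMP VIEW tmp_sales AS SELECT * FROM sales WHERE amount > 15"
)
session_1.commit()
temp_rows = session_1.execute("SELECT * FROM tmp_sales").fetchall()
session_1.close()

# A second "session" still sees the regular view but not the temporary one.
session_2 = sqlite3.connect(db_path)
persisted_rows = session_2.execute("SELECT * FROM v_sales").fetchall()
try:
    session_2.execute("SELECT * FROM tmp_sales")
    temp_view_survived = True
except sqlite3.OperationalError:
    temp_view_survived = False
session_2.close()

print(temp_rows)
print(persisted_rows)
print(temp_view_survived)
```

The regular view answers queries in the second session, while the temporary view is gone once the first session closes.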


Managed vs Unmanaged tables
Managed Table
Both data and metadata are controlled by Databricks
If you drop a managed table, the data is also deleted
By default, the data for a managed table is stored at dbfs:/user/hive/warehouse/{db_name}.db/
CREATE TABLE table_name (col1 INT, col2 STRING)

External/Unmanaged Table
Only metadata is controlled by Databricks. Data is controlled by the user
If you drop an unmanaged (external) table, the data remains intact
While creating an external table, the LOCATION keyword must be used
CREATE TABLE table_name (col1 INT, col2 STRING) LOCATION '/path/'
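
The drop behavior of the two table types can be modeled with a few lines of Python. This is a toy model, not Databricks code: the catalog dictionary stands in for the metastore, temporary directories stand in for the warehouse and for user-controlled storage, and the helpers create_table and drop_table are hypothetical.

```python
import os
import shutil
import tempfile

catalog = {}  # table name -> metadata; stands in for the metastore

warehouse = tempfile.mkdtemp()      # stands in for the managed warehouse path
external_root = tempfile.mkdtemp()  # stands in for user-controlled storage


def create_table(name, location=None):
    """Register a table; without a location it is treated as managed."""
    managed = location is None
    path = location if location is not None else os.path.join(warehouse, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "part-0000.parquet"), "w") as f:
        f.write("placeholder data file")
    catalog[name] = {"location": path, "managed": managed}


def drop_table(name):
    """Remove the metadata; remove the data only for managed tables."""
    entry = catalog.pop(name)
    if entry["managed"]:
        shutil.rmtree(entry["location"])
    return entry["location"]


create_table("managed_sales")
create_table("external_sales", location=os.path.join(external_root, "sales"))

managed_path = drop_table("managed_sales")
external_path = drop_table("external_sales")

print(os.path.exists(managed_path))   # False: data deleted with the table
print(os.path.exists(external_path))  # True: data remains intact
```

In both cases the metadata entry disappears; only the managed table loses its data files.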
Medallion Architecture
ALSO KNOWN AS MULTI-HOP ARCHITECTURE

Streaming Data, Batch Data -> BRONZE -> (JOINS) -> SILVER -> (AGGREGATIONS) -> GOLD -> REPORTING, DASHBOARDING

BRONZE: Raw data from different sources
SILVER: Enriched data with timestamp columns converted to human readable form
GOLD: Aggregated data to be used for Reporting and Dashboarding
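
The three hops can be sketched with plain Python data structures. This is a minimal illustration under stated assumptions, not a Databricks pipeline: the sample records, field names and the helpers to_silver and to_gold are made up, and in practice each layer would be a Delta table rather than a list.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Bronze: raw records as they arrive (epoch-second timestamps, no enrichment).
bronze_orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 30.0, "ts": 0},
    {"order_id": 2, "customer_id": "c2", "amount": 20.0, "ts": 86400},
    {"order_id": 3, "customer_id": "c1", "amount": 50.0, "ts": 86400},
]
bronze_customers = [
    {"customer_id": "c1", "country": "DE"},
    {"customer_id": "c2", "country": "US"},
]


def to_silver(orders, customers):
    """Join orders with customers and make timestamps human readable."""
    country_by_customer = {c["customer_id"]: c["country"] for c in customers}
    silver = []
    for o in orders:
        order_time = datetime.fromtimestamp(o["ts"], tz=timezone.utc)
        silver.append({
            "order_id": o["order_id"],
            "country": country_by_customer[o["customer_id"]],
            "amount": o["amount"],
            "order_date": order_time.strftime("%Y-%m-%d"),
        })
    return silver


def to_gold(silver):
    """Aggregate revenue per country for reporting and dashboarding."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver_orders = to_silver(bronze_orders, bronze_customers)
gold_revenue_by_country = to_gold(silver_orders)

print(silver_orders[0])
print(gold_revenue_by_country)
```

The join and the timestamp conversion happen on the way to Silver, and the aggregation happens on the way to Gold, matching the flow shown above.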
THANK YOU!
