Exam DP-203: Data Engineering on Microsoft Azure Crash Course

Tim Warner

• Based in Nashville, TN, US


• Central time zone
• MCT, former MVP
• Twitter: @TechTrainerTim
• TechTrainerTim.com
Agenda
• Introduction
• Design and implement data storage (15-20%)
• Develop data processing (40-45%)
• Secure, monitor, and optimize data storage and data processing (30-35%)
Course materials

timw.info/dp203
Course Expectations
• We'll learn by doing – at least 80 percent demo
• Case study approach
• Please review the recordings…several times!
• I’m here to answer your questions – take advantage of this
• Use the Q&A panel
Break Schedule (Central Time Zone)
• 07:00am - Start
• 08:00am - 7-minute break
• 09:00am - 7-minute break
• 10:00am - 7-minute break
• 11:00am - 7-minute break
• 12:00pm - Finish
Session Recordings
Mobile Browser: learning.oreilly.com
O'Reilly Mobile App
Exam DP-203 Latest Objective Domain Changes
What is an Azure Data Engineer?
• Design and implement the management, monitoring, security, and privacy of data using the full stack of Azure data services
• “Builds and tunes data pipelines”
• “Implements, monitors, and optimizes data platforms”
• “Has solid knowledge of SQL, Python, or Scala”
• The Azure Data Scientist consumes the data the Data Engineer provides
Azure Data Engineer Associate
DP-203: Data Engineering on Microsoft Azure

1-year validity period


Azure Data Fundamentals

DP-900
Azure Data Scientist Associate

DP-100
Power BI Data Analyst Associate
PL-300 (replaced DA-100)
Azure Cosmos DB Developer

DP-420
Tim's Certification Study Model
Thank you!

• Course materials: timw.info/dp203


• Twitter: @TechTrainerTim
• Pluralsight: timw.info/ps
• Web: timw.info
Data Fundamentals
Data Types

• Structured (e.g., a table)
• Semi-structured
• Unstructured
Data Workload Types
Online Transactional Processing (OLTP)
• Data is stored one transaction at a time
• Example tables:
  • Customer (CustomerID, CustomerName, CustomerPhone)
  • Orders (OrderID, CustomerID, OrderDate)

Online Analytical Processing (OLAP)
• Data is periodically loaded, aggregated, and stored in a cube
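A minimal sketch of the two workloads, using Python's built-in `sqlite3` module as a stand-in for both systems. This is an assumption: in Azure the transactional side might be Azure SQL Database and the analytical side a Synapse pool or cube. Orders are committed one small transaction at a time, and a periodic load then rolls them up. The sample rows, dates, and the `DailyOrderCounts` table are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database standing in for both workloads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)")

# OLTP: each incoming order is committed in its own small transaction.
incoming = [("AD100", 101, "2024-01-01"), ("AD101", 101, "2024-01-01"), ("AX103", 103, "2024-01-02")]
for order in incoming:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", order)

# OLAP: a periodic load aggregates the transactional rows for analysis.
conn.execute("CREATE TABLE DailyOrderCounts (OrderDate TEXT, OrderCount INTEGER)")
with conn:
    conn.execute(
        "INSERT INTO DailyOrderCounts "
        "SELECT OrderDate, COUNT(*) FROM Orders GROUP BY OrderDate"
    )

daily = conn.execute("SELECT * FROM DailyOrderCounts ORDER BY OrderDate").fetchall()
print(daily)  # [('2024-01-01', 2), ('2024-01-02', 1)]
```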
Data Processing Types
• Batch
• Streaming
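The practical difference is when results appear. Batch processing waits for a complete data set and processes it once, while streaming updates its answer as each event arrives. A plain-Python sketch with hypothetical sensor readings:

```python
from typing import Iterable, Iterator


def batch_total(readings: list[float]) -> float:
    """Batch: wait until the full data set is available, then process it once."""
    return sum(readings)


def streaming_totals(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated result as each reading arrives."""
    running = 0.0
    for value in readings:
        running += value
        yield running


readings = [2.0, 3.5, 1.5]  # hypothetical values
print(batch_total(readings))             # 7.0
print(list(streaming_totals(readings)))  # [2.0, 5.5, 7.0]
```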
Data Processing
• Raw data
• Data processing: Functions, Cognitive Services, Databricks, other tools
• Cleaned and transformed data
ETL
• Extract
• Transform: basic filtering and transformations; discard sensitive data
• Load
• Tools: Azure Data Factory, Azure Stream Analytics
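In ETL, the order of steps means sensitive or unwanted data is removed before anything reaches the destination. The sketch below shows that in plain Python. The field names, the choice of `phone` as the sensitive column, and the list standing in for the target store are all hypothetical.

```python
def extract() -> list[dict]:
    # Stand-in for reading from a source system (hypothetical rows).
    return [
        {"id": 1, "name": "Muisto Linna", "phone": "555-0100", "active": True},
        {"id": 2, "name": "Noam Maoz", "phone": "555-0101", "active": False},
    ]


def transform(rows: list[dict]) -> list[dict]:
    # Basic filtering, then discard the sensitive column before loading.
    return [
        {k: v for k, v in row.items() if k != "phone"}
        for row in rows
        if row["active"]
    ]


def load(rows: list[dict], target: list[dict]) -> None:
    # Stand-in for writing to the destination store.
    target.extend(rows)


warehouse: list[dict] = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'id': 1, 'name': 'Muisto Linna', 'active': True}]
```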


ELT
• Extract
• Load
• Transform: complex processing
• Tools: Azure Data Factory, Azure Synapse
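In ELT, raw data lands in the destination first, and the heavy transformation runs inside the destination's own engine. This sketch uses an in-memory SQLite database as a hypothetical stand-in for Azure Synapse. Rows are loaded untouched into a staging table and then cleaned and aggregated with SQL. The table names and sample rows are illustrative only.

```python
import sqlite3

raw_rows = [("AD100", "101", "19.99"), ("AD101", "101", "5.01"), ("AX103", "103", "bad")]

conn = sqlite3.connect(":memory:")

# Load: copy the raw data as-is into a staging table (everything kept as text).
conn.execute("CREATE TABLE staging_orders (order_id TEXT, customer_id TEXT, amount TEXT)")
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

# Transform: let the destination engine cast, filter, and aggregate.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT CAST(customer_id AS INTEGER) AS customer_id,
           ROUND(SUM(CAST(amount AS REAL)), 2) AS total
    FROM staging_orders
    WHERE amount GLOB '[0-9]*'   -- drop rows whose amount is not numeric
    GROUP BY customer_id
    """
)
totals = conn.execute("SELECT * FROM customer_totals").fetchall()
print(totals)  # [(101, 25.0)]
```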
Data Analytics
Data sources:
• On-premises data: SQL Server, Oracle, fileshares, SAP
• Cloud data: Azure, AWS, GCP
• SaaS data: Salesforce, Dynamics

Pipeline stages: Data ingestion → Data storage → Data processing → Data visualization
Non-Binary Data Formats
• CSV
  • Good for bandwidth-sensitive data loads
• JSON
  • Clear, structured format with optional schema validation
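Serializing the same hypothetical records both ways with Python's standard `csv` and `json` modules shows the trade-off. CSV writes field names once, so it is smaller. JSON repeats them but preserves types and structure.

```python
import csv
import io
import json

records = [
    {"CustomerID": 100, "CustomerName": "Muisto Linna"},
    {"CustomerID": 101, "CustomerName": "Noam Maoz"},
]

# CSV: field names appear once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["CustomerID", "CustomerName"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# JSON: every record repeats its field names, but the structure is self-describing.
json_text = json.dumps(records)

print(len(csv_text) < len(json_text))  # True

# Reading back shows the cost of CSV's compactness: every value returns as text.
csv_back = list(csv.DictReader(io.StringIO(csv_text)))
json_back = json.loads(json_text)
print(type(csv_back[0]["CustomerID"]), type(json_back[0]["CustomerID"]))  # str vs int
```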
Binary Data Formats
• Optimized for splitting across compute nodes
• Parquet, ORC: Columnar formats
  • Fast read performance (compression) for analytical workloads
• Avro: Row-based format; its schema is defined in JSON and stored with the data
  • Schematized
  • Optimized for write performance
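A toy in-memory model helps explain why columnar layouts suit analytical reads. It is not the actual Parquet, ORC, or Avro encoding. An aggregate over one column touches only that column, and a column of repeated values compresses well. The sample rows and the `run_length_encode` helper are hypothetical.

```python
from itertools import groupby

rows = [
    {"region": "East", "amount": 10},
    {"region": "East", "amount": 20},
    {"region": "West", "amount": 5},
]

# Row-oriented layout (Avro-style): each record is kept together, which suits appends.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout (Parquet/ORC-style): each column is kept together.
col_store = {key: [r[key] for r in rows] for key in rows[0]}

# An analytical query reads only the column it needs.
total = sum(col_store["amount"])


def run_length_encode(values: list) -> list[tuple]:
    """Toy compression: adjacent repeats collapse to (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(total)                                   # 35
print(run_length_encode(col_store["region"]))  # [('East', 2), ('West', 1)]
```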
Blob Storage and Data Lake
Azure Blob Storage
Block blobs
• Best for storing large, discrete, binary objects that change infrequently
• Each individual block can store up to 100MB of data
• A block blob can contain up to 50,000 blocks
• Has a maximum size of 4.7TB

Page blobs
• Organized as a collection of fixed-size 512-byte pages
• Used to implement virtual disk storage for virtual machines
• Can hold up to 8TB of data

Append blobs
• Made up of blocks, like a block blob, but optimized for append operations
• Each individual block can store up to 4MB of data
• The maximum size is just over 195GB
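The size limits above follow from block size multiplied by the 50,000-block cap. The worked check below treats MB and TB as binary units (MiB, TiB). It only verifies the slide's figures; newer service versions raise some of these limits.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MAX_BLOCKS = 50_000

block_blob_max = 100 * MiB * MAX_BLOCKS  # 100 MiB blocks
append_blob_max = 4 * MiB * MAX_BLOCKS   # 4 MiB blocks
page_count = 8 * TiB // 512              # pages in an 8 TiB page blob

print(block_blob_max / TiB)   # 4.76837158203125  (about 4.7TB)
print(append_blob_max / GiB)  # 195.3125          ("just over 195GB")
print(page_count)             # 17179869184       (2**34 pages of 512 bytes)
```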
Blob Storage
ADLS Gen 2
• A repository of data for your modern data warehouse
• Organizes data into directories for improved file access
• Supports POSIX-style ACLs and Azure RBAC permissions
• Compatible with the Hadoop Distributed File System (HDFS)

Azure Data Lake Storage: a high-performance data lake available in all 54 Azure regions
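ADLS Gen2 access control lists use POSIX-style read/write/execute bits for the owning user, the owning group, and everyone else. The function below is a simplified, hypothetical sketch of that check. The real service also evaluates Azure RBAC role assignments, named users and groups, and a mask, and none of those are modeled here.

```python
def posix_allows(perms: str, owner: str, group: str, user: str,
                 user_groups: set[str], wanted: str) -> bool:
    """Simplified POSIX-style check against a 9-character string such as 'rwxr-x---'.

    Hypothetical helper: ignores RBAC, named entries, and the ACL mask.
    """
    if len(perms) != 9 or wanted not in ("r", "w", "x"):
        raise ValueError("expected a 9-character permission string and r, w, or x")
    if user == owner:
        triplet = perms[0:3]   # owning user's bits
    elif group in user_groups:
        triplet = perms[3:6]   # owning group's bits
    else:
        triplet = perms[6:9]   # everyone else
    return wanted in triplet


perms = "rwxr-x---"
print(posix_allows(perms, "alice", "engineers", "alice", set(), "w"))        # True
print(posix_allows(perms, "alice", "engineers", "bob", {"engineers"}, "w"))  # False
print(posix_allows(perms, "alice", "engineers", "bob", {"engineers"}, "r"))  # True
print(posix_allows(perms, "alice", "engineers", "carol", set(), "r"))        # False
```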
Data Lake Storage Gen 2
Azure Data Lake Storage Gen 2
Access Tiers & Lifecycle Management
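Lifecycle management rules move blobs between access tiers (for example Hot, Cool, and Archive) or delete them based on age, and they are written as a JSON policy. The dictionary below follows the general shape of the documented policy schema: `tierToCool`, `tierToArchive`, and `delete` actions keyed on `daysAfterModificationGreaterThan`. The rule name, the `datalake/raw/` prefix, and the day thresholds are hypothetical, so check the current schema before deploying anything. The `action_for_age` helper is an illustration only and does not reproduce the service's own logic.

```python
import json

policy = {
    "rules": [
        {
            "enabled": True,
            "name": "age-out-raw-data",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                # Hypothetical container/prefix; prefixMatch starts with the container name.
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["datalake/raw/"]},
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}
policy_json = json.dumps(policy, indent=2)  # what you would upload


def action_for_age(rule: dict, days_since_modified: int) -> str:
    """Illustrative only: the furthest action whose age threshold has been passed."""
    actions = rule["definition"]["actions"]["baseBlob"]
    chosen = "none"
    # Walk thresholds from smallest to largest; later matches win.
    for name, condition in sorted(
        actions.items(), key=lambda kv: kv[1]["daysAfterModificationGreaterThan"]
    ):
        if days_since_modified > condition["daysAfterModificationGreaterThan"]:
            chosen = name
    return chosen


rule = policy["rules"][0]
print(action_for_age(rule, 10))   # none
print(action_for_age(rule, 45))   # tierToCool
print(action_for_age(rule, 400))  # delete
```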
Azure SQL Products


Relational Database Tables
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX
107         Francis Ribeiro  XXX-XXX-XXXX

• Data is stored in a table
• A table consists of rows and columns
• All rows have the same number of columns
• Each column is defined by a data type
ACID Principle
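ACID stands for atomicity, consistency, isolation, and durability. Atomicity and consistency are the easiest to demonstrate. In the hypothetical funds transfer below, SQLite runs both updates as one transaction, and a constraint failure partway through rolls both of them back. The table, the account IDs, and the amounts are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency rule (hypothetical): balances may never go negative.
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(amount: int) -> None:
    # Atomicity: both updates commit together or are rolled back together.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 'B'", (amount,))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 'A'", (amount,))


transfer(30)
try:
    transfer(500)  # A would go negative, so the CHECK constraint fails
except sqlite3.IntegrityError:
    pass  # the credit to B from this failed transfer was rolled back too

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [('A', 70), ('B', 30)]
```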
Normalization
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX

Orders
OrderID  CustomerName   CustomerPhone
AD100    Noam Maoz      XXX-XXX-XXXX
AD101    Noam Maoz      XXX-XXX-XXXX
AD102    Noam Maoz      XXX-XXX-XXXX
AX103    Qamar Mounir   XXX-XXX-XXXX
AS104    Qamar Mounir   XXX-XXX-XXXX
AR105    Claude Paulet  XXX-XXX-XXXX
MK106    Muisto Linna   XXX-XXX-XXXX

Data is normalized to:
• Reduce storage
• Avoid data duplication
• Improve data quality
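Normalizing an Orders table that repeats customer details means moving those attributes into the Customers table and keeping only a key in each order. Below is a minimal plain-Python sketch using a subset of the slide's rows. Mapping names back to IDs assumes customer names are unique, which is a hypothetical simplification that real data rarely guarantees.

```python
denormalized_orders = [
    ("AD100", "Noam Maoz", "XXX-XXX-XXXX"),
    ("AD101", "Noam Maoz", "XXX-XXX-XXXX"),
    ("AX103", "Qamar Mounir", "XXX-XXX-XXXX"),
]
customers = {101: ("Noam Maoz", "XXX-XXX-XXXX"), 103: ("Qamar Mounir", "XXX-XXX-XXXX")}

# Assumption: names are unique, so each name maps back to exactly one CustomerID.
id_by_name = {name: cid for cid, (name, _phone) in customers.items()}

# Each order now stores only the foreign key; name and phone live once in customers.
orders = [(order_id, id_by_name[name]) for order_id, name, _phone in denormalized_orders]
print(orders)  # [('AD100', 101), ('AD101', 101), ('AX103', 103)]
```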
Table Relationships
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX

Orders
OrderID  CustomerID  SalesPersonID
AD100    101         200
AD101    101         200
AD102    101         200
AX103    103         201
AS104    103         201
AR105    105         200
MK106    105         201

In a normalized database schema:
• Primary keys and foreign keys are used to define relationships
• No data duplication exists (other than key values in third normal form, 3NF)
• Data is retrieved by joining tables together in a query
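This sketch declares the primary key/foreign key relationship in SQLite and retrieves data with a join, using a few of the slide's rows. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is a SQLite-specific detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID TEXT PRIMARY KEY, "
    "CustomerID INTEGER REFERENCES Customers(CustomerID), SalesPersonID INTEGER)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(101, "Noam Maoz"), (103, "Qamar Mounir")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [("AD100", 101, 200), ("AD101", 101, 200), ("AX103", 103, 201)])

# Retrieve data by joining on the key relationship.
rows = conn.execute(
    "SELECT o.OrderID, c.CustomerName FROM Orders AS o "
    "JOIN Customers AS c ON o.CustomerID = c.CustomerID ORDER BY o.OrderID"
).fetchall()
print(rows)  # [('AD100', 'Noam Maoz'), ('AD101', 'Noam Maoz'), ('AX103', 'Qamar Mounir')]

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO Orders VALUES ('ZZ999', 999, 200)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```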
SQL Statement Categories
• DML (Data Manipulation Language): used to query and manipulate data; SELECT, INSERT, UPDATE, DELETE
• DDL (Data Definition Language): used to define database objects; CREATE, ALTER, DROP, RENAME
• DCL (Data Control Language): used to manage security permissions; GRANT, REVOKE, DENY
Azure Synapse
PolyBase
Data Warehouse Star Schema
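In a star schema, a central fact table holds measures and foreign keys, and each dimension table is one join away. This sketch uses SQLite with hypothetical `FactSales`, `DimProduct`, and `DimDate` tables. In a snowflake schema, a dimension such as `DimProduct` would itself be normalized further, for example into a separate category table, which adds joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, Year INTEGER);
    CREATE TABLE FactSales (ProductKey INTEGER, DateKey INTEGER, Amount REAL);
    INSERT INTO DimProduct VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO DimDate VALUES (20230101, 2023), (20240101, 2024);
    INSERT INTO FactSales VALUES (1, 20230101, 500), (2, 20230101, 50), (1, 20240101, 700);
    """
)

# Typical star query: join the fact to each dimension once, then aggregate.
result = conn.execute(
    """
    SELECT d.Year, p.Category, SUM(f.Amount)
    FROM FactSales AS f
    JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
    JOIN DimDate AS d ON f.DateKey = d.DateKey
    GROUP BY d.Year, p.Category
    ORDER BY d.Year, p.Category
    """
).fetchall()
print(result)  # [(2023, 'Bikes', 500.0), (2023, 'Helmets', 50.0), (2024, 'Bikes', 700.0)]
```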
Data Warehouse Snowflake Schema
Azure Synapse
Azure Synapse SQL Pool (DW) Architecture
Synapse SQL Pool Types
Azure Synapse Table Distribution Modes

https://timw.info/0jl
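A dedicated SQL pool spreads every table across 60 distributions. The sketch below contrasts two of the modes. With hash distribution, rows with the same key always land in the same distribution, so joins and aggregates on that key avoid data movement, but skewed keys concentrate data. With round-robin, rows are spread evenly regardless of value. Replicated tables, which keep a full copy on each compute node, are not modeled. CRC32 is a stand-in chosen for determinism, since the service's actual hash function is internal, and the keys are hypothetical.

```python
import zlib
from collections import Counter
from itertools import cycle

DISTRIBUTIONS = 60


def hash_distribution(key: str) -> int:
    # Stand-in hash: the same key always maps to the same distribution.
    return zlib.crc32(key.encode("utf-8")) % DISTRIBUTIONS


def round_robin(rows: list[str]) -> list[int]:
    # Assign distributions in turn, ignoring the row's values.
    slots = cycle(range(DISTRIBUTIONS))
    return [next(slots) for _ in rows]


customer_keys = ["C1", "C2", "C1", "C3", "C1"] * 24  # 120 rows, skewed toward C1

hashed = [hash_distribution(k) for k in customer_keys]
robin = round_robin(customer_keys)

print(len(set(hashed)))                   # at most 3: one distribution per distinct key
print(sorted(set(Counter(robin).values())))  # [2]: every distribution gets 2 rows
```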
Slowly Changing Dimensions (SCD)
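A Type 2 slowly changing dimension keeps history. When a tracked attribute changes, the current row is closed off and a new row version is added, whereas a Type 1 dimension simply overwrites the value. Below is a minimal sketch with Python dataclasses. The column names (`valid_from`, `valid_to`, `is_current`) are common conventions rather than anything the slides specify, and the data is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    customer_id: int               # business key
    city: str                      # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None
    is_current: bool = True


def apply_scd2(dim: list[CustomerRow], customer_id: int, city: str, as_of: date) -> None:
    current = next((r for r in dim if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == city:
        return  # nothing changed; keep the existing version
    if current is not None:
        current.valid_to = as_of   # close off the old version
        current.is_current = False
    dim.append(CustomerRow(customer_id, city, valid_from=as_of))


dim: list[CustomerRow] = []
apply_scd2(dim, 100, "Nashville", date(2023, 1, 1))
apply_scd2(dim, 100, "Seattle", date(2024, 6, 1))
for row in dim:
    print(row.city, row.valid_from, row.valid_to, row.is_current)
# Nashville 2023-01-01 2024-06-01 False
# Seattle 2024-06-01 None True
```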
Azure Databricks
Lambda Architecture
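In a Lambda architecture, a batch layer periodically recomputes complete views from all historical data. A speed layer covers only the events that arrived since the last batch run, and queries merge the two. Below is a toy sketch with page-view counts. The event values and the simple merge are hypothetical simplifications.

```python
from collections import Counter


def batch_view(history: list[str]) -> Counter:
    # Batch layer: recomputed from the full, immutable history (slow but complete).
    return Counter(history)


def speed_view(recent: list[str]) -> Counter:
    # Speed layer: incremental counts for events not yet covered by the batch run.
    return Counter(recent)


def query(page: str, batch: Counter, speed: Counter) -> int:
    # Serving: merge both views at query time.
    return batch[page] + speed[page]


history = ["home", "home", "pricing"]  # hypothetical events already batch-processed
recent = ["home", "docs"]              # hypothetical events since the last batch run
b, s = batch_view(history), speed_view(recent)
print(query("home", b, s))  # 3
print(query("docs", b, s))  # 1
```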
Lambda Architecture with Databricks
Kappa Architecture
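A Kappa architecture drops the separate batch layer. Everything is treated as a stream, and reprocessing means replaying the retained event log through new stream logic. Below is a toy sketch under that assumption. The list stands in for a retained log such as an Event Hubs or Kafka topic, and the events and weighting logic are hypothetical.

```python
from typing import Callable, Iterable

event_log = [("home", 1), ("pricing", 1), ("home", 1)]  # retained, append-only log


def run_stream(events: Iterable[tuple[str, int]],
               weight: Callable[[str], int]) -> dict[str, int]:
    # Single processing path: fold events into a view one at a time.
    view: dict[str, int] = {}
    for page, n in events:
        view[page] = view.get(page, 0) + n * weight(page)
    return view


v1 = run_stream(event_log, weight=lambda page: 1)
# Logic changed? Replay the same log through the new version instead of a batch job.
v2 = run_stream(event_log, weight=lambda page: 2 if page == "pricing" else 1)
print(v1)  # {'home': 2, 'pricing': 1}
print(v2)  # {'home': 2, 'pricing': 2}
```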
Kappa Architecture with Databricks
Data Security
Network security
Securing your network from attacks and unauthorized access is an important part of any architecture.

• Internet protection: Assess the resources that are internet-facing, and only allow inbound and outbound communication where necessary. Make sure you identify all resources that allow inbound network traffic of any type.
• Firewalls: To provide inbound protection at the service perimeter, there are several choices: Azure Firewall, Azure Application Gateway, and Azure Storage Firewall.
• DDoS protection: The Azure DDoS Protection service protects your Azure applications by scrubbing traffic at the Azure network edge before it can impact your service's availability.
• Network security groups: Network security groups (NSGs) allow you to filter network traffic to and from Azure resources in an Azure virtual network. An NSG can contain multiple inbound and outbound security rules.
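NSG rules are evaluated in priority order, lowest number first, and processing stops at the first rule that matches. Below is a simplified sketch of that evaluation for inbound traffic by destination port. The custom rules are hypothetical, and real rules also match on source and destination address, protocol, and service tags. The final catch-all mirrors the built-in `DenyAllInBound` default rule at priority 65500. The other default inbound rules are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    priority: int          # lower number = evaluated first
    port: Optional[int]    # None means "any port"
    access: str            # "Allow" or "Deny"


def evaluate(rules: list[Rule], port: int) -> tuple[str, str]:
    # First matching rule in priority order decides; later rules are never consulted.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.access, rule.name
    return "Deny", "no matching rule"


rules = [
    Rule("DenyAllInBound", 65500, None, "Deny"),  # mirrors the default rule
    Rule("AllowHttps", 100, 443, "Allow"),        # hypothetical custom rule
    Rule("DenyHttps", 200, 443, "Deny"),          # never reached for 443
]
print(evaluate(rules, 443))   # ('Allow', 'AllowHttps')
print(evaluate(rules, 3389))  # ('Deny', 'DenyAllInBound')
```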
Identity and access
Authentication
This is the process of establishing the identity of a person or service looking to access a resource. Azure Active Directory is a cloud-based identity service that provides this capability.

Authorization
This is the process of establishing what level of access an authenticated person or service has. It specifies what data they're allowed to access and what they can do with it. Azure Active Directory also provides this capability.

Azure Active Directory features
• Single sign-on: Enables users to remember only one ID and one password to access multiple applications
• Apps & device management: You can manage your cloud and on-premises apps and devices and the access to your organization's resources
• Identity services: Manage business-to-business (B2B) and business-to-customer (B2C) identity services
Encryption
Encryption at rest
Data at rest is data that has been stored on a physical medium. This could be data stored on the disk of a server, data stored in a database, or data stored in a storage account.

Encryption in transit
Data in transit is data actively moving from one location to another, such as across the internet or through a private network. Secure transfer can be handled by several different layers.

Encryption on Azure
• Raw encryption: enables the encryption of Azure Storage and VM disks (Disk Encryption)
• Database encryption: enables the encryption of databases using Transparent Data Encryption
• Encrypting secrets: Azure Key Vault is a centralized cloud service for storing your application secrets
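On the client side, encryption in transit usually means accepting only verified TLS. Below is a minimal sketch using Python's standard `ssl` module. It only builds the context and does not make a connection. The TLS 1.2 floor is an assumption that matches a common Azure minimum-TLS setting, not something the slide states, and `<account>` in the comment is a placeholder.

```python
import ssl


def secure_client_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()            # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 and 1.1
    return ctx


ctx = secure_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
# Usage (not executed here):
# ctx.wrap_socket(sock, server_hostname="<account>.blob.core.windows.net")
```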
Azure SQL Database Firewall Rules
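Azure SQL Database firewall rules are start/end IP address ranges, and a client can connect only if its address falls inside one of them. The special 0.0.0.0 to 0.0.0.0 rule represents allowing Azure services and is not modeled here. Below is a sketch of the range check using Python's `ipaddress` module. The rule names and addresses, taken from documentation ranges, are hypothetical.

```python
import ipaddress

# Hypothetical server-level rules: name -> (start IP, end IP), inclusive.
rules = {
    "office": ("203.0.113.0", "203.0.113.255"),
    "home": ("198.51.100.7", "198.51.100.7"),
}


def is_allowed(client_ip: str, rules: dict[str, tuple[str, str]]) -> bool:
    ip = ipaddress.IPv4Address(client_ip)
    return any(
        ipaddress.IPv4Address(start) <= ip <= ipaddress.IPv4Address(end)
        for start, end in rules.values()
    )


print(is_allowed("203.0.113.42", rules))  # True
print(is_allowed("192.0.2.10", rules))    # False
```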
Azure SQL Database Dynamic Data Masking (DDM)
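Dynamic data masking hides sensitive values in query results for non-privileged users without changing the stored data. The Python functions below only imitate the output of two built-in masking functions, `email()` and `partial(prefix, padding, suffix)`. In Azure SQL the masks are declared on columns in T-SQL, for example `ALTER TABLE ... ALTER COLUMN ... ADD MASKED WITH (FUNCTION = 'email()')`, and the engine applies them. The sample values are hypothetical.

```python
def mask_email(value: str) -> str:
    # Imitates email(): keep the first letter, then a fixed 'XXX@XXXX.com' shape.
    return value[:1] + "XXX@XXXX.com"


def mask_partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    # Imitates partial(prefix, padding, suffix): expose the first `prefix` and
    # last `suffix` characters, with the custom padding in between.
    tail = value[-suffix:] if suffix > 0 else ""
    return value[:prefix] + padding + tail


print(mask_email("alice@contoso.com"))                 # aXXX@XXXX.com
print(mask_partial("555-123-4567", 0, "XXX-XXX-", 4))  # XXX-XXX-4567
```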
Azure SQL Database Always Encrypted
Azure Data Factory
Power BI
What are data streams?

Data streams
In the context of analytics, data streams are event data generated by sensors or other sources that can be analyzed by another technology.

Data stream processing approach
There are two approaches. Reference data is streaming data that can be collected over time and persisted in storage as static data. In contrast, streaming data has relatively low storage requirements but needs more processing power to run computations in sliding windows over continuously incoming data.

Data streams are used to:
• Analyze data: Continuously analyze data to detect issues and understand or respond to them
• Understand systems: Understand component or system behavior under various conditions to fuel further enhancements of said system
• Trigger actions: Trigger specific actions when certain thresholds are identified
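Sliding-window computation can be sketched as a moving average over the last N seconds of readings. A threshold check then stands in for triggering actions. The timestamps, window length, and threshold below are hypothetical, and a real engine such as Stream Analytics manages windows and late-arriving events for you.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_averages(events: Iterable[tuple[float, float]],
                     window_seconds: float) -> Iterator[tuple[float, float]]:
    """Yield (timestamp, average of readings within the last window_seconds)."""
    window: deque = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0
    for ts, value in events:  # assumes events arrive in timestamp order
        window.append((ts, value))
        total += value
        # Evict readings that have slid out of the window.
        while window[0][0] <= ts - window_seconds:
            _old_ts, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


readings = [(0, 70.0), (5, 80.0), (10, 90.0), (15, 130.0)]  # hypothetical
THRESHOLD = 100.0
alerts = [ts for ts, avg in sliding_averages(readings, 10) if avg > THRESHOLD]
print(alerts)  # [15]
```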
Event processing
The process of consuming data streams, analyzing them, and deriving actionable insights out of them is called event processing. It has three distinct components:
• Event producer: Sensors or processes that generate data continuously, such as a heart rate monitor or a highway toll lane sensor
• Event processor: An engine that consumes event data streams and derives insights from them. Depending on the problem space, event processors either process one incoming event at a time (such as a heart rate monitor) or process multiple events at a time (such as a highway toll lane sensor)
• Event consumer: An application that consumes the data and takes specific action based on the insights. Examples of event consumers include alert generation, dashboards, or even sending data to another event processing engine
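The three components can be sketched as chained Python generators. The producer emits readings, the processor derives insights one event at a time, and the consumer acts on them. The heart rate values and the 120 bpm limit are hypothetical. In Azure these roles might be filled by devices, Stream Analytics, and an alerting service or dashboard.

```python
from typing import Iterable, Iterator


def producer() -> Iterator[int]:
    # Event producer: e.g., a heart rate monitor emitting one reading at a time.
    yield from [72, 75, 130, 74]


def processor(events: Iterable[int], limit: int) -> Iterator[str]:
    # Event processor: evaluates each incoming event and derives an insight.
    for bpm in events:
        if bpm > limit:
            yield f"High heart rate: {bpm} bpm"


def consumer(insights: Iterable[str]) -> list[str]:
    # Event consumer: acts on insights (here it simply collects alerts).
    return list(insights)


alerts = consumer(processor(producer(), limit=120))
print(alerts)  # ['High heart rate: 130 bpm']
```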
Processing events with Azure Stream Analytics
Microsoft Azure Stream Analytics is an event processing engine. It enables the consumption and analysis of high volumes of streaming data in real time.
• Source: sensors, systems, applications
• Ingestion: Event Hubs, IoT Hub, Azure Blob Storage
• Analytical engine: Stream Analytics Query Language, .NET SDK
• Destination: Azure Data Lake, Cosmos DB, SQL Database, Blob Storage, Power BI
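A common Stream Analytics query groups events into fixed, non-overlapping time windows, for example `SELECT COUNT(*) FROM input TIMESTAMP BY EventTime GROUP BY TumblingWindow(second, 10)`. The Python below only mimics what such a tumbling window computes, so the idea can be tested locally. It is not the Stream Analytics engine. The input name and event times are hypothetical, and events falling exactly on a window boundary are avoided because the sketch's boundary handling may differ from the service's.

```python
from collections import Counter


def tumbling_counts(event_times: list[float], size_seconds: int) -> dict[int, int]:
    """Count events per fixed, non-overlapping window, keyed by window end time."""
    counts = Counter(
        (int(t // size_seconds) + 1) * size_seconds for t in event_times
    )
    return dict(sorted(counts.items()))


times = [1.0, 4.5, 9.9, 12.3, 15.0, 25.0]  # hypothetical event times in seconds
print(tumbling_counts(times, 10))  # {10: 3, 20: 2, 30: 1}
```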
Create an Event Hub

Create an event hub namespace
1. In the Azure portal, select NEW, type Event Hubs, and then select Event Hubs from the resulting search. Then select Create.
2. Provide a name for the event hub namespace, and then create a resource group. Specify xx-name-eh and xx-name-rg respectively, where XX represents your initials to ensure the uniqueness of the namespace and resource group names.
3. Select the Pin to dashboard checkbox, then select the Create button.

Create an event hub
1. After the deployment is complete, click the xx-name-eh namespace on the dashboard.
2. Then, under Entities, select Event Hubs.
3. To create the event hub, select the + Event Hub button. Provide the name socialstudy-eh, and then select Create.
4. To grant access to the event hub, create a shared access policy. Select the socialstudy-eh event hub when it appears, and then, under Settings, select Shared access policies.
5. Under Shared access policies, create a policy with MANAGE permissions by selecting + Add. Give the policy the name xx-name-eh-sap, check MANAGE, and then select Create.
6. Select your new policy after it has been created, and then select the copy button for the CONNECTION STRING – PRIMARY KEY entity.
7. Paste the CONNECTION STRING – PRIMARY KEY entity into Notepad; it is needed later in the exercise.
8. Leave all windows open.
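The connection string copied in step 6 is a semicolon-separated list of key=value pairs. A small standard-library parser makes its parts explicit. The namespace and key below are placeholders. The `EntityPath` part appears when the policy is defined on the event hub itself, as in step 5. In an application you would normally pass the whole string to the Event Hubs SDK (for example `EventHubProducerClient.from_connection_string`) rather than parse it yourself.

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    parts: dict[str, str] = {}
    for segment in conn_str.strip().strip(";").split(";"):
        # Split on the first '=' only: the key value may end in '=' padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts


example = (  # placeholder values, not a real key
    "Endpoint=sb://xx-name-eh.servicebus.windows.net/;"
    "SharedAccessKeyName=xx-name-eh-sap;"
    "SharedAccessKey=PLACEHOLDERKEY123=;"
    "EntityPath=socialstudy-eh"
)
cs = parse_connection_string(example)
print(cs["Endpoint"])    # sb://xx-name-eh.servicebus.windows.net/
print(cs["EntityPath"])  # socialstudy-eh
```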
Azure Stream Analytics workflow
Complex event processing of stream data in Azure:
Input adapter → Complex event processor → Output adapter
Start a Stream Analytics Job
Azure Data Factory components
• Pipeline (schedule, monitor, manage): a logical grouping of activities
• Activity (e.g., Hive, stored procedure, copy): consumes and produces datasets, and runs on a linked service
• Dataset (e.g., table, file): represents data item(s) stored in a linked service
• Linked service (e.g., SQL Server, Hadoop cluster)
• Also shown: control flow, parameters, integration runtime
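Data Factory stores pipelines as JSON in which each activity refers to datasets by name, and each dataset in turn names a linked service. The dictionaries below sketch that shape for a single Copy activity. The names are hypothetical and many required properties are omitted, so treat this as an outline rather than a deployable definition. The small check shows how the references chain from activity to dataset to linked service.

```python
pipeline = {
    "name": "CopyRawToCurated",  # hypothetical names throughout
    "properties": {
        "activities": [
            {
                "name": "CopyOrders",
                "type": "Copy",
                "inputs": [{"referenceName": "RawOrdersCsv", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "CuratedOrdersSql", "type": "DatasetReference"}],
            }
        ]
    },
}

datasets = {
    "RawOrdersCsv": {"properties": {
        "type": "DelimitedText",
        "linkedServiceName": {"referenceName": "DataLakeLS", "type": "LinkedServiceReference"}}},
    "CuratedOrdersSql": {"properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {"referenceName": "AzureSqlLS", "type": "LinkedServiceReference"}}},
}
linked_services = {"DataLakeLS", "AzureSqlLS"}


def unresolved_references(pipeline: dict, datasets: dict, linked_services: set[str]) -> list[str]:
    """List dataset or linked-service names the pipeline needs but that are not defined."""
    missing = []
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            ds_name = ref["referenceName"]
            if ds_name not in datasets:
                missing.append(ds_name)
                continue
            ls_name = datasets[ds_name]["properties"]["linkedServiceName"]["referenceName"]
            if ls_name not in linked_services:
                missing.append(ls_name)
    return missing


print(unresolved_references(pipeline, datasets, linked_services))  # []
```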


Azure Data Factory components
Components shown: linked services, triggers, parameters, integration runtime (IR), pipelines, control flow (CF), activities, and datasets, with Azure Data Lake Store and Azure Databricks as example connected services.
Azure Monitor
Data Pipelines
Azure Diagnostics
Log Analytics
Lambda architectures from a real-time mode perspective

Speed layer:
The speed layer processes data streams in real or near real time. This works well when the aim is to minimize the latency from data ingestion to analysis:
1. New data ingested from sources
4. Real-time views of the data created

Serving layer:
The serving layer is optional in the real-time architecture. It acts as the storage output of either the batch or speed layer and is used by client applications to access the results of the data sets.
Architect a stream processing pipeline with Azure Stream Analytics
Design a stream processing pipeline with Azure Databricks
Automate an enterprise business intelligence architecture
Exam DP-203 Item Types
Multiple Choice
Repeated Scenario
You need to move an Azure VM to another hardware
host.

Solution: You redeploy the VM.

Does this solution meet the goal?

a. Yes
b. No
Repeated Scenario
You need to move an Azure VM to another hardware
host.

Solution: You create a proximity placement group.

Does this solution meet the goal?

a. Yes
b. No
Select and Place
Build List and Reorder
Active Screen
Case Study
Performance-Based Lab
Microsoft Online Testing


Microsoft Online Testing Process

timw.info/online
