
Course data sheet

Data Engineering on Microsoft Azure (DP-203T00)
H9P83S

HPE course number: H9P83S
Course length: 4 days
Delivery mode: VILT

In this course, students learn about data engineering patterns and practices as they pertain to batch and real-time analytical solutions using Azure data platform technologies. Students begin by understanding the core compute and storage technologies used to build an analytical solution. They then explore how to design an analytical serving layer and focus on data engineering considerations for working with source files. Students also learn how to interactively explore data stored in files in a data lake. They learn the various ingestion techniques used to load data using the Apache Spark capability found in Azure Synapse Analytics or Azure Databricks, and how to ingest using Azure Data Factory or Azure Synapse pipelines. Students also learn the various ways to transform data using the same technologies used to ingest data. Students spend time learning how to monitor and analyze the performance of analytical systems to optimize the performance of data loads and of the queries that are issued against the systems. They learn the importance of implementing security to ensure that data is protected at rest or in transit. Students then show how data in an analytical system can be used to create dashboards or build predictive models in Azure Synapse Analytics.

Why HPE Education Services?
• Comprehensive worldwide HPE technical, IT industry and personal development training
• Training and certification preparation for ITIL®, Security, VMware®, Linux, Microsoft and more
• Innovative training options that match individual learning styles
• Anytime, anywhere remote learning via HPE Digital Learner subscriptions
• Verifiable digital badges for proof of training, skill recognition and career development
• Simplified purchase options with HPE Training Credits

*Realize Technology Value with Training, IDC Infographic 2037, Sponsored by Hewlett Packard Enterprise, 2019

Audience

The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and how to build analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience for this course is data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

Job Role

Data engineer

Prerequisites

Successful students start this course with knowledge of cloud computing and core data concepts and have professional experience with data solutions.

Specifically, students should have completed:
• HM9H6S: Azure Fundamentals (AZ-900T01)
• HQ6Z2S: Microsoft Azure Data Fundamentals (DP-900T00)

Course objectives

Upon completion of this course, students will know how to:
• Explore compute and storage options for data engineering workloads in Azure
• Design and implement the serving layer
• Understand data engineering considerations
• Run interactive queries using serverless SQL pools
• Explore, transform, and load data into the data warehouse using Apache Spark
• Perform data exploration and transformation in Azure Databricks
• Ingest and load data into the data warehouse
• Transform data with Azure Data Factory or Azure Synapse Pipelines
• Integrate data from notebooks with Azure Data Factory or Azure Synapse Pipelines
• Optimize query performance with dedicated SQL pools in Azure Synapse
• Analyze and optimize data warehouse storage
• Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
• Perform end-to-end security with Azure Synapse Analytics
• Perform real-time stream processing with Stream Analytics
• Create a stream processing solution with Event Hubs and Azure Databricks
• Build reports using Power BI integration with Azure Synapse Analytics
• Perform integrated machine learning processes in Azure Synapse Analytics

Detailed course outline


Module 1: Explore Compute and Storage Options for Data Engineering Workloads
• Introduction to Azure Synapse Analytics
• Describe Azure Databricks
• Introduction to Azure Data Lake storage
• Describe Delta Lake architecture
• Work with data streams by using Azure Stream Analytics
• Lab: Explore compute and storage options for data engineering workloads
  – Combine streaming and batch processing with a single pipeline
  – Organize the data lake into levels of file transformation
  – Index data lake storage for query and workload acceleration

Module 2: Design and Implement the Serving Layer
• Design a multidimensional schema to optimize analytical workloads
• Code-free transformation at scale with Azure Data Factory
• Populate slowly changing dimensions in Azure Synapse Analytics pipelines
• Lab: Designing and implementing the serving layer
  – Design a star schema for analytical workloads
  – Populate slowly changing dimensions with Azure Data Factory and mapping data flows

Module 3: Data Engineering Considerations for Source Files
• Design a modern data warehouse using Azure Synapse Analytics
• Secure a data warehouse in Azure Synapse Analytics
• Lab: Data engineering considerations
  – Managing files in an Azure data lake
  – Securing files stored in an Azure data lake

Module 4: Run Interactive Queries Using Azure Synapse Analytics Serverless SQL Pools
• Explore Azure Synapse serverless SQL pools capabilities
• Query data in the lake using Azure Synapse serverless SQL pools
• Create metadata objects in Azure Synapse serverless SQL pools
• Secure data and manage users in Azure Synapse serverless SQL pools
• Lab: Run interactive queries using serverless SQL pools
  – Query Parquet data with serverless SQL pools
  – Create external tables for Parquet and CSV files
  – Create views with serverless SQL pools
  – Secure access to data in a data lake when using serverless SQL pools
  – Configure data lake security using Role-Based Access Control (RBAC) and Access Control List

Module 5: Explore, Transform, and Load Data into the Data Warehouse Using Apache Spark
• Understand big data engineering with Apache Spark in Azure Synapse Analytics
• Ingest data with Apache Spark notebooks in Azure Synapse Analytics
• Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
• Integrate SQL and Apache Spark pools in Azure Synapse Analytics
• Lab: Explore, transform, and load data into the data warehouse using Apache Spark
  – Perform data exploration in Synapse Studio
  – Ingest data with Spark notebooks in Azure Synapse Analytics
  – Transform data with DataFrames in Spark pools in Azure Synapse Analytics
  – Integrate SQL and Spark pools in Azure Synapse Analytics

Module 6: Data Exploration and Transformation in Azure Databricks
• Describe Azure Databricks
• Read and write data in Azure Databricks
• Work with DataFrames in Azure Databricks
• Work with DataFrames advanced methods in Azure Databricks
• Lab: Data exploration and transformation in Azure Databricks
  – Use DataFrames in Azure Databricks to explore and filter data
  – Cache a DataFrame for faster subsequent queries
  – Remove duplicate data
  – Manipulate date/time values
  – Remove and rename DataFrame columns
  – Aggregate data stored in a DataFrame

Module 7: Ingest and Load Data into the Data Warehouse
• Use data loading best practices in Azure Synapse Analytics
• Petabyte-scale ingestion with Azure Data Factory
• Lab: Ingest and load data into the data warehouse
  – Perform petabyte-scale ingestion with Azure Synapse Pipelines
  – Import data with PolyBase and COPY using T-SQL
  – Use data loading best practices in Azure Synapse Analytics

Module 8: Transform Data with Azure Data Factory or Azure Synapse Pipelines
• Data integration with Azure Data Factory or Azure Synapse Pipelines
• Code-free transformation at scale with Azure Data Factory or Azure Synapse Pipelines
• Lab: Transform Data with Azure Data Factory or Azure Synapse Pipelines
  – Execute code-free transformations at scale with Azure Synapse Pipelines
  – Create data pipeline to import poorly formatted CSV files
  – Create mapping data flows

Module 9: Orchestrate Data Movement and Transformation in Azure Synapse Pipelines
• Orchestrate data movement and transformation in Azure Data Factory
• Lab: Orchestrate data movement and transformation in Azure Synapse Pipelines
  – Integrate data from notebooks with Azure Data Factory or Azure Synapse Pipelines

Module 10: Optimize Query Performance with Dedicated SQL Pools in Azure Synapse
• Optimize data warehouse query performance in Azure Synapse Analytics
• Understand data warehouse developer features of Azure Synapse Analytics
• Lab: Optimize query performance with dedicated SQL pools in Azure Synapse
  – Understand developer features of Azure Synapse Analytics
  – Optimize data warehouse query performance in Azure Synapse Analytics
  – Improve query performance

Module 11: Analyze and Optimize Data Warehouse Storage
• Analyze and optimize data warehouse storage in Azure Synapse Analytics
• Lab: Analyze and optimize data warehouse storage
  – Check for skewed data and space usage
  – Understand column store storage details
  – Study the impact of materialized views
  – Explore rules for minimally logged operations

Module 12: Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
• Design hybrid transactional and analytical processing using Azure Synapse Analytics
• Configure Azure Synapse Link with Azure Cosmos DB
• Query Azure Cosmos DB with Apache Spark pools
• Query Azure Cosmos DB with serverless SQL pools
• Lab: Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
  – Configure Azure Synapse Link with Azure Cosmos DB
  – Query Azure Cosmos DB with Apache Spark for Synapse Analytics
  – Query Azure Cosmos DB with serverless SQL pool for Azure Synapse Analytics

Module 13: End-to-End Security with Azure Synapse Analytics
• Secure a data warehouse in Azure Synapse Analytics
• Configure and manage secrets in Azure Key Vault
• Implement compliance controls for sensitive data
• Lab: End-to-end security with Azure Synapse Analytics
  – Secure Azure Synapse Analytics supporting infrastructure
  – Secure the Azure Synapse Analytics workspace and managed services
  – Secure Azure Synapse Analytics workspace data

Module 14: Real-Time Stream Processing with Stream Analytics
• Enable reliable messaging for big data applications using Azure Event Hubs
• Work with data streams by using Azure Stream Analytics
• Ingest data streams with Azure Stream Analytics
• Lab: Real-time stream processing with Stream Analytics
  – Use Stream Analytics to process real-time data from Event Hubs
  – Use Stream Analytics windowing functions to build aggregates and output to Synapse Analytics
  – Scale the Azure Stream Analytics job to increase throughput through partitioning
  – Repartition the stream input to optimize parallelization

Module 15: Create a Stream Processing Solution with Event Hubs and Azure Databricks
• Process streaming data with Azure Databricks structured streaming
• Lab: Create a stream processing solution with Event Hubs and Azure Databricks
  – Explore key features and uses of structured streaming
  – Stream data from a file and write it out to a distributed file system
  – Use sliding windows to aggregate over chunks of data rather than all data
  – Apply watermarking to remove stale data
  – Connect to Event Hubs read and write streams

Module 16: Build Reports Using Power BI Integration with Azure Synapse Analytics
• Create reports with Power BI using its integration with Azure Synapse Analytics
• Lab: Build reports using Power BI integration with Azure Synapse Analytics
  – Integrate an Azure Synapse workspace and Power BI
  – Optimize integration with Power BI
  – Improve query performance with materialized views and result-set caching
  – Visualize data with SQL serverless and create a Power BI report

Module 17: Perform Integrated Machine Learning Processes in Azure Synapse Analytics
• Use the integrated machine learning process in Azure Synapse Analytics
• Lab: Perform integrated machine learning processes in Azure Synapse Analytics
  – Create an Azure Machine Learning linked service
  – Trigger an auto ML experiment using data from a Spark table
  – Enrich data using trained models
  – Serve prediction results using Power BI

Learn more at
hpe.com/ww/learnmicrosoft


© Copyright 2021 Hewlett Packard Enterprise Development LP. The information contained herein is subject to change without
notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements
accompanying such products and services. Nothing herein should be construed as constituting an additional warranty.
Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.
Microsoft is either a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries. All other
third-party marks are property of their respective owners.

H9P83S A.00, May 2021
