
The

NERD'S GUIDE TO

Azure Synapse
TABLE OF CONTENTS

Azure Synapse Fundamentals 03

SQL Pools 05

Integration with Azure Data Lake 07

Perform ETL with Pipelines and Data Flows 09

Synapse Notebooks 13

Built-in Power BI Integration Capabilities 14

THE NERD'S GUIDE TO AZURE SYNAPSE | 02


AZURE
SYNAPSE
FUNDAMENTALS

AZURE SYNAPSE ANALYTICS BRINGS
TOGETHER CORE TECHNOLOGIES TO
HELP BUSINESSES REACH THEIR
DATA ANALYTICAL GOALS IN A
SINGLE WORKSPACE ENVIRONMENT.

This workspace experience allows for integration capabilities across
multiple technologies, such as SQL Pools, Pipelines and Data Flows,
Power BI, Spark Notebooks, and Azure Data Lake. This creates an ideal
environment for analysts, engineers, and scientists to work together
seamlessly. Use this guidebook to determine how Azure Synapse Analytics
can help consolidate your data integration, enterprise data
warehousing, and big data analytics into one workspace.

WHY AZURE SYNAPSE?

On-premises hardware and software require constant investment in
upgrades, added capacity, and optimization. It is also time-intensive
to ensure employees can manage and understand all of it.

Azure Synapse aims to resolve these issues by delivering fast time to
insight on a unified analytical platform that combines data
warehousing, big data systems, and visualization capabilities. Data
analysts, engineers, scientists, and BI professionals can perform
essential tasks like exploring data, performing ETL operations,
creating physical and logical data models, training ML models, and
presenting data, all in the same workspace. With Azure Synapse roles,
you can also set permission levels for role-based groups that align
with employees' job roles; to manage access, you only need to add and
remove users from the appropriate security groups.



Azure Synapse
COMPONENTS AND TOOLS

Azure Synapse is unique because it allows users with various levels of
coding knowledge to interact with data, making it ideal for those with
or without a technical background. This environment gives users access
to, but is not limited to, the following components and tools:

Cloud-native HTAP
Integrated AI and BI
Apache Spark
Azure Synapse Link
Azure Data Lake
Azure Machine Learning
Azure Purview
Power BI integration
Data Pipelines
Synapse Studio
Synapse SQL
Notebooks
Enterprise Data Warehousing
Data Lakes
Code-free Integrations



SQL POOLS
Azure Synapse architecture is built on a scale-out model designed for
big data analytics and uses two main types of SQL pools, dedicated and
serverless, to disperse computational processing across numerous nodes.
Both pools let users run analysis against big data.

Serverless SQL pool

A serverless SQL pool charges users only for the data processed,
instead of charging for running time. This makes serverless pools a
great option because the pool can remain consistently available, unlike
dedicated SQL pools, which may be turned off frequently to reduce
costs. Serverless pools are also a great tool for scaling out
horizontally while running big data analytics and complex queries
against petabytes of data.

Dedicated SQL pool

Dedicated SQL pools gather users' data from various sources and store
it in relational tables for analysis. With dedicated pools, users can
more easily perform historical analysis, machine learning, and other
data workloads. However, these pools can be limiting because they
charge for the time that they are active, which may cause users to turn
off their dedicated pools frequently to conserve costs.
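The scale-out model behind dedicated pools hash-distributes rows across a fixed set of distributions (60 in Azure Synapse) so compute nodes can work on them in parallel. The following is a conceptual sketch only, using an illustrative hash function rather than the service's actual algorithm:

```python
import hashlib

# Conceptual sketch (not the service's real algorithm): a dedicated SQL
# pool hash-distributes each row across a fixed set of distributions so
# that compute nodes can scan them in parallel. Synapse uses 60
# distributions; the hash function below is illustrative only.
NUM_DISTRIBUTIONS = 60

def assign_distribution(distribution_key: str) -> int:
    """Map a row's distribution-key value to one of the 60 distributions."""
    digest = hashlib.md5(distribution_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS

# Rows with the same key always land in the same distribution, which is
# why joins and aggregations on the distribution key avoid data movement.
rows = [("customer_42", 19.99), ("customer_7", 5.00), ("customer_42", 3.50)]
placement = {key: assign_distribution(key) for key, _ in rows}
assert placement["customer_42"] == assign_distribution("customer_42")
```

Because the mapping is deterministic, two tables distributed on the same key can be joined without shuffling rows between nodes.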

MORE TO LEARN

Serverless pools can benefit multiple professional roles within any business, including data engineers,
scientists, analysts, and BI professionals. A serverless SQL pool allows data engineers to explore data
lakes, convert and prepare data, and streamline transformations. Data scientists can use features such as
OPENROWSET and automatic schema inference to reason about the content and structure of data within
any given lake. Analysts can use T-SQL, or other tools, to connect and analyze data in Azure Data Lake or
external tables created on the serverless database. A serverless SQL pool also allows BI professionals to
create Power BI reports from data lakes.
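To make the OPENROWSET pattern concrete, here is a hedged sketch that composes the kind of T-SQL a serverless pool runs to query files in place; the storage account, container, and path are hypothetical placeholders:

```python
# A sketch of composing the kind of T-SQL a serverless SQL pool runs with
# OPENROWSET to query lake files in place (schema inferred automatically).
# The storage account, container, and path below are hypothetical.
def openrowset_query(storage_url: str, file_format: str = "PARQUET") -> str:
    """Build a serverless SQL query that reads files directly from the lake."""
    return (
        "SELECT TOP 100 *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{storage_url}',\n"
        f"    FORMAT = '{file_format}'\n"
        ") AS rows;"
    )

query = openrowset_query(
    "https://contosolake.dfs.core.windows.net/files/sales/*.parquet"
)
print(query)
```

Because no table has to be created first, a query like this is a quick way to sample unfamiliar data, and you pay only for the data scanned.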



SQL POOLS
COMPARISON

DEDICATED SQL POOL vs. SERVERLESS SQL POOL

[Comparison chart: the two pool types are compared on support for
Tables, Views, Schemas, Temporary Tables, Procedures, Functions,
Caching Queries, Triggers, Table Variables, External Tables, Table
Distribution, Table Indexes, Table Partitions, Statistics, Workload
Management, and Cost Control.]


Integration With
Azure Data Lake

An Azure Data Lake is a general-purpose storage account


where organizations can store their data. That data can
be later analyzed through SQL On-Demand queries or
moved from the data lake into a data store like SQL Pools.

AZURE DATA LAKE STORAGE GEN2 IS A
MULTIMODAL CLOUD STORAGE SERVICE
DEDICATED TO BIG DATA ANALYTICS, BUILT ON
AZURE BLOB STORAGE. PRIOR TO AZURE DATA
LAKE STORAGE GEN2, CLOUD USERS WOULD HAVE
TO USE AN OBJECT STORE, WHICH PLACES EVERY
OBJECT INTO A FLAT HIERARCHY. IF USERS DIDN'T
USE AN OBJECT STORE, THEY WOULD PUT DATA
INTO STRUCTURED DIRECTORIES CALLED FILE
SYSTEMS. AZURE DATA LAKE STORAGE GEN2
COMBINES QUALITIES OF THE OBJECT STORE AND
THE FILE SYSTEM, AND AZURE SYNAPSE USES IT TO
WAREHOUSE DATA IN A CONSISTENT MODEL.

TIP: You can also use Synapse Studio to preview and sample any data
that you have uploaded into your data lake.
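One practical consequence of combining the object store and file system is worth illustrating. In a flat object store, "directories" are just key prefixes, so renaming one means rewriting every object key; with a hierarchical namespace the same rename is a single metadata operation. A conceptual sketch of the flat-store case, with invented paths:

```python
# Conceptual sketch: in a flat object store, "directories" are only key
# prefixes, so a directory rename must touch every object under the
# prefix. ADLS Gen2's hierarchical namespace makes the same rename a
# single metadata operation on the directory node instead.
flat_store = {
    "raw/sales/2023/jan.csv": b"...",
    "raw/sales/2023/feb.csv": b"...",
}

def rename_prefix(store: dict, old: str, new: str) -> dict:
    """Flat-store 'directory rename': rewrite every key under the prefix."""
    return {
        (new + key[len(old):]) if key.startswith(old) else key: value
        for key, value in store.items()
    }

renamed = rename_prefix(flat_store, "raw/sales/", "curated/sales/")
assert "curated/sales/2023/jan.csv" in renamed
```

This is one reason analytics engines that reorganize many files at once benefit from the hierarchical namespace.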



Advantages Of
Azure Data Lake

Keep petabyte-size files and trillions of objects in one location

Multiple copies of data stored by Microsoft for redundancy and
availability

Multiple data tiers for performance and cost management

Extract data from any source

Users can explore data and create queries



Perform ETL with
Pipelines and Data Flows
Synapse uses a data integration service with a dedicated engine to
perform ETL (extract, transform, and load) operations. It frequently
uses staging tables as short-term storage while data is loaded into a
designated target. ETL combines the following vital functions into a
single application or set of services:

Ingest data from various data stores with prebuilt connectors.

Transform data using a massively parallel processing (MPP)
architecture designed for big data.

Load data to data stores in the cloud and in on-premises
environments.
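The three steps above can be sketched end to end as a minimal, self-contained ETL run. The source data, column names, and staging step are invented for illustration:

```python
import csv
import io

# A minimal ETL sketch of the three steps above: extract rows from a CSV
# source, transform them, then stage and load them into the target.
# The sample data and column names are invented.
source_csv = "order_id,amount\n1,10.50\n2,3.25\n3,8.00\n"

def extract(text: str) -> list:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list) -> list:
    """Transform: cast types and derive a flag for large orders."""
    return [
        {"order_id": int(r["order_id"]),
         "amount": float(r["amount"]),
         "is_large": float(r["amount"]) > 5.0}
        for r in rows
    ]

def load(rows: list, target: list) -> None:
    """Load: stage the rows briefly, then append them to the target."""
    staging = list(rows)    # short-term staging, as described above
    target.extend(staging)

warehouse = []
load(transform(extract(source_csv)), warehouse)
print(len(warehouse))  # 3 rows loaded
```

In the real service the prebuilt connectors, Spark-based transformation engine, and sink datasets play these three roles at scale.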

THE 4 RESOURCES USED: PIPELINES, DATA SETS, DATA FLOWS, LINKED SERVICES

Usually, these three ETL phases run simultaneously, which results in a
more efficient process. Simply put, a user's existing data can be
transformed and loaded while more data is being extracted.

[Diagram: in the Extract phase, data from Data Source 1 and Data
Source 2 flows into the Transformation Engine, which then Loads the
results into the Target.]


What Is A
Data Flow?

DATA FLOWS EMPOWER YOU TO INGEST, TRANSFORM, AND


LOAD DATA IN VARIOUS DATA STORES INCLUDING AZURE
AND ON-PREM ENVIRONMENTS. ADDITIONALLY, DATA
FLOWS ARE IMPLEMENTED AS TASKS WITHIN PIPELINES
(CONTROL FLOWS), WHICH ARE RESPONSIBLE FOR ENSURING
THE TASKS IN A DATA PIPELINE ARE EXECUTED IN AN
ORDERLY MANNER BY USING PRECEDENCE CONSTRAINTS.
THEREFORE, RESULTING TASKS WILL ONLY TRIGGER ONCE THE
PREVIOUS TASK IN THE SEQUENCE HAS REACHED AN
OUTCOME.
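The precedence-constraint behavior described above can be sketched as a scheduler that runs each task only after the tasks it depends on have reached an outcome. The task names and dependency map are invented:

```python
# Conceptual sketch of precedence constraints: each task triggers only
# after every task it depends on has reached an outcome. Task names and
# dependencies below are invented for illustration.
def run_pipeline(tasks: dict) -> list:
    """Run tasks respecting dependencies; return the execution order."""
    completed = []
    remaining = dict(tasks)
    while remaining:
        # A task is ready once all of its predecessors have completed.
        ready = [t for t, deps in remaining.items()
                 if all(d in completed for d in deps)]
        if not ready:
            raise ValueError("cycle in precedence constraints")
        for task in ready:
            completed.append(task)   # the task "executes" here
            del remaining[task]
    return completed

order = run_pipeline({
    "ingest": [],
    "transform": ["ingest"],
    "load": ["transform"],
})
assert order == ["ingest", "transform", "load"]
```

A data flow embedded in a pipeline behaves like one of these tasks: it fires only when the preceding activity succeeds (or otherwise reaches the outcome the constraint specifies).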

Everything you build in a data flow is converted to Scala and run
against a scaled-out Apache Spark cluster. This provides scale-out
capability and a massively parallel processing architecture for data
preparation at scale. Since data flows bill you for the data you
process, your account is charged every time a data flow runs. This is
very cost-effective for the majority of users but can be limiting if
the amount of data processed is significant or if the pipeline is
constantly running.



What Is A
Pipeline?

AZURE SYNAPSE PIPELINES IS A CLOUD-BASED ETL AND


DATA-INTEGRATION SERVICE THAT EMPLOYS THE
RESOURCES FROM AZURE DATA FACTORY. THIS GIVES
YOU THE ABILITY TO DIRECT DATA MIGRATION AND
TRANSFORMATION BY GENERATING DATA-DRIVEN
WORKFLOWS. AZURE SYNAPSE PIPELINES CAN ALSO
INGEST INFORMATION FROM DISTINCT DATA STORES.

IN YOUR AZURE SYNAPSE WORKSPACE YOU CAN CREATE ONE
OR MORE OF THESE DATA-DRIVEN PIPELINES. PIPELINES
PERFORM TASKS AND ALLOW YOU TO CONTROL ACTIVITIES
AS A SET BY GROUPING THEM TOGETHER LOGICALLY. A
PIPELINE'S ACTIVITIES DETERMINE WHAT ACTIONS WILL BE
PERFORMED ON YOUR DATA. THIS ELIMINATES THE NEED TO
NAVIGATE BETWEEN MULTIPLE ACTIVITIES AND DATA STORES
TO PERFORM ACTIVITIES AND TASKS.

THREE MAIN GROUPS OF ACTIVITIES

Data Movement, Data Transformation, and Control Activities



What Is A
Pipeline?
(Continued)

An activity can take one or more input datasets and produce one or
more output datasets. The figure below displays how pipelines,
activities, and datasets interact with one another:

[Figure: a Schedule Trigger runs a Pipeline containing a Copy
Activity, which moves data from Amazon S3 (via its Linked Service) to
Azure Storage (via its Linked Service).]

The input dataset represents the data an activity consumes, and the
output dataset represents the data the activity produces. Datasets can
describe many data types held in various stores and are then used by
the activities within a pipeline.
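How pipelines, activities, datasets, and linked services fit together can be sketched as a simplified definition (this is not the exact service schema, and all names below are hypothetical):

```python
import json

# A simplified sketch (not the exact service schema) of how the pieces
# relate: a pipeline groups activities; a Copy activity reads an input
# dataset and writes an output dataset; each dataset points at a linked
# service for its connection details. All names are hypothetical.
pipeline = {
    "name": "CopyS3ToLake",
    "activities": [
        {
            "name": "CopySalesData",
            "type": "Copy",
            "inputs": [{"referenceName": "S3SalesDataset"}],
            "outputs": [{"referenceName": "LakeSalesDataset"}],
        }
    ],
}
datasets = {
    "S3SalesDataset": {"linkedService": "AmazonS3LinkedService"},
    "LakeSalesDataset": {"linkedService": "AzureStorageLinkedService"},
}

# Following the references: activity -> input dataset -> linked service.
activity = pipeline["activities"][0]
source_service = datasets[activity["inputs"][0]["referenceName"]]["linkedService"]
assert source_service == "AmazonS3LinkedService"
print(json.dumps(pipeline, indent=2))
```

The indirection is the point: the activity never embeds connection details, so swapping the source store only means repointing a dataset at a different linked service.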



Synapse Notebooks
A Synapse notebook allows you to combine live code, visualizations, and
narrative text within files that users create. Notebooks provide an
environment that is optimal for gathering quick insights from data and
validating ideas. They also provide the ability to examine data across
raw and processed formats and to create data visualizations with next
to no effort. Additionally, users can build, test, train, and score
machine learning models with notebooks.

Azure Machine Learning Model Workflow: Train → Package → Validate → Deploy → Monitor
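The kind of quick, cell-by-cell exploration a notebook is good for can be sketched with the standard library; the sample data below is invented:

```python
import statistics

# The quick-insight style of work a notebook cell is good for, sketched
# with the standard library. The daily sales figures are invented.
daily_sales = [120.0, 95.5, 143.2, 88.0, 210.7]

summary = {
    "count": len(daily_sales),
    "mean": round(statistics.mean(daily_sales), 2),
    "max": max(daily_sales),
}
print(summary)  # {'count': 5, 'mean': 131.48, 'max': 210.7}
```

In a real Synapse notebook the same cell would typically use Spark against lake data, and the tabular result could be switched to a chart view without writing plotting code.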

3 TIPS WHEN WORKING WITH NOTEBOOKS

CREATE A NOTEBOOK
You can generate a notebook using two distinct methods. The first way
is to create a new notebook. A second option is to import a preexisting
notebook into your ASA Workspace.

DEVELOP A NOTEBOOK
When developing notebooks, it is important to remember that they are
made up of individual blocks of code or text called cells. These cells
can operate separately or as a group.

DATA VISUALIZATION
Azure Synapse notebooks provide the option to create a customized
chart, by utilizing chart options, from tabular results. This enables
you to create custom visuals without writing much, if any, code.



Built-in Power BI
Integration Capabilities

Integrating Azure Synapse Analytics with Power BI allows users to
easily connect and analyze their data. The integration enables
connecting live to Azure Data Lake or SQL Pools, making Power BI
reports capable of real-time analytics. Optionally, the data can be
imported into Power BI and refreshed on a schedule, providing faster
performance. Built-in artificial intelligence and machine learning
capabilities further enhance the data exploration and modeling
capabilities of Power BI.

Live connections in Power BI are accomplished by creating a

01 DirectQuery connection to your Dedicated SQL Pool.


Creating a live connection allows users to create dynamic, real-time
reports against all the data stored in the SQL Pool.

One key performance enhancement feature of Dedicated Pools is

02 result-set caching. This key feature improves the performance of


repetitive queries because the query results are automatically
saved in memory.
This improves a user's query performance and reduces compute resources.
Moreover, cached result sets reduce the impact on concurrency limits,
because queries that leverage cached result sets do not use any
concurrency slots in the SQL Pool.
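The caching behavior described above can be sketched conceptually: repeat queries are served from saved results instead of recomputing, and the cache is invalidated when the underlying data changes. (In a dedicated pool this is managed for you; the real feature is enabled per database with the RESULT_SET_CACHING setting.)

```python
# Conceptual sketch of result-set caching: the first run of a query pays
# the compute cost, repeats are served from the saved result, and the
# cache is invalidated when the underlying data changes.
class ResultSetCache:
    def __init__(self):
        self._cache = {}
        self.hits = 0

    def query(self, sql, execute):
        if sql in self._cache:
            self.hits += 1              # served from cache: no compute used
            return self._cache[sql]
        result = execute(sql)           # first run pays the compute cost
        self._cache[sql] = result
        return result

    def invalidate(self):
        self._cache.clear()             # data changed: cached results stale

table = [1, 2, 3]
cache = ResultSetCache()
run = lambda sql: [sum(table)]
assert cache.query("SELECT SUM(x) FROM t", run) == [6]
assert cache.query("SELECT SUM(x) FROM t", run) == [6]   # cache hit
assert cache.hits == 1
```

The concurrency benefit follows directly: a cache hit never reaches the query engine, so it cannot occupy a concurrency slot.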

Another key performance enhancement feature of Dedicated Pools is

03 Azure’s materialized view feature.


This feature can help boost a SQL data warehouse's performance by
precomputing, storing, and maintaining data in a tabular format.
Materialized views can also be used in queries without a direct
reference, so changes to the application code are not needed.
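The precompute-and-maintain idea behind a materialized view can be sketched conceptually: an aggregate is kept up to date as base data changes, so reads return the stored result instead of re-scanning the table. (In a dedicated pool this is defined with CREATE MATERIALIZED VIEW and maintained automatically; the class below is only an illustration.)

```python
# Conceptual sketch of a materialized view: the aggregate is precomputed
# and maintained as base rows change, so queries read the stored result
# in O(1) instead of scanning the whole table.
class SalesTable:
    def __init__(self):
        self.rows = []
        self.total_view = 0.0           # the "materialized" aggregate

    def insert(self, amount):
        self.rows.append(amount)
        self.total_view += amount       # view maintained on every change

    def query_total(self):
        return self.total_view          # no table scan needed

sales = SalesTable()
for amount in (10.0, 2.5, 7.5):
    sales.insert(amount)
assert sales.query_total() == 20.0
```

The "without a direct reference" point maps to the real service's behavior: the optimizer can rewrite a qualifying query to read the maintained aggregate even when the query text names only the base table.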



pragmaticworks.com

hello@pragmaticworks.com

(904) 638-5743

7175 Hwy 17, Ste 2

Fleming Island, FL 32003

Follow us on social media
