
The

NERD'S GUIDE TO

Azure Synapse
TABLE OF CONTENTS

Azure Synapse Fundamentals 03

SQL Pools 05

Integration with Azure Data Lake 07

Perform ETL with Pipelines and Data Flows 09

Synapse Notebooks 13

Built-in Power BI Integration Capabilities 14

THE NERD'S GUIDE TO AZURE SYNAPSE | 02


AZURE
SYNAPSE
FUNDAMENTALS

AZURE SYNAPSE ANALYTICS BRINGS
TOGETHER CORE TECHNOLOGIES TO
HELP BUSINESSES REACH THEIR
DATA ANALYTICAL GOALS IN A
SINGLE WORKSPACE ENVIRONMENT.

This workspace experience allows for integration capabilities across
multiple technologies, such as SQL Pools, Pipelines and Data Flows,
Power BI, Spark Notebooks, and Azure Data Lake. This creates an ideal
environment for analysts, engineers, and scientists to work together
seamlessly. Use this guidebook to determine how Azure Synapse Analytics
can help consolidate your data integration, enterprise data
warehousing, and big data analytics into one workspace.

WHY AZURE SYNAPSE?

On-premises hardware and software require constant investment in
upgrades, added capacity, and optimization. It is also time-intensive
to ensure employees can manage and understand all of it.

Azure Synapse aims to resolve these issues by delivering fast time to
insight on a unified analytical platform that combines data
warehousing, big data systems, and visualization capabilities. Data
analysts, engineers, scientists, and BI professionals can perform
essential tasks like exploring data, performing ETL operations,
creating physical and logical data models, training ML models, and
presenting data, all in the same workspace. With Azure Synapse roles,
you can also set permission levels for role-based groups that align
with employees' job roles; to manage access, you only need to add and
remove users from the appropriate security groups.



Azure Synapse
COMPONENTS AND TOOLS

Azure Synapse is unique because it allows users with various levels of
coding knowledge to interact with data, making it ideal for those with
or without a technical background. This environment gives users access
to, but is not limited to, the following components and tools:

Cloud-native HTAP
Integrated AI and BI
Apache Spark
Azure Synapse Link
Azure Data Lake
Azure Machine Learning
Azure Purview
Power BI integration
Data Pipelines
Synapse Studio
Synapse SQL
Notebooks
Enterprise Data Warehousing
Data Lakes
Code-free Integrations



SQL POOLS
Azure Synapse architecture is built on a scale-out model designed for
big data analytics and uses two main types of SQL pools, dedicated and
serverless, to disperse computational processing across numerous nodes.
Both pools let users run analysis against big data.

Serverless SQL pool

A serverless SQL pool charges users only for the data processed,
instead of charging for running time. This makes serverless pools a
great option because the pool can remain consistently available, unlike
dedicated SQL pools, which may be turned off frequently to reduce
costs. Serverless pools are also a great tool for scaling out
horizontally while running big data analytics and complex queries
against petabytes of data.

Dedicated SQL pool

Dedicated SQL pools gather users' data from various sources and store
it in relational tables for analysis. With dedicated pools, users can
more easily perform historical analysis, machine learning, and other
data workloads. However, these pools can be limiting because they
charge for the time that they are active, which may cause users to turn
off their dedicated pools frequently to conserve costs.
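The scale-out model behind dedicated pools hash-distributes rows across a fixed set of distributions (60 in Azure Synapse) so compute nodes can work on them in parallel. The following is a conceptual sketch only, using an illustrative hash function rather than the service's actual algorithm:

```python
import hashlib

# Conceptual sketch (not the service's real algorithm): a dedicated SQL
# pool hash-distributes each row across a fixed set of distributions so
# that compute nodes can scan them in parallel. Synapse uses 60
# distributions; the hash function below is illustrative only.
NUM_DISTRIBUTIONS = 60

def assign_distribution(distribution_key: str) -> int:
    """Map a row's distribution-key value to one of the 60 distributions."""
    digest = hashlib.md5(distribution_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS

# Rows with the same key always land in the same distribution, which is
# why joins and aggregations on the distribution key avoid data movement.
rows = [("customer_42", 19.99), ("customer_7", 5.00), ("customer_42", 3.50)]
placement = {key: assign_distribution(key) for key, _ in rows}
assert placement["customer_42"] == assign_distribution("customer_42")
```

Because the mapping is deterministic, two tables distributed on the same key can be joined without shuffling rows between nodes.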

MORE TO LEARN

Serverless pools can benefit multiple professional roles within any business, including data engineers,
scientists, analysts, and BI professionals. A serverless SQL pool allows data engineers to explore data
lakes, convert and prepare data, and streamline transformations. Data scientists can use features such as
OPENROWSET and automatic schema inference to reason about the content and structure of data within
any given lake. Analysts can use T-SQL, or other tools, to connect and analyze data in Azure Data Lake or
external tables created on the serverless database. A serverless SQL pool also allows BI professionals to
create Power BI reports from data lakes.
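To make the OPENROWSET pattern concrete, here is a hedged sketch that composes the kind of T-SQL a serverless pool runs to query files in place; the storage account, container, and path are hypothetical placeholders:

```python
# A sketch of composing the kind of T-SQL a serverless SQL pool runs with
# OPENROWSET to query lake files in place (schema inferred automatically).
# The storage account, container, and path below are hypothetical.
def openrowset_query(storage_url: str, file_format: str = "PARQUET") -> str:
    """Build a serverless SQL query that reads files directly from the lake."""
    return (
        "SELECT TOP 100 *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{storage_url}',\n"
        f"    FORMAT = '{file_format}'\n"
        ") AS rows;"
    )

query = openrowset_query(
    "https://contosolake.dfs.core.windows.net/files/sales/*.parquet"
)
print(query)
```

Because no table has to be created first, a query like this is a quick way to sample unfamiliar data, and you pay only for the data scanned.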



SQL POOLS
COMPARISON

DEDICATED SQL POOL vs. SERVERLESS SQL POOL

[Comparison chart: the two pool types are compared on support for
Tables, Views, Schemas, Temporary Tables, Procedures, Functions,
Caching Queries, Triggers, Table Variables, External Tables, Table
Distribution, Table Indexes, Table Partitions, Statistics, Workload
Management, and Cost Control.]


Integration With
Azure Data Lake

An Azure Data Lake is a general-purpose storage account


where organizations can store their data. That data can
be later analyzed through SQL On-Demand queries or
moved from the data lake into a data store like SQL Pools.

AZURE DATA LAKE STORAGE GEN2 IS A
MULTIMODAL CLOUD STORAGE SERVICE
DEDICATED TO BIG DATA ANALYTICS, BUILT ON
AZURE BLOB STORAGE. PRIOR TO AZURE DATA
LAKE STORAGE GEN2, CLOUD USERS WOULD HAVE
TO USE AN OBJECT STORE, WHICH PLACES EVERY
OBJECT INTO A FLAT HIERARCHY. IF USERS DIDN'T
USE AN OBJECT STORE, THEY WOULD PUT DATA
INTO STRUCTURED DIRECTORIES CALLED FILE
SYSTEMS. AZURE DATA LAKE STORAGE GEN2
COMBINES QUALITIES OF THE OBJECT STORE AND
THE FILE SYSTEM, AND AZURE SYNAPSE USES IT TO
WAREHOUSE DATA IN A CONSISTENT MODEL.

TIP: You can also use Synapse Studio to preview and sample any data
that you have uploaded into your data lake.
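One practical consequence of combining the object store and file system is worth illustrating. In a flat object store, "directories" are just key prefixes, so renaming one means rewriting every object key; with a hierarchical namespace the same rename is a single metadata operation. A conceptual sketch of the flat-store case, with invented paths:

```python
# Conceptual sketch: in a flat object store, "directories" are only key
# prefixes, so a directory rename must touch every object under the
# prefix. ADLS Gen2's hierarchical namespace makes the same rename a
# single metadata operation on the directory node instead.
flat_store = {
    "raw/sales/2023/jan.csv": b"...",
    "raw/sales/2023/feb.csv": b"...",
}

def rename_prefix(store: dict, old: str, new: str) -> dict:
    """Flat-store 'directory rename': rewrite every key under the prefix."""
    return {
        (new + key[len(old):]) if key.startswith(old) else key: value
        for key, value in store.items()
    }

renamed = rename_prefix(flat_store, "raw/sales/", "curated/sales/")
assert "curated/sales/2023/jan.csv" in renamed
```

This is one reason analytics engines that reorganize many files at once benefit from the hierarchical namespace.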



Advantages Of
Azure Data Lake

Keep petabyte-size files and trillions of objects in one location

Multiple copies of data stored by Microsoft for redundancy and
availability

Multiple data tiers for performance and cost management

Extract data from any source

Users can explore data and create queries



Perform ETL with
Pipelines and Data Flows
Synapse uses a data integration service with a dedicated engine to
perform ETL (extract, transform, and load) operations. It frequently
uses staging tables as short-term storage while data is loaded into a
designated target. ETL combines the following vital functions into a
single application or set of services:

Ingest data from various data stores with prebuilt connectors.

Transform data using a massively parallel processing (MPP)
architecture designed for big data.

Load data to data stores in the cloud and in on-premises
environments.
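The three steps above can be sketched end to end as a minimal, self-contained ETL run. The source data, column names, and staging step are invented for illustration:

```python
import csv
import io

# A minimal ETL sketch of the three steps above: extract rows from a CSV
# source, transform them, then stage and load them into the target.
# The sample data and column names are invented.
source_csv = "order_id,amount\n1,10.50\n2,3.25\n3,8.00\n"

def extract(text: str) -> list:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list) -> list:
    """Transform: cast types and derive a flag for large orders."""
    return [
        {"order_id": int(r["order_id"]),
         "amount": float(r["amount"]),
         "is_large": float(r["amount"]) > 5.0}
        for r in rows
    ]

def load(rows: list, target: list) -> None:
    """Load: stage the rows briefly, then append them to the target."""
    staging = list(rows)    # short-term staging, as described above
    target.extend(staging)

warehouse = []
load(transform(extract(source_csv)), warehouse)
print(len(warehouse))  # 3 rows loaded
```

In the real service the prebuilt connectors, Spark-based transformation engine, and sink datasets play these three roles at scale.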

THE 4 RESOURCES USED: PIPELINES, DATA SETS, DATA FLOWS, LINKED SERVICES

Usually, these three ETL phases run simultaneously, which results in a
more efficient process. Simply put, a user's existing data can be
transformed and loaded while more data is being extracted.

[Diagram: in the Extract phase, data from Data Source 1 and Data
Source 2 flows into the Transformation Engine, which then Loads the
results into the Target.]


What Is A
Data Flow?

DATA FLOWS EMPOWER YOU TO INGEST, TRANSFORM, AND


LOAD DATA IN VARIOUS DATA STORES INCLUDING AZURE
AND ON-PREM ENVIRONMENTS. ADDITIONALLY, DATA
FLOWS ARE IMPLEMENTED AS TASKS WITHIN PIPELINES
(CONTROL FLOWS), WHICH ARE RESPONSIBLE FOR ENSURING
THE TASKS IN A DATA PIPELINE ARE EXECUTED IN AN
ORDERLY MANNER BY USING PRECEDENCE CONSTRAINTS.
THEREFORE, RESULTING TASKS WILL ONLY TRIGGER ONCE THE
PREVIOUS TASK IN THE SEQUENCE HAS REACHED AN
OUTCOME.
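The precedence-constraint behavior described above can be sketched as a scheduler that runs each task only after the tasks it depends on have reached an outcome. The task names and dependency map are invented:

```python
# Conceptual sketch of precedence constraints: each task triggers only
# after every task it depends on has reached an outcome. Task names and
# dependencies below are invented for illustration.
def run_pipeline(tasks: dict) -> list:
    """Run tasks respecting dependencies; return the execution order."""
    completed = []
    remaining = dict(tasks)
    while remaining:
        # A task is ready once all of its predecessors have completed.
        ready = [t for t, deps in remaining.items()
                 if all(d in completed for d in deps)]
        if not ready:
            raise ValueError("cycle in precedence constraints")
        for task in ready:
            completed.append(task)   # the task "executes" here
            del remaining[task]
    return completed

order = run_pipeline({
    "ingest": [],
    "transform": ["ingest"],
    "load": ["transform"],
})
assert order == ["ingest", "transform", "load"]
```

A data flow embedded in a pipeline behaves like one of these tasks: it fires only when the preceding activity succeeds (or otherwise reaches the outcome the constraint specifies).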

Everything you build in a data flow is converted to Scala and run
against a scaled-out Apache Spark cluster. This provides scale-out
capability and a massively parallel processing architecture for data
preparation at scale. Since data flows bill you for the data you
process, your account is charged every time a data flow runs. This is
very cost-effective for the majority of users but can be limiting if
the amount of data processed is significant or if the pipeline is
constantly running.



What Is A
Pipeline?

AZURE SYNAPSE PIPELINES IS A CLOUD-BASED ETL AND


DATA-INTEGRATION SERVICE THAT EMPLOYS THE
RESOURCES FROM AZURE DATA FACTORY. THIS GIVES
YOU THE ABILITY TO DIRECT DATA MIGRATION AND
TRANSFORMATION BY GENERATING DATA-DRIVEN
WORKFLOWS. AZURE SYNAPSE PIPELINES CAN ALSO
INGEST INFORMATION FROM DISTINCT DATA STORES.

IN YOUR AZURE SYNAPSE WORKSPACE YOU CAN CREATE ONE
OR MORE OF THESE DATA-DRIVEN PIPELINES. PIPELINES
PERFORM TASKS AND ALLOW YOU TO CONTROL ACTIVITIES
AS A SET BY GROUPING THEM TOGETHER LOGICALLY. A
PIPELINE'S ACTIVITIES DETERMINE WHAT ACTIONS WILL BE
PERFORMED ON YOUR DATA. THIS ELIMINATES THE NEED TO
NAVIGATE BETWEEN MULTIPLE ACTIVITIES AND DATA STORES
TO PERFORM ACTIVITIES AND TASKS.

THREE MAIN GROUPS OF ACTIVITIES

Data Movement, Data Transformation, and Control Activities



What Is A
Pipeline?
(Continued)

An activity can take one or more input datasets and produce one or
more output datasets. The figure below displays how pipelines,
activities, and datasets interact with one another:

[Figure: a Schedule Trigger runs a Pipeline containing a Copy
Activity, which moves data from Amazon S3 (via its Linked Service) to
Azure Storage (via its Linked Service).]

The input dataset represents the data an activity consumes, and the
output dataset represents the data the activity produces. Datasets can
describe many data types held in various stores and are then used by
the activities within a pipeline.
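How pipelines, activities, datasets, and linked services fit together can be sketched as a simplified definition (this is not the exact service schema, and all names below are hypothetical):

```python
import json

# A simplified sketch (not the exact service schema) of how the pieces
# relate: a pipeline groups activities; a Copy activity reads an input
# dataset and writes an output dataset; each dataset points at a linked
# service for its connection details. All names are hypothetical.
pipeline = {
    "name": "CopyS3ToLake",
    "activities": [
        {
            "name": "CopySalesData",
            "type": "Copy",
            "inputs": [{"referenceName": "S3SalesDataset"}],
            "outputs": [{"referenceName": "LakeSalesDataset"}],
        }
    ],
}
datasets = {
    "S3SalesDataset": {"linkedService": "AmazonS3LinkedService"},
    "LakeSalesDataset": {"linkedService": "AzureStorageLinkedService"},
}

# Following the references: activity -> input dataset -> linked service.
activity = pipeline["activities"][0]
source_service = datasets[activity["inputs"][0]["referenceName"]]["linkedService"]
assert source_service == "AmazonS3LinkedService"
print(json.dumps(pipeline, indent=2))
```

The indirection is the point: the activity never embeds connection details, so swapping the source store only means repointing a dataset at a different linked service.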



Synapse Notebooks
A Synapse notebook allows you to combine live code, visualizations, and
narrative text within files that users create. Notebooks provide an
environment that is optimal for gathering quick insights from data and
validating ideas. They also provide the ability to examine data across
raw and processed formats and to create data visualizations with next
to no effort. Additionally, users can build, test, train, and score
machine learning models with notebooks.

Azure Machine Learning Model Workflow: Train → Package → Validate → Deploy → Monitor
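The kind of quick, cell-by-cell exploration a notebook is good for can be sketched with the standard library; the sample data below is invented:

```python
import statistics

# The quick-insight style of work a notebook cell is good for, sketched
# with the standard library. The daily sales figures are invented.
daily_sales = [120.0, 95.5, 143.2, 88.0, 210.7]

summary = {
    "count": len(daily_sales),
    "mean": round(statistics.mean(daily_sales), 2),
    "max": max(daily_sales),
}
print(summary)  # {'count': 5, 'mean': 131.48, 'max': 210.7}
```

In a real Synapse notebook the same cell would typically use Spark against lake data, and the tabular result could be switched to a chart view without writing plotting code.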

3 TIPS WHEN WORKING WITH NOTEBOOKS

CREATE A NOTEBOOK
You can generate a notebook using two distinct methods. The first way
is to create a new notebook. A second option is to import a preexisting
notebook into your ASA Workspace.

DEVELOP A NOTEBOOK
When developing notebooks, it is important to remember that they are
made up of individual blocks of code or text called cells. These cells
can operate separately or as a group.

DATA VISUALIZATION
Azure Synapse notebooks provide the option to create a customized
chart, by utilizing chart options, from tabular results. This enables
you to create custom visuals without writing much, if any, code.



Built-in Power BI
Integration Capabilities

Integrating Azure Synapse Analytics with Power BI allows users to
easily connect and analyze their data. The integration enables
connecting live to Azure Data Lake or SQL Pools, making Power BI
reports capable of real-time analytics. Optionally, the data can be
imported into Power BI and refreshed on a schedule, providing faster
performance. Built-in artificial intelligence and machine learning
capabilities further enhance the data exploration and modeling
capabilities of Power BI.

Live connections in Power BI are accomplished by creating a

01 DirectQuery connection to your Dedicated SQL Pool.


Creating a live connection allows users to create dynamic, real-time
reports against all the data stored in the SQL Pool.

One key performance enhancement feature of Dedicated Pools is

02 result-set caching. This key feature improves the performance of


repetitive queries because the query results are automatically
saved in memory.
This improves a user's query performance and reduces compute resources.
Moreover, cached result sets reduce the impact on concurrency limits,
because queries that leverage cached result sets do not use any
concurrency slots in the SQL Pool.
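The caching behavior described above can be sketched conceptually: repeat queries are served from saved results instead of recomputing, and the cache is invalidated when the underlying data changes. (In a dedicated pool this is managed for you; the real feature is enabled per database with the RESULT_SET_CACHING setting.)

```python
# Conceptual sketch of result-set caching: the first run of a query pays
# the compute cost, repeats are served from the saved result, and the
# cache is invalidated when the underlying data changes.
class ResultSetCache:
    def __init__(self):
        self._cache = {}
        self.hits = 0

    def query(self, sql, execute):
        if sql in self._cache:
            self.hits += 1              # served from cache: no compute used
            return self._cache[sql]
        result = execute(sql)           # first run pays the compute cost
        self._cache[sql] = result
        return result

    def invalidate(self):
        self._cache.clear()             # data changed: cached results stale

table = [1, 2, 3]
cache = ResultSetCache()
run = lambda sql: [sum(table)]
assert cache.query("SELECT SUM(x) FROM t", run) == [6]
assert cache.query("SELECT SUM(x) FROM t", run) == [6]   # cache hit
assert cache.hits == 1
```

The concurrency benefit follows directly: a cache hit never reaches the query engine, so it cannot occupy a concurrency slot.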

Another key performance enhancement feature of Dedicated Pools is

03 Azure’s materialized view feature.


This feature can help boost a SQL data warehouse's performance by
precomputing, storing, and maintaining data in a tabular format.
Materialized views can also be used in queries without a direct
reference, so changes to the application code are not needed.
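The precompute-and-maintain idea behind a materialized view can be sketched conceptually: an aggregate is kept up to date as base data changes, so reads return the stored result instead of re-scanning the table. (In a dedicated pool this is defined with CREATE MATERIALIZED VIEW and maintained automatically; the class below is only an illustration.)

```python
# Conceptual sketch of a materialized view: the aggregate is precomputed
# and maintained as base rows change, so queries read the stored result
# in O(1) instead of scanning the whole table.
class SalesTable:
    def __init__(self):
        self.rows = []
        self.total_view = 0.0           # the "materialized" aggregate

    def insert(self, amount):
        self.rows.append(amount)
        self.total_view += amount       # view maintained on every change

    def query_total(self):
        return self.total_view          # no table scan needed

sales = SalesTable()
for amount in (10.0, 2.5, 7.5):
    sales.insert(amount)
assert sales.query_total() == 20.0
```

The "without a direct reference" point maps to the real service's behavior: the optimizer can rewrite a qualifying query to read the maintained aggregate even when the query text names only the base table.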



pragmaticworks.com

hello@pragmaticworks.com

(904) 638-5743

7175 Hwy 17, Ste 2

Fleming Island, FL 32003

Follow us on social media
