
Activity 02: Data Engineering Discussion

Audience Poll 1
Q: You need to perform some data preparation on data stored in ADLS Gen 2. Which
option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 2
Q: You are performing initial exploration of the data and experimenting with the
necessary transformations. Which option should you use to run the transformations
(pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 3
Q: You want to process a subset of files in a folder filled with CSV files, all
having the same schema. Which option should you use to run the transformations
(pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 4
Q: You need to transform the data on-premises or within a specific VNET before
loading it. Which option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 5
Q: You want to flatten hierarchical fields in JSON to a tabular structure. Which
option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 6
Q: You are handling file formats other than delimited (CSV), JSON or Parquet. Which
option should you use to run the transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Audience Poll 7
Q: The delimited data is badly formatted. Which option should you use to run the
transformations (pick only one)?

A) Synapse dedicated SQL pool / serverless SQL pool

B) Apache Spark in Azure Synapse Analytics

C) Integration Runtime

Decision Matrix Summary


| Decision Point | Dedicated SQL pool / serverless SQL pool | Apache Spark pool | Integration Runtime | Discussion Comment |
| --- | --- | --- | --- | --- |
| Initial exploration of the data and experimenting with the necessary transformations | X | | | Start with T-SQL, generally |
| Process a folder filled with CSV files of the same schema | X | | | Use the T-SQL OPENROWSET statement |
| Process a subset of files in a folder filled with CSV files of the same schema | X | | | Use the T-SQL OPENROWSET statement with wildcards (*) in the path (see the first example below) |
| Transform the data on-premises or within a specific VNET before loading it | | | X | Use a Self-Hosted Integration Runtime on-premises |
| Transform the data in a code-free way | | | X | Use an Azure Integration Runtime |
| Flatten hierarchical fields in JSON to a tabular structure | X | | | Use Synapse dedicated or serverless SQL pools along with the T-SQL OPENJSON, JSON_VALUE, and JSON_QUERY statements (see the second example below) |
| Unpack or flatten deeply nested JSON | | X | | Use Spark to deal with very complex JSON |
| Handle file formats other than delimited (CSV), JSON, or Parquet | | X | | Use Spark to handle the broadest set of file formats (e.g., ORC, Avro, and others) |
| Handle ZIP-archived data files | | X | | Use Spark to unzip the files to storage before processing |
| Delimited data is badly formatted | | X | | Use Spark to handle particularly poorly formatted files |
| Leverage open-source libraries to help with the data cleansing | | X | | Use Python or Scala open-source libraries with Spark |
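
To make the first OPENROWSET row concrete, here is a minimal serverless SQL sketch that reads only the CSV files matching a wildcard path. The storage account, container, folder, and file pattern are placeholders invented for this illustration, not values from the activity.

```sql
-- Query only the CSV files whose names match the wildcard pattern.
-- The URL below (account, container, folder, pattern) is hypothetical.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/raw/sales/sale-2023-*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',  -- the faster CSV parser in serverless SQL
    HEADER_ROW = TRUE        -- treat the first row as column names
) AS [rows];
```

The same query with the pattern widened to `*.csv` covers the "process a folder filled with CSV files of the same schema" row as well.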
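And here is a sketch of the JSON-flattening row, combining OPENROWSET with OPENJSON. Again, the file URL and the JSON property paths ($.id, $.customer.name, $.total) are assumptions for illustration; the idea is that serverless SQL first reads each JSON line as a single text column, which OPENJSON then expands into tabular columns.

```sql
-- Flatten hierarchical JSON fields into columns with OPENJSON.
-- The URL and JSON property paths below are hypothetical.
SELECT flattened.*
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/raw/orders/orders.jsonl',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',  -- terminators chosen so that each JSON
    FIELDQUOTE = '0x0b'        -- line lands in a single text column
) WITH (jsonDoc NVARCHAR(MAX)) AS [rows]
CROSS APPLY OPENJSON(jsonDoc)
WITH (
    OrderId      INT            '$.id',
    CustomerName NVARCHAR(100)  '$.customer.name',  -- nested field
    OrderTotal   DECIMAL(10, 2) '$.total'
) AS flattened;
```

For pulling out a single scalar field, `JSON_VALUE(jsonDoc, '$.customer.name')` in the SELECT list is the lighter-weight alternative the matrix also names.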
