
DP-203: Exam Q&A Series – Part 2

1
You use Azure Stream Analytics to receive data from Azure Event Hubs and to output the data to
an Azure Blob Storage account. You need to output the count of records received from the last five
minutes every minute. Which windowing function should you use?
a) Session
b) Tumbling
c) Sliding
d) Hopping
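For reference, a hopping window repeats on a fixed hop while looking back over a longer window, which matches "a five-minute count emitted every minute". A minimal Stream Analytics sketch, assuming placeholder input and output names (EventHubInput, BlobOutput):

SELECT COUNT(*) AS RecordCount
INTO BlobOutput
FROM EventHubInput
GROUP BY HoppingWindow(minute, 5, 1)   -- 5-minute window, recomputed every 1 minute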
DP-203: Exam Q&A Series – Part 2
2
You are designing the folder structure for an Azure Data Lake Storage Gen2 container. Users will
query data by using a variety of services including Azure Databricks and Azure Synapse Analytics
serverless SQL pools. The data will be secured by subject area. Most queries will include data from
the current year or current month. Which folder structure should you recommend to support fast
queries and simplified folder security?
a) /{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv
b) /{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
c) /{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
d) /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
DP-203: Exam Q&A Series – Part 2
3
You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The
solution must meet the customer sentiment analytic requirements. Which three Transact-SQL DDL
commands should you run in sequence?
To answer, move the appropriate commands from the list of commands to the answer area and
arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the
correct orders you select.
Commands:
• CREATE EXTERNAL DATA SOURCE
• CREATE EXTERNAL FILE FORMAT
• CREATE EXTERNAL TABLE
• CREATE EXTERNAL TABLE AS SELECT
• CREATE EXTERNAL SCOPED CREDENTIALS

Answer Area:
1. CREATE EXTERNAL DATA SOURCE
2. CREATE EXTERNAL FILE FORMAT
3. CREATE EXTERNAL TABLE AS SELECT
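These commands build on one another: the external data source points at the storage location, the file format describes how the files are encoded, and the external table (or CTAS) lays a schema over the files. A minimal T-SQL sketch for a dedicated SQL pool, with placeholder names and a placeholder storage URL:

CREATE EXTERNAL DATA SOURCE TwitterFeedSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://data@mystorageaccount.dfs.core.windows.net'
    -- a database scoped credential may also be required, depending on the authentication method
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

CREATE EXTERNAL TABLE dbo.TwitterFeed
(
    TweetId BIGINT,
    TweetText NVARCHAR(4000)
)
WITH (
    LOCATION = '/twitter/',
    DATA_SOURCE = TwitterFeedSource,
    FILE_FORMAT = CsvFormat
);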
DP-203: Exam Q&A Series – Part 2
4
You have created an external table named ExtTable in Azure Data Explorer. Now, a database user
needs to run a KQL (Kusto Query Language) query on this external table. Which of the following
functions should they use to refer to this table?
a) external_table()
b) access_table()
c) ext_table()
d) None of the above
DP-203: Exam Q&A Series – Part 2
5
You are working as a data engineer in a company. Your company wants you to ingest data onto
cloud data platforms in Azure. Which data processing framework will you use?
a) Online transaction processing (OLTP)
b) Extract, transform, and load (ETL)
c) Extract, load, and transform (ELT)

ELT is a typical process for ingesting data from an on-premises database into the Azure cloud.
DP-203: Exam Q&A Series – Part 2
6
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database
named mytestdb. You run the following command in an Azure Synapse Analytics Spark pool in
MyWorkspace.

CREATE TABLE mytestdb.myParquetTable(
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date)
USING Parquet

You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.

EmployeeName: Peter, EmployeeID: 1001, EmployeeStartDate: 28-July-2022

One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.
What will be returned by the query?

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE name = 'Peter';

a) 24  b) an error  c) a null value
DP-203: Exam Q&A Series – Part 2
7
In structured data, you define the data type at query time.
True False

8
In unstructured data, you define the data type at query time.

True False

The schema of unstructured data is typically defined at query time. This means
that data can be loaded onto a data platform in its native format.

DP-203: Exam Q&A Series – Part 2
9
When you create a temporal table in Azure SQL Database, it automatically creates a history table in
the same database for capturing the historical records. Which of the following statements are true
about the temporal table and history table? [Select all options that are applicable]
a) A temporal table must have a primary key.
b) To create a temporal table, System Versioning needs to be set to On.
c) To create a temporal table, System Versioning needs to be set to Off.
d) It is mandatory to mention the name of the history table when you create the temporal table.
e) If you don't specify the name for the history table, the default naming convention is used for the
history table.
f) You can specify the table constraints for the history table.
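As an illustration of options b) and e), a minimal T-SQL sketch of a system-versioned temporal table; the table and column names are placeholders, and the HISTORY_TABLE clause is optional (a default name is generated when it is omitted):

CREATE TABLE dbo.Employee
(
    EmployeeID INT NOT NULL PRIMARY KEY,   -- a temporal table must have a primary key
    Name NVARCHAR(100) NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));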
DP-203: Exam Q&A Series – Part 2
10
To create Data Factory instances, the user account that you use to sign into Azure must be a
member of: [Select all options that are applicable]
a) contributor
b) owner role
c) administrator of the Azure subscription
d) write
DP-203: Exam Q&A Series – Part 3
11
You need to output files from Azure Data Factory. Which file format should you use for each type
of output? To answer, select the appropriate options in the answer area. NOTE: Each correct
selection is worth one point.
Columnar format:
• Avro
• GZIP
• Parquet
• TXT

JSON with a timestamp:
• Avro
• GZIP
• Parquet
• TXT

Parquet stores data in columns. By their very nature, column-oriented data stores are optimized for
read-heavy analytical workloads. Avro stores data in a row-based format, and row-based databases
are best for write-heavy transactional workloads. An Avro schema is created using JSON format, and
the Avro format supports timestamps.

Azure Data Factory supports the following file formats: Binary, Delimited text, Excel, JSON, ORC,
Avro, Parquet, and XML.
DP-203: Exam Q&A Series – Part 3
12
Working as a data engineer for a car sales company you need to design an application that would
accept market information as an input. Using a machine learning classification model, the
application will classify the input data into two categories:
a) Car models that sell more with buyers between 18-40 years and
b) Car models that sell more with people above 40
What would you recommend to train the model?
a) Power BI Models
b) Text Analytics API
c) Computer Vision API
d) Apache Spark MLlib
DP-203: Exam Q&A Series – Part 3
13
Note: This question is part of a series of questions that present the same scenario. Each question
in the series contains a unique solution that might meet the stated goals. Some question sets
might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
-----------------------------------------------------------------------

You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to
count the tweets in each 10-second window. The solution must ensure that each tweet is counted
only once.
Solution: You use a session window that uses a timeout size of 10 seconds.
Does this meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 3
14
You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to
count the tweets in each 10-second window. The solution must ensure that each tweet is counted
only once.
Solution: You use a sliding window, and you set the window size to 10 seconds. Does this meet the
goal?

Yes No
DP-203: Exam Q&A Series – Part 3
15
You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to
count the tweets in each 10-second window. The solution must ensure that each tweet is counted
only once.
Solution: You use a tumbling window, and you set the window size to 10 seconds. Does this meet
the goal?

Yes No
DP-203: Exam Q&A Series – Part 3
16
What are the key components of Azure Data Factory? [Select all options that are applicable]
a) Database
b) Connection String
c) Pipelines
d) Activities
e) Datasets
f) Linked services
g) Data Flows
h) Integration Runtimes
DP-203: Exam Q&A Series – Part 3
17
Which of the following are valid trigger types in Azure Data Factory? [Select all options that are
applicable]
a) Monthly Trigger
b) Schedule Trigger
c) Overlap Trigger
d) Tumbling Window Trigger
e) Event-based Trigger
DP-203: Exam Q&A Series – Part 3
18
You are designing an Azure Stream Analytics solution that receives instant messaging data from
an Azure Event Hub. You need to ensure that the output from the Stream Analytics job counts the
number of messages per time zone every 15 seconds. How should you complete the Stream
Analytics query? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
SELECT TimeZone, COUNT(*) AS MessageCount
FROM MessageStream [Option 1] CreatedAt
GROUP BY TimeZone, [Option 2] (second, 15)

Option 1 choices:
• LAST
• OVER
• SYSTEM.TIMESTAMP()
• TIMESTAMP BY

Option 2 choices:
• HOPPINGWINDOW
• SESSIONWINDOW
• SLIDINGWINDOW
• TUMBLINGWINDOW
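For reference, a fixed, non-overlapping 15-second count per time zone is expressed by declaring the event time with TIMESTAMP BY and grouping on a tumbling window. A sketch of how the completed query could look, reusing the names from the question:

SELECT TimeZone, COUNT(*) AS MessageCount
FROM MessageStream TIMESTAMP BY CreatedAt
GROUP BY TimeZone, TUMBLINGWINDOW(second, 15)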
DP-203: Exam Q&A Series – Part 3
19
Duplicating customer content for redundancy and to meet service-level agreements (SLAs) is an
example of Azure maintainability.

Yes No

20
Duplicating customer content for redundancy and to meet service-level agreements (SLAs) is an
example of Azure high availability.

Yes No
DP-203: Exam Q&A Series – Part 3
21
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts.
Contacts contains a column named Phone. You need to ensure that users in a specific role only
see the last four digits of a phone number when querying the Phone column. What should you
include in the solution?
a) column encryption
b) dynamic data masking
c) a default value
d) table partitions
e) row level security (RLS)
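For context on the masking option, dynamic data masking can expose only the trailing characters of a column through the partial() function. A hedged T-SQL sketch against the table in the question; the exact mask pattern is an assumption:

ALTER TABLE dbo.Contacts
ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0, "XXX-XXX-", 4)');
-- Users in roles without the UNMASK permission now see only the last four digits.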
DP-203: Exam Q&A Series – Part 3
22
A company has a data lake that is accessible only via an Azure virtual network. You are building a
SQL pool in Azure Synapse that will use data from the data lake, and you plan to load data into the
SQL pool every hour. You need to make sure that the SQL pool can load the data from the data lake.
Which TWO actions should you perform?
a) Create a service principal
b) Create a managed identity
c) Add an Azure Active Directory Federation Services (ADFS) account
d) Configure managed identity as credentials for the data loading process
DP-203: Exam Q&A Series – Part 3
23
You have an Azure Data Lake Storage Gen2 container. Data is ingested into the container, and then
transformed by a data integration application. The data is NOT modified after that. Users can read files in
the container but cannot modify the files. You need to design a data archiving solution that meets the
following requirements:
• New data is accessed frequently and must be available as quickly as possible.
• Data that is older than five years is accessed infrequently but must be available within one second when
requested.
• Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the
lowest cost possible.
• Costs must be minimized while maintaining the required availability.
How should you manage the data? To answer, select the appropriate options in the answer area.
Five-year-old data:
• Delete the blob
• Move to Hot storage
• Move to Cool storage
• Move to Archive storage

Seven-year-old data:
• Delete the blob
• Move to Hot storage
• Move to Cool storage
• Move to Archive storage
DP-203: Exam Q&A Series – Part 3
24
As a data engineer, you need to suggest a Stream Analytics data output format that ensures queries
from Databricks and PolyBase against the files encounter as few errors as possible. The solution must
ensure that the files can be queried quickly and that the data type information is kept intact. What
should you suggest?
a) JSON
b) XML
c) Avro
d) Parquet
DP-203: Exam Q&A Series – Part 3
25
Which role works with Azure Cognitive Services, Cognitive Search, and the Bot Framework?
a) A data engineer
b) A data scientist
c) An AI engineer
DP-203: Exam Q&A Series – Part 4
26
Which role is responsible for provisioning and configuring both on-premises and cloud data platform
technologies?
a) A data engineer
b) A data scientist
c) An AI engineer
27
Who performs advanced analytics to help drive value from data?
a) A data engineer
b) A data scientist
c) An AI engineer
DP-203: Exam Q&A Series – Part 4
28
Choose the valid examples of Structured Data.
a) Microsoft SQL Server
b) Binary Files
c) Azure SQL Database
d) Audio Files
e) Azure SQL Data Warehouse
f) Image Files
DP-203: Exam Q&A Series – Part 4
29
Choose the valid examples of Unstructured Data.
a) Microsoft SQL Server
b) Binary Files
c) Azure SQL Database
d) Audio Files
e) Azure SQL Data Warehouse
f) Image Files
DP-203: Exam Q&A Series – Part 4
30
Which of the following best describes Azure Databricks?
a) data analytics platform
b) AI platform
c) Data ingestion platform
DP-203: Exam Q&A Series – Part 4
31
Azure Databricks encapsulates which Apache technology?
a) Apache HDInsight
b) Apache Hadoop
c) Apache Spark

a) Azure HDInsight is a fully managed, full-spectrum, open-source analytics service for enterprises.
HDInsight is a cloud service that makes it easy, fast, and cost-effective to process massive amounts
of data.
b) Apache Hadoop is the original open-source framework for distributed processing and analysis of
big data sets on clusters.
c) Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure.
DP-203: Exam Q&A Series – Part 4
32
Which security features does Azure Databricks not support?
a) Azure Active Directory
b) Shared Access Keys
c) Role-based access

Shared Access Keys are a security feature used within Azure storage accounts.
Azure Active Directory and Role-based access are supported security features in
Azure Databricks.
DP-203: Exam Q&A Series – Part 4
33
Which of the following Azure Databricks components provides support for R, SQL, Python, Scala, and Java?
a) MLlib
b) GraphX
c) Spark Core API

a) MLlib is the Machine Learning library consisting of common learning algorithms and
utilities, including classification, regression, clustering, collaborative filtering,
dimensionality reduction, as well as underlying optimization primitives.
b) GraphX provides graphs and graph computation for a broad scope of use cases from
cognitive analytics to data exploration.
c) The Spark Core API provides support for R, SQL, Python, Scala, and Java in Azure Databricks.
DP-203: Exam Q&A Series – Part 4
34
Which Notebook format is used in Databricks?
a) DBC
b) .notebook
c) .spark

DBC file types are the supported Databricks notebook format. There is no .notebook or
.spark file format available.
DP-203: Exam Q&A Series – Part 4
35
You configure version control for an Azure Data Factory instance as shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the
information presented in the graphic.
NOTE: Each correct selection is worth one point.

Azure Resource Manager (ARM) templates for the pipeline's assets are stored in:
• adf_publish
• main
• Parameterization template

A Data Factory Azure Resource Manager (ARM) template named contososales can be found in:
• /contososales
• /dw_batchetl/adf_publish/contososales
• /main
DP-203: Exam Q&A Series – Part 4
36
You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.
Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file
contains the same data attributes and data from a subsidiary of your company.
You need to move the files to a different folder and transform the data to meet the following requirements:
• Provide the fastest possible query times.
• Automatically infer the schema from the underlying files.
How should you configure the Data Factory copy activity? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Copy behavior:
• Flatten hierarchy
• Merge files
• Preserve hierarchy

Sink file type:
• csv
• json
• Parquet
• TXT
DP-203: Exam Q&A Series – Part 4
37
You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as
shown in the following exhibit. All the dimension tables will be less than 2 GB after compression, and the
fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts
and updates. Which type of table should you use for each table? To answer, select the appropriate options
in the answer area. NOTE: Each correct selection is worth one point.
Dim_Customer:
• Hash Distributed
• Round Robin
• Replicated

Dim_Employee:
• Hash Distributed
• Round Robin
• Replicated

Dim_Time:
• Hash Distributed
• Round Robin
• Replicated

Fact_DailyBookings:
• Hash Distributed
• Round Robin
• Replicated
DP-203: Exam Q&A Series – Part 4
38
You are designing a data engineering solution for data stream processing. You need to recommend a
solution for data ingestion, in order to meet the following requirements:
• Ingest millions of events per second
• Easily scale from streaming megabytes of data to terabytes while keeping control over when and how
much to scale
• Integrate with Azure Functions
• Natively connected with Stream Analytics to build an end-to-end serverless streaming solution.
What would you recommend?
a) Azure Cosmos DB
b) Apache Spark
c) Azure Synapse Analytics
d) Azure Event Hubs
DP-203: Exam Q&A Series – Part 4
39
You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source
big data solution to collect, process, and maintain data. The analytical data store performs poorly.
You must implement a solution that meets the following requirements:
• Provide data warehousing
• Reduce ongoing management activities
• Deliver SQL query responses in less than one second
You need to create an HDInsight cluster to meet the requirements. Which type of cluster should you create?
a) Apache HBase
b) Apache Hadoop
c) Interactive Query
d) Apache Spark

Apache Spark supports:
• Interactive queries through spark-sql
• Data-warehousing capabilities
• Less management, because these are out-of-the-box features
DP-203: Exam Q&A Series – Part 4
40
Which data platform technology is a globally distributed, multi-model database that can perform queries in
less than a second?
a) SQL Database
b) Azure SQL database
c) Apache Hadoop
d) Cosmos DB
e) Azure SQL Synapse

Azure Cosmos DB is a globally distributed, multi-model database that can


offer sub-second query performance.
DP-203: Exam Q&A Series – Part 5
41
The open-source world offers four types of NoSQL databases. Select all options that are
applicable. NOTE: Each correct selection is worth one point.
a) SQL Database
b) Apache Hadoop
c) Key-value store
d) Document database
e) Graph database
f) Column database
g) Cosmos DB
h) Azure SQL Synapse

• Key-value store: Stores key-value pairs of data in a table structure.
• Document database: Stores documents that are tagged with metadata to aid document searches.
• Graph database: Finds relationships between data points by using a structure that's composed of
vertices and edges.
• Column database: Stores data based on columns rather than rows. Columns can be defined at the
query's runtime, allowing flexibility in the data that's returned performantly.
DP-203: Exam Q&A Series – Part 5
42
Azure Databricks is the least expensive choice when you want to store data but don't need to query
it?

Yes No

43
Azure Storage is the least expensive choice when you want to store data but don't need to query it?

Yes No
DP-203: Exam Q&A Series – Part 5
44
Unstructured data is stored in nonrelational systems, commonly called unstructured or NoSQL systems.

Yes No

Examples of unstructured data include binary, audio, and image files. Unstructured data
is stored in nonrelational systems, commonly called unstructured or NoSQL systems. In
nonrelational systems, the data structure isn't defined at design time, and data is
typically loaded in its raw format. The data structure is defined only when the data is
read.
DP-203: Exam Q&A Series – Part 5
45
You are designing an Azure Stream Analytics job to process incoming events from sensors in retail
environments. You need to process the events to produce a running average of shopper counts
during the previous 15 minutes, calculated at five-minute intervals. Which type of window should
you use?
a) snapshot
b) tumbling
c) hopping
d) sliding
DP-203: Exam Q&A Series – Part 5
46
You are implementing an Azure Data Lake Gen2 storage account. You need to ensure that data will
be accessible for both read and write operations, even if an entire data center (zonal or non-zonal)
becomes unavailable. Which kind of replication would you use for the storage account? (Choose
the solution with minimum cost)
a) Locally-redundant storage (LRS)
b) Zone-redundant storage (ZRS)
c) Geo-redundant storage (GRS)
d) Geo-zone-redundant storage (GZRS)
DP-203: Exam Q&A Series – Part 5
47
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data. You need to
ensure that the data in the container is available for read workloads in a secondary region if an
outage occurs in the primary region. The solution must minimize costs. Which type of data
redundancy should you use?
a) geo-redundant storage (GRS)
b) read-access geo-redundant storage (RA-GRS)
c) zone-redundant storage (ZRS)
d) locally-redundant storage (LRS)
DP-203: Exam Q&A Series – Part 5
48
You plan to implement an Azure Data Lake Gen 2 storage account. You need to ensure that the
data lake will remain available if a data center fails in the primary Azure region. The solution must
minimize costs. Which type of replication should you use for the storage account?

a) geo-redundant storage (GRS)


b) geo-zone-redundant storage (GZRS)
c) zone-redundant storage (ZRS)
d) locally-redundant storage (LRS)
DP-203: Exam Q&A Series – Part 5
49
You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following
requirements:
• Can return an employee record from a given point in time.
• Maintains the latest employee information.
• Minimizes query complexity.
How should you model the employee data?
a) as a temporal table
b) as a SQL graph table
c) as a degenerate dimension table
d) as a Type 2 slowly changing dimension (SCD) table
DP-203: Exam Q&A Series – Part 5
50
You have a SQL pool in Azure Synapse that contains a table named dbo.Customers. The table
contains a column named Email. You need to prevent non-administrative users from seeing the full
email addresses in the Email column. The users must see values in a format of abc@xxxx.com
instead. What should you do?
a) From Microsoft SQL Server Management Studio, set an email mask on the Email column.
b) From the Azure portal, set a mask on the Email column.
c) From Microsoft SQL Server Management studio, grant the SELECT permission to the users for
all the columns in the dbo.Customers table except Email.
d) From the Azure portal, set a sensitivity classification of Confidential for the Email column.
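For reference, the built-in email() masking function produces output very close to the format described in the question (the first letter stays visible, for example aXXX@XXXX.com). A minimal T-SQL sketch:

ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');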
DP-203: Exam Q&A Series – Part 5
51
You have a SQL pool in Azure Synapse. A user reports that queries against the pool take longer
than expected to complete. You need to add monitoring to the underlying storage to help diagnose
the issue. Which two metrics should you monitor?
a) Cache hit percentage
b) Active queries
c) Snapshot Storage Size
d) DWU Limit
e) Cache used percentage
• Cache hit percentage: cache hits are the sum of all columnstore segment hits in the local SSD
cache.
• Cache used percentage: cache used is the sum of all bytes in the local SSD cache across all nodes.
DP-203: Exam Q&A Series – Part 5
52
You have a SQL pool in Azure Synapse. You discover that some queries fail or take a long time to
complete. You need to monitor for transactions that have rolled back. Which dynamic management
view should you query?
a) sys.dm_pdw_nodes_tran_database_transactions
b) sys.dm_pdw_waits
c) sys.dm_pdw_request_steps
d) sys.dm_pdw_exec_sessions
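A minimal sketch of inspecting that DMV; the columns shown come from the underlying transaction DMV, and the interpretation of the state value is an assumption to verify against the documentation:

SELECT t.pdw_node_id,
       t.database_transaction_state,            -- a state of 11 typically indicates a rolled-back transaction
       t.database_transaction_begin_time,
       t.database_transaction_log_bytes_used
FROM sys.dm_pdw_nodes_tran_database_transactions AS t
ORDER BY t.database_transaction_log_bytes_used DESC;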
DP-203: Exam Q&A Series – Part 5
53
You are designing an Azure Databricks table. The table will ingest an average of 20 million
streaming events per day. You need to persist the events in the table for use in incremental load
pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load
times. What should you include in the solution?
a) Partition by DateTime fields.
b) Sink to Azure Queue storage.
c) Include a watermark column.
d) Use a JSON format for physical data storage.

NOTE: The Databricks ABS-AQS connector is deprecated. Databricks recommends using Auto Loader
instead.

The ABS-AQS connector provides an optimized file source that uses Azure Queue Storage (AQS) to
find new files written to an Azure Blob storage (ABS) container without repeatedly listing all of the
files. This provides two advantages:
a) Lower latency: no need to list nested directory structures on ABS, which is slow and resource
intensive.
b) Lower costs: no more costly LIST API requests made to ABS.
DP-203: Exam Q&A Series – Part 5
54
You have a partitioned table in an Azure Synapse Analytics dedicated SQL pool. You need to design
queries to maximize the benefits of partition elimination. What should you include in the Transact-
SQL queries?
a) JOIN
b) WHERE
c) DISTINCT
d) GROUP BY

When you add the "WHERE" clause to your T-SQL query it allows the query optimizer
accesses only the relevant partitions to satisfy the filter criteria of the query - which is
what partition elimination is all about
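A small T-SQL illustration of partition elimination, assuming a hypothetical table dbo.FactSales partitioned on OrderDateKey; the WHERE filter on the partitioning column is what lets the optimizer skip the other partitions:

SELECT SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
WHERE OrderDateKey >= 20220101 AND OrderDateKey < 20220201;   -- only the partitions covering January 2022 are scanned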
DP-203: Exam Q&A Series – Part 5
55
You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The
table contains 50 columns and 5 billion rows and is a heap. Most queries against the table
aggregate values from approximately 100 million rows and return only two columns. You discover
that the queries against the fact table are very slow. Which type of index should you add to provide
the fastest query times?
a) nonclustered columnstore
b) clustered columnstore
c) nonclustered
d) clustered

Clustered columnstore indexes are one of the most efficient ways you can store your data in
dedicated SQL pool. Columnstore tables won't benefit a query unless the table has more than
60 million rows.
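For reference, converting an existing heap to a clustered columnstore index is a single statement; a sketch against a hypothetical fact table name:

CREATE CLUSTERED COLUMNSTORE INDEX cci_FactTable
ON dbo.FactTable;
-- Columnstore compression and segment elimination speed up narrow aggregations over large row counts.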
DP-203: Exam Q&A Series – Part 6
56
You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool. How
should you complete the Transact-SQL statement? To answer, drag the appropriate values to the
correct targets. Each value may be used once, more than once, or not at all. You may need to drag
the split bar between panes or scroll to view content.

Values:
• CLUSTERED INDEX
• COLLATE
• DISTRIBUTION
• PARTITION
• PARTITION FUNCTION
• PARTITION SCHEME

Answer Area:
CREATE TABLE table1
(
    ID INTEGER,
    col1 VARCHAR(10),
    col2 VARCHAR(10)
)
WITH
(
    DISTRIBUTION = HASH (ID),
    PARTITION (ID RANGE LEFT FOR VALUES (1, 1000000, 2000000))
);
DP-203: Exam Q&A Series – Part 6
57
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.
You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must
meet the following requirements:
• Automatically scale down workers when the cluster is underutilized for three minutes.
• Minimize the time it takes to scale to the maximum number of workers.
• Minimize costs.
What should you do first?
a) Enable container services for workspace1.
b) Upgrade workspace1 to the Premium pricing tier.
c) Set Cluster Mode to High Concurrency.
d) Create a cluster policy in workspace1.

We need Optimized Autoscaling (not Standard Autoscaling), which is only available in the Premium
plan. Reason: we need to scale down after 3 minutes of underutilization, and Standard Autoscaling
only allows scaling down after at least 10 minutes of underutilization.
DP-203: Exam Q&A Series – Part 6
58
You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible
only through an Azure virtual network named VNET1. You are building a SQL pool in Azure Synapse
that will use data from the data lake. Your company has a sales team. All the members of the sales
team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the
Sales group access to the files in the data lake. You plan to load data to the SQL pool every hour.
You need to ensure that the SQL pool can load the sales data from the data lake. Which three
actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct
selection is worth one point.
a) Add the managed identity to the Sales group.
b) Use the managed identity as the credentials for the data load process.
c) Create a shared access signature (SAS).
d) Add your Azure Active Directory (Azure AD) account to the Sales group.
e) Use the shared access signature (SAS) as the credentials for the data load process.
f) Create a managed identity.
DP-203: Exam Q&A Series – Part 6
59
You are designing a monitoring solution for a fleet of 500 vehicles. Each vehicle has a GPS
tracking device that sends data to an Azure event hub once per minute. You have a CSV file in an
Azure Data Lake Storage Gen2 container. The file maintains the expected geographical area in
which each vehicle should be. You need to ensure that when a GPS position is outside the
expected area, a message is added to another event hub for processing within 30 seconds. The
solution must minimize cost. What should you include in the solution?
Service:
• An Azure Synapse Analytics Apache Spark pool
• An Azure Synapse Analytics serverless SQL pool
• Azure Data Factory
• Azure Stream Analytics

Window:
• Hopping
• No window
• Session
• Tumbling

Analysis type:
• Event pattern matching
• Lagged record comparison
• Point within polygon
• Polygon overlap
DP-203: Exam Q&A Series – Part 6
60
You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure
Data Factory integration runtime would be used in a data copy activity?
a) Azure pipeline
b) Azure-SSIS
c) Azure
d) Self Hosted

When moving data between Azure data platform technologies, the Azure integration runtime is used
when copying data between two Azure data platforms.
DP-203: Exam Q&A Series – Part 6
61
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two
outputs. Does this meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 6
62
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one query, and two outputs. Does this
meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 6
63
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input,
two queries, and four outputs. Does this meet the goal?
Yes No

• We need one reference data input for LocationIncomes, which rarely changes.
• We need two queries: one for in-store customers and one for online customers.
• For each query, two outputs are needed, which makes a total of four outputs.
DP-203: Exam Q&A Series – Part 6
64
You have an Azure Data Lake Storage account that contains a staging zone. You need to design a
daily process to ingest incremental data from the staging zone, transform the data by executing an
R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data
to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.
Does this meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 6
65
Which Azure Data Factory component contains the transformation logic or the analysis commands
of the Azure Data Factory’s work?
a) Linked Services
b) Datasets
c) Activities
d) Pipelines
DP-203: Exam Q&A Series – Part 6
66
You have an Azure Data Factory that contains 10 pipelines. You need to label each pipeline with its
main purpose of either ingest, transform, or load. The labels must be available for grouping and
filtering when using the monitoring experience in Data Factory. What should you add to each
pipeline?
a) a resource tag
b) a user property
c) an annotation
d) a run group ID
e) a correlation ID

• Annotations are additional, informative tags that you can add to specific
factory resources: pipelines, datasets, linked services, and triggers. By
adding annotations, you can easily filter and search for specific factory
resources.
DP-203: Exam Q&A Series – Part 6
67
You have an Azure Storage account and an Azure SQL data warehouse in the UK South region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data
Factory. The solution must meet the following requirements:
• Ensure that the data remains in the UK South region at all times.
• Minimize administrative effort.
Which type of integration runtime should you use?
a) Azure integration runtime
b) Self-hosted integration runtime
c) Azure-SSIS integration runtime
DP-203: Exam Q&A Series – Part 6
68
You are planning to use Azure Databricks clusters for a single user. Which type of Databricks
cluster should you use?
a) Standard
b) Single Node
c) High Concurrency
DP-203: Exam Q&A Series – Part 6
69
You are planning to use Azure Databricks clusters that provide fine-grained sharing for maximum
resource utilization and minimum query latencies. It should also be a managed cloud resource.
Which type of Databricks cluster should you use?
a) Standard
b) Single Node
c) High Concurrency
DP-203: Exam Q&A Series – Part 6
70
You are planning to use an Azure Databricks cluster that has no workers and runs Spark jobs on the
driver node. Which type of Databricks cluster should you use?
a) Standard
b) Single Node
c) High Concurrency
DP-203: Exam Q&A Series – Part 7
71
Which Azure Data Factory component orchestrates a transformation job or runs a data movement
command?
a) Linked Services
b) Datasets
c) Activities

Linked Services are objects that are used to define the connection to data stores or
compute resources in Azure.
DP-203: Exam Q&A Series – Part 7
72
You have an Azure virtual machine that has Microsoft SQL Server installed. The server contains a
table named Table1. You need to copy the data from Table1 to an Azure Data Lake Storage Gen2
account by using an Azure Data Factory V2 copy activity.
Which type of integration runtime should you use?
a) Azure integration runtime
b) Self-hosted integration runtime
c) Azure-SSIS integration runtime
DP-203: Exam Q&A Series – Part 7
73
Which browsers are recommended for best use with Azure Databricks?
a) Google Chrome
b) Firefox
c) Safari
d) Microsoft Edge
e) Internet Explorer
f) Mobile browsers
DP-203: Exam Q&A Series – Part 7
74
How do you connect your Spark cluster to the Azure Blob?
a) By calling the .connect() function on the Spark Cluster.
b) By mounting it
c) By calling the .connect() function on the Azure Blob
DP-203: Exam Q&A Series – Part 7
75
How does Spark connect to databases like MySQL, Hive and other data stores?
a) JDBC
b) ODBC
c) Using the REST API Layer

JDBC stands for Java Database Connectivity. It is a Java API for connecting to databases
such as MySQL, Hive, and other data stores. ODBC is not an option, and the REST API layer is not
available.
DP-203: Exam Q&A Series – Part 7
76
You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake
Storage Gen2 container. Which resource provider should you enable?
a) Microsoft.Sql
b) Microsoft.Automation
c) Microsoft.EventGrid
d) Microsoft.EventHub
DP-203: Exam Q&A Series – Part 7
77
You plan to perform batch processing in Azure Databricks once daily. Which Azure Databricks
Cluster should you choose?
a) High Concurrency
b) interactive
c) automated
Azure Databricks has two types of clusters: interactive and automated.
• You use interactive clusters to analyze data collaboratively with interactive notebooks.
• You use automated clusters to run fast and robust automated jobs.
DP-203: Exam Q&A Series – Part 7
78
Which Azure Data Factory component contains the transformation logic or the analysis commands
of the Azure Data Factory’s work?
a) Linked Services
b) Datasets
c) Activities
d) Pipelines
DP-203: Exam Q&A Series – Part 7
79
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be
stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and
PolyBase in Azure Synapse Analytics. You need to recommend a Stream Analytics data output
format to ensure that the queries from Databricks and PolyBase against the files encounter the
fewest possible errors. The solution must ensure that the files can be queried quickly, and that the
data type information is retained. What should you recommend?
a) JSON
b) Parquet
c) CSV
d) Avro
DP-203: Exam Q&A Series – Part 7
80
You have a self-hosted integration runtime in Azure Data Factory. The current status of the
integration runtime has the following configurations:
• Status: Running
• Type: Self-Hosted
• Version: 4.4.7292.1
• Running / Registered Node(s): 1/1
• High Availability Enabled: False
• Linked Count: 0
• Queue Length: 0
• Average Queue Duration: 0.00s

The integration runtime has the following node details:
• Name: X-M
• Status: Running
• Version: 4.4.7292.1
• Available Memory: 7697MB
• CPU Utilization: 6%
• Network (In/Out): 1.21KBps/0.83KBps
• Concurrent Jobs (Running/Limit): 2/14
• Role: Dispatcher/Worker
• Credential Status: In Sync

Use the drop-down menus to select the answer choice that completes each statement.

If the X-M node becomes unavailable, all executed pipelines will:
• fail until the node comes back online
• switch to another integration runtime
• exceed the CPU limit

The number of concurrent jobs and the CPU usage indicate that the Concurrent Jobs (Running/Limit)
value should be:
• Raised
• Lowered
• Left as-is

Answers: fail until the node comes back online; Lowered. Because High Availability is not enabled and
only one node is registered, pipelines fail until the node comes back online. You are paying for 14
concurrent jobs but are only using 2, and you are only using 6% of the CPU you have purchased, so
you are paying for 94% that you do not use.
DP-203: Exam Q&A Series – Part 7
81
You have an Azure Databricks resource. You need to log actions that relate to compute changes
triggered by the Databricks resources. Which Databricks services should you log?
a) workspace
b) SSH
c) DBFS
d) clusters
e) jobs

An Azure Databricks cluster is a set of computation resources and configurations on which you run
data engineering, data science, and data analytics workloads.
Incorrect answers:
a) An Azure Databricks workspace is an environment for accessing all of your Azure Databricks
assets. The workspace organizes objects (notebooks, libraries, and experiments) into folders, and
provides access to data and computational resources such as clusters and jobs.
b) SSH allows you to log into Apache Spark clusters remotely.
c) Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks
workspace and available on Azure Databricks clusters.
e) A job is a way of running a notebook or JAR either immediately or on a scheduled basis.
DP-203: Exam Q&A Series – Part 7
82
Which Azure data platform is commonly used to process data in an ELT framework?
a) Azure Data Factory
b) Azure Databricks
c) Azure Data Lake Storage
DP-203: Exam Q&A Series – Part 7
83
Which Azure service is the best choice to manage and govern your data?
a) Azure Data Factory
b) Azure Purview
c) Azure Data Lake Storage
DP-203: Exam Q&A Series – Part 7
84
Applications that publish messages to Azure Event Hub very frequently will get the best
performance using Advanced Message Queuing Protocol (AMQP) because it establishes a
persistent socket.
True False
DP-203: Exam Q&A Series – Part 7
85
You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a
partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching
table and partition definitions. You need to overwrite the content of the first partition in dbo.Sales
with the content of the same partition in stg.Sales. The solution must minimize load times.
What should you do?
a) Insert the data from stg.Sales into dbo.Sales.
b) Switch the first partition from dbo.Sales to stg.Sales.
c) Switch the first partition from stg.Sales to dbo.Sales.
d) Update dbo.Sales from stg.Sales.
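Partition switching is a metadata-only operation, which is why it minimizes load time. A hedged T-SQL sketch of replacing the first partition of dbo.Sales with the same partition from stg.Sales; the partition number and the TRUNCATE_TARGET option are assumptions for illustration:

ALTER TABLE stg.Sales SWITCH PARTITION 1 TO dbo.Sales PARTITION 1
WITH (TRUNCATE_TARGET = ON);   -- overwrite the target partition instead of requiring it to be empty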
DP-203: Exam Q&A Series – Part 8
86
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will
contain the following three workloads:
• A workload for data engineers who will use Python and SQL
• A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
• A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
• The job cluster will be managed by using a request process whereby data scientists and data
engineers provide packaged notebooks for deployment to the cluster.
• All the data scientists must be assigned their own cluster that terminates automatically after 120
minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster
for the data engineers, and a Standard cluster for the jobs. Does this meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 8
87
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will
contain the following three workloads:
• A workload for data engineers who will use Python and SQL
• A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
• A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
• The job cluster will be managed by using a request process whereby data scientists and data
engineers provide packaged notebooks for deployment to the cluster.
• All the data scientists must be assigned their own cluster that terminates automatically after 120
minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the
data engineers, and a High Concurrency cluster for the jobs. Does this meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 8
88
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will
There the
contain is no need for
following a workloads:
three High Concurrency cluster for each data scientist.
• A workload for data engineers who will use Python and SQL
A workload
•Standard for jobs are
clusters that will run notebooks that
recommended for use Python,
a single Spark,
user. Scala, andclusters
Standard SQL can run
A workload developed
•workloads that data scientists
in anywill use to perform
language: ad hocR,analysis
Python, Scala,inandScala and R
SQL.
The enterprise architecture team identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
•A The
highjobconcurrency cluster isbyausing
cluster will be managed managed cloud
a request resource.
process The scientists
whereby data key benefits of high
and data
concurrency clusters
engineers provide are notebooks
packaged that theyforprovide Apache
deployment to the Spark-native
cluster. fine-grained
All the data
•sharing scientists must
for maximum be assigned
resource their ownand
utilization cluster that terminates
minimum query automatically
latencies.after 120
minutes of inactivity. Currently, there are three data scientists.
You need to create the Databrick clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the
data engineers, and a Standard cluster for the jobs. Does this meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 8
89
If an Event Hub goes offline before a consumer group can process the events it holds, those
events will be lost.
True False

Events are persistent.

Each consumer group has its own cursor maintaining its position within the
partition. The consumer groups can resume processing at their cursor
position when the Event Hub is again available.
DP-203: Exam Q&A Series – Part 8
90
You are a Data Engineer for Contoso. You want to view key health metrics of your Stream Analytics
jobs. Which tool in Streaming Analytics should you use?
a) Dashboards
b) Alerts
c) Diagnostics

a) Dashboards are used to view the key health metrics of your Stream Analytics jobs.
b) Alerts enable proactive detection of issues in Stream Analytics.
c) Diagnostic logging is turned off by default and can help with root-cause
analysis in production deployments.
DP-203: Exam Q&A Series – Part 8
91
You are designing a real-time dashboard solution that will visualize streaming data from remote
sensors that connect to the internet. The streaming data must be aggregated to show the average
value of each 10-second interval. The data will be discarded after being displayed in the
dashboard. The solution will use Azure Stream Analytics and must meet the following
requirements:
- Minimize latency from an Azure Event hub to the dashboard.
- Minimize the required storage.
- Minimize development effort.
What should you include in the solution?
Azure Stream Analytics input type:
• Azure Event Hub
• Azure SQL Database
• Azure Stream Analytics
• Azure Power BI

Azure Stream Analytics output type:
• Azure Event Hub
• Azure SQL Database
• Azure Stream Analytics
• Azure Power BI

Aggregation query location:
• Azure Event Hub
• Azure SQL Database
• Azure Stream Analytics
• Azure Power BI
DP-203: Exam Q&A Series – Part 8
92
Publishers can use either HTTPS or AMQP. AMQP opens a socket and can send multiple messages
over that socket. How many default partitions are available?
a) 1
b) 2
c) 4
d) 8
e) 12

Event Hubs default to 4 partitions.


Partitions are the buckets within an Event Hub. Each publication will
go into only one partition. Each consumer group may read from one
or more than one partition.
DP-203: Exam Q&A Series – Part 8
93
You are designing an enterprise data warehouse in Azure Synapse Analytics that will contain a
table named Customers. Customers will contain credit card information. You need to recommend a
solution to provide salespeople with the ability to view all the entries in Customers. The solution
must prevent all the salespeople from viewing or inferring the credit card information. What should
you include in the recommendation?
a) data masking
b) Always Encrypted
c) column-level security
d) row-level security
DP-203: Exam Q&A Series – Part 8
94
You implement an enterprise data warehouse in Azure Synapse Analytics. You have a large fact
table that is 10 terabytes (TB) in size. Incoming queries use the primary key SaleKey column to
retrieve data as displayed in the following table:
Saleskey CityKey CustomerKey StockItemKey InvoiceDateKey Quantity Unit Price TotalExcludingTax
59301 10123 90 59 22/01/2022 10 15 150
59313 20356 120 59 15/07/2022 15 15 225
59357 10258 150 58 03/05/2022 14 12 168
59756 56203 160 70 09/02/2022 8 15 120
59889 48920 170 70 31/07/2022 20 12 240

You need to distribute the large fact table across multiple nodes to optimize performance of the
table. Which technology should you use?
a) hash distributed table with clustered index
b) hash distributed table with clustered Columnstore index
c) round robin distributed table with clustered index
d) round robin distributed table with clustered Columnstore index
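A minimal T-SQL sketch of a hash-distributed table with a clustered columnstore index for a large fact table of this kind; the column list is abbreviated and illustrative:

CREATE TABLE dbo.FactSales
(
    SaleKey BIGINT NOT NULL,
    CityKey INT NOT NULL,
    CustomerKey INT NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH (SaleKey),      -- spreads the 10 TB table across the 60 distributions
    CLUSTERED COLUMNSTORE INDEX         -- well suited to scans and aggregations over billions of rows
);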
DP-203: Exam Q&A Series – Part 8
95
You have an enterprise data warehouse in Azure Synapse Analytics. Using PolyBase, you create an
external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2
without importing the data to the data warehouse. The external table has three columns. You
discover that the Parquet files have a fourth column named ItemID. Which command should you
run to add the ItemID column to the external table?

a b

c d
DP-203: Exam Q&A Series – Part 8
96
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a
complex SELECT query that contains multiple JOIN and CASE statements to transform data for use
in inventory reports. The inventory reports will use the data and additional WHERE parameters
depending on the report. The reports will be produced once daily. You need to implement a
solution to make the dataset available for the reports. The solution must minimize query times.
What should you implement?
a) an ordered clustered columnstore index
b) a materialized view
c) result set caching
d) a replicated table

Materialized views for dedicated SQL pools in Azure Synapse provide a low-maintenance method for
complex analytical queries to get fast performance without any query change.
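A hedged T-SQL sketch of a materialized view in a dedicated SQL pool; the table, columns, and aggregation are placeholders, and the view definition needs a distribution option and an aggregation:

CREATE MATERIALIZED VIEW dbo.mvInventorySummary
WITH (DISTRIBUTION = HASH (ProductID))
AS
SELECT ProductID,
       SUM(Quantity) AS TotalQuantity,
       COUNT_BIG(*)  AS RowCnt
FROM dbo.FactInventory
GROUP BY ProductID;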
DP-203: Exam Q&A Series – Part 8
97
You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL
pool. The table has the following specifications:
● Contain sales data for 20,000 products.
● Use hash distribution on a column named ProductID.
● Contain 2.4 billion records for the years 2021 and 2022.
Which number of partition ranges provides optimal compression and performance for the
clustered columnstore index?
a) 40
b) 240
c) 400
d) 2400

Rule of thumb: partitions = records / (1 million rows × 60 distributions).
2,400,000,000 / 60,000,000 = 40
DP-203: Exam Q&A Series – Part 8
98
You are designing an Azure Synapse Analytics dedicated SQL pool. You need to ensure that you
can audit access to Personally Identifiable Information (PII). What should you include in the
solution?
a) column-level security
b) dynamic data masking
c) row-level security (RLS)
d) sensitivity classifications
DP-203: Exam Q&A Series – Part 8
99
You are designing a security model for an Azure Synapse Analytics dedicated SQL pool that will
support multiple companies. You need to ensure that users from each company can view only the
data of their respective company. Which two objects should you include in the solution? Each
correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
a) a security policy
b) a custom role-based access control (RBAC) role
c) a function
d) a column encryption key
e) asymmetric keys
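For context, row-level security combines an inline table-valued function (the filter predicate) with a security policy. A minimal sketch under the assumption that each row carries a CompanyName column that matches the database user name; all object names are placeholders:

CREATE SCHEMA Security;
GO
CREATE FUNCTION Security.fn_companyAccessPredicate(@CompanyName AS NVARCHAR(128))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_result
           WHERE @CompanyName = USER_NAME();   -- each user sees only their company's rows
GO
CREATE SECURITY POLICY CompanyFilter
ADD FILTER PREDICATE Security.fn_companyAccessPredicate(CompanyName) ON dbo.Sales
WITH (STATE = ON);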
DP-203: Exam Q&A Series – Part 8
100
You have an Azure Synapse Analytics job that uses Scala. You need to view the status of the job.
What should you do?
a) From Synapse Studio, select the workspace. From Monitor, select SQL requests.
b) From Azure Monitor, run a Kusto query against the AzureDiagnostics table.
c) From Synapse Studio, select the workspace. From Monitor, select Apache Spark applications.
d) From Azure Monitor, run a Kusto query against the SparkLoggingEvent_CL table.
DP-203: Exam Q&A Series – Part 9
101
You have an Azure Synapse Analytics database, within this, you have a dimension table named
Stores that contains store information. There is a total of 263 stores nationwide. Store information
is retrieved in more than half of the queries that are issued against this database. These queries
include staff information per store, sales information per store and finance information. You want
to improve the query performance of these queries by configuring the table geometry of the stores
table. Which is the appropriate table geometry to select for the stores table?
a) Round Robin
b) Non-Clustered
c) Replicated table

a) A Round Robin distribution is a table geometry that is useful for performing initial data loads.
b) Non-Clustered is not a valid table geometry in Azure Synapse Analytics.
c) A replicated table is an appropriate table geometry choice, as the size of the data in the table is
less than 200 MB, and the table will be replicated to every distribution node of Azure Synapse
Analytics to improve performance.
DP-203: Exam Q&A Series – Part 9
102
What is the default port for connecting to an enterprise data warehouse in Azure Synapse
Analytics?
a) TCP port 1344
b) UDP port 1433
c) TCP port 1433
DP-203: Exam Q&A Series – Part 9
103
You have the following Azure Stream Analytics query. For each of the following statements, select
Yes if the statement is true. Otherwise, select No.

Statement (Yes / No):
• The query combines two streams of partitioned data.
• The stream scheme key and count must match the output scheme.
• Providing 60 streaming units will optimize the performance of the query.

• Streaming Units (SUs) represent the computing resources that are allocated to execute a Stream
Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for
your job. In general, the best practice is to start with 6 SUs for queries that don't use PARTITION BY.
Here there are 10 partitions, so 6 x 10 = 60 SUs is good.
• You can now use a new extension of Azure Stream Analytics SQL to specify the number of partitions
of a stream when reshuffling the data.
• When joining two streams of data that are explicitly repartitioned, these streams must have the same
partition key and partition count.
DP-203: Exam Q&A Series – Part 9
104
You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by
using the following Transact-SQL statement.
You need to alter the table to meet the following
requirements:
i. Ensure that users can identify the current
manager of employees.
ii. Support creating an employee reporting
hierarchy for your entire company.
iii. Provide fast lookup of the managers'
attributes such as name and job title.

Which column should you add to the table?


a) [ManagerEmployeeID] [smallint] NULL
b) [ManagerEmployeeKey] [smallint] NULL
c) [ManagerEmployeeKey] [int] NULL
d) [ManagerName] [varchar](200) NULL
DP-203: Exam Q&A Series – Part 9
105
You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an
Azure Synapse Analytics dedicated SQL pool. You have a table that was created by using the
following Transact-SQL statement.
Which two columns should you add to the table?
Each correct answer presents part of the solution.
a) [EffectiveStartDate] [datetime] NOT NULL,
b) [CurrentProductCategory] [nvarchar] (100) NOT NULL,
c) [EffectiveEndDate] [datetime] NULL,
d) [ProductCategory] [nvarchar] (100) NOT NULL,
e) [OriginalProductCategory] [nvarchar] (100) NOT NULL,
DP-203: Exam Q&A Series – Part 9
106
You have a SQL pool in Azure Synapse. You plan to load data from Azure Blob storage to a staging
table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before
each daily load. You need to create the staging table. The solution must minimize how long it takes
to load the data to the staging table. How should you configure the table? To answer, select the
appropriate options in the answer area.
Distribution:
• Hash
• Replicated
• Round Robin

Indexing:
• Clustered
• Clustered Columnstore
• Heap

Partitioning:
• Date
• None
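For a truncate-and-reload staging table, a round-robin distributed heap usually loads fastest. A hedged T-SQL sketch with an illustrative column list:

CREATE TABLE stg.DailyLoad
(
    Id BIGINT NOT NULL,
    Payload NVARCHAR(4000) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- no distribution key to compute during the load
    HEAP                          -- no index to maintain while staging
);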
DP-203: Exam Q&A Series – Part 9
107
You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as
shown in the following exhibit.
/topfolder/
    File1.csv
    /Folder1/
        File2.csv
    /Folder2/
        File3.csv
    File4.csv

You create an external table named ExtTable that has LOCATION='/topfolder/’. When you query
ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?
a) File2.csv and File3.csv only
b) File1.csv and File4.csv only
c) File1.csv, File2.csv, File3.csv, and File4.csv
d) File1.csv only
DP-203: Exam Q&A Series – Part 9
108
You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics.
SalesFact contains sales data from the past 36 months and has the following characteristics:
a) Is partitioned by month b) Contains one billion rows c) Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible. Which three actions should you perform in sequence in a stored procedure?
Actions:
• Switch the partition containing the stale data from SalesFact to SalesFact_Work.
• Create an empty table named SalesFact_Work that has the same schema as SalesFact.
• Truncate the partition containing the stale data.
• Drop the SalesFact_Work table.
• Execute a DELETE statement where the value in the Date column is more than 36 months ago.
• Copy the data to a new table by using CREATE TABLE AS SELECT (CTAS).
Answer (in order):
1. Create an empty table named SalesFact_Work that has the same schema as SalesFact.
2. Switch the partition containing the stale data from SalesFact to SalesFact_Work.
3. Drop the SalesFact_Work table.
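A minimal T-SQL sketch of that create-switch-drop sequence; the column list, distribution, partition boundaries, and partition number are hypothetical, and in practice SalesFact_Work must match the schema, distribution, and partitioning of SalesFact exactly:

-- 1. Create an empty work table with a matching schema, distribution, and partition scheme.
CREATE TABLE dbo.SalesFact_Work
(
    DateKey     INT    NOT NULL,
    ProductKey  INT    NOT NULL,
    SalesAmount MONEY  NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (DateKey RANGE RIGHT FOR VALUES (20190101, 20200101, 20210101))
);

-- 2. Switch the stale partition out of the fact table (a fast, metadata-only operation).
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;

-- 3. Remove the stale data by dropping the work table.
DROP TABLE dbo.SalesFact_Work;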
DP-203: Exam Q&A Series – Part 9
109
You develop data engineering solutions for a company. A project requires analysis of real-time
Twitter feeds. Posts that contain specific keywords must be stored and processed on Microsoft
Azure and then displayed by using Microsoft Power BI. You need to implement the solution. Which
five actions should you perform in sequence?
Actions:
• Create an HDInsight cluster with the Hadoop cluster type
• Create a Jupyter Notebook
• Run a job that uses the Spark Streaming API to ingest data from Twitter
• Create a Runbook
• Create an HDInsight cluster with the Spark cluster type
• Create a table
• Load the hvac table to Power BI Desktop
Answer (in order):
1. Create an HDInsight cluster with the Spark cluster type
2. Create a Jupyter Notebook
3. Create a table
4. Run a job that uses the Spark Streaming API to ingest data from Twitter
5. Load the hvac table to Power BI Desktop
DP-203: Exam Q&A Series – Part 9
110
You have an Azure SQL database named DB1 in the East US 2 region. You need to build a
secondary geo-replicated copy of DB1 in the West US region on a new server. Which three actions
should you perform in sequence?
Actions:
• Implement log shipping
• On the secondary server, create logins that match the SIDs on the primary server
• Create a target server and select a pricing tier
• Set the quorum mode and create a failover policy
• From the Geo-Replication settings of DB1, select West US
Answer (in order):
1. From the Geo-Replication settings of DB1, select West US
2. Create a target server and select a pricing tier
3. On the secondary server, create logins that match the SIDs on the primary server
DP-203: Exam Q&A Series – Part 9
111
You need to create an Azure Cosmos DB account that will use encryption keys managed by your
organization. Which four actions should you perform in sequence?
Actions:
• Generate a new key in the Azure Key Vault
• Create an Azure Key Vault and enable purge protection
• Create a new Azure Cosmos DB account and set Data Encryption to Service-Managed Key
• Add an Azure Key Vault access policy to grant permissions to the Azure Cosmos DB principal
• Create a new Azure Cosmos DB account, set Data Encryption to Customer-managed key (Enter key URI), and enter the key URI
Answer (in order):
1. Create an Azure Key Vault and enable purge protection
2. Add an Azure Key Vault access policy to grant permissions to the Azure Cosmos DB principal
3. Generate a new key in the Azure Key Vault
4. Create a new Azure Cosmos DB account, set Data Encryption to Customer-managed key (Enter key URI), and enter the key URI
DP-203: Exam Q&A Series – Part 9
112
You are planning the deployment of Azure Data Lake Storage Gen2. You have the following two
reports that will access the data lake:
• Report1: Reads three columns from a file that contains 50 columns.
• Report2: Queries a single record based on a timestamp.
You need to recommend in which format to store the data in the data lake to support the reports.
The solution must minimize read times. What should you recommend for each report? To answer,
select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Report 1: Avro / CSV / Parquet / TSV
Report 2: Avro / CSV / Parquet / TSV
DP-203: Exam Q&A Series – Part 9
113
How long is the Recovery Point Objective for Azure Synapse Analytics?
a) 4 hours
b) 8 hours
c) 12 hours
d) 16 hours
DP-203: Exam Q&A Series – Part 9
114
You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named
Server1. You need to verify whether the size of the transaction log file for each distribution of DW1
is smaller than 160 GB. What should you do?
a) On the master database, execute a query against the
sys.dm_pdw_nodes_os_performance_counters dynamic management view.
b) From Azure Monitor in the Azure portal, execute a query against the logs of DW1.
c) On DW1, execute a query against the sys.database_files dynamic management view.
d) Execute a query against the logs of DW1 by using the Get-AzOperationalInsightSearchResult
PowerShell cmdlet.
DP-203: Exam Q&A Series – Part 9
115
You have an enterprise data warehouse in Azure Synapse Analytics. You need to monitor the data
warehouse to identify whether you must scale up to a higher service level to accommodate the
current workloads. Which is the best metric to monitor? More than one answer choice may achieve
the goal. Select the BEST answer.
a) CPU percentage
b) DWU used
c) DWU percentage
d) Data IO percentage
DP-203: Exam Q&A Series – Part 10
116
You are a data architect. The data engineering team needs to configure a synchronization of data
between an on-premises Microsoft SQL Server database and Azure SQL Database. Ad-hoc and
reporting queries are overutilizing the on-premises production instance. The synchronization
process must:
• Perform an initial data synchronization to Azure SQL Database with minimal downtime
• Perform bi-directional data synchronization after initial synchronization
You need to implement this synchronization solution. Which synchronization method should you
use?
a) transactional replication
b) Data Migration Assistant (DMA)
c) backup and restore
d) SQL Server Agent job
e) Azure SQL Data Sync
DP-203: Exam Q&A Series – Part 10
117
You have an Azure subscription that contains an Azure Storage account. You plan to implement
changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100
days.
Solution: You schedule an Azure Data Factory pipeline with a delete activity. Does this meet the
goal?
Yes No
DP-203: Exam Q&A Series – Part 10
118
You have an Azure subscription that contains an Azure Storage account. You plan to implement
changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100
days.
Solution: You apply an expired tag to the blobs in the storage account. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 10
119
You have an Azure subscription that contains an Azure Storage account. You plan to implement
changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100
days.
Solution: You apply an Azure Blob storage lifecycle policy. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 10
120
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK
South region. You need to copy blob data from the storage account to the data warehouse by using
Azure Data Factory. The solution must meet the following requirements:
• Ensure that the data remains in the UK South region at all times.
• Minimize administrative effort.
Which type of integration runtime should you use?
a) Azure integration runtime
b) Azure-SSIS integration runtime
c) Self-hosted integration runtime
DP-203: Exam Q&A Series – Part 10
121
You want to ingest data from a SQL Server database hosted on an on-premises Windows Server.
What integration runtime is required for Azure Data Factory to ingest data from the on-premises
server?
a) Azure integration runtime
b) Azure-SSIS integration runtime
c) Self-hosted integration runtime
DP-203: Exam Q&A Series – Part 10
122
By default, how long are the Azure Data Factory diagnostic logs retained for?
a) 15 days
b) 30 days
c) 45 days
DP-203: Exam Q&A Series – Part 10
123
You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake
Storage Gen2 container. Which resource provider should you enable?
a) Microsoft.Sql
b) Microsoft.Automation
c) Microsoft.EventGrid
d) Microsoft.EventHub
Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory natively integrates with Azure Event Grid, which lets you trigger pipelines on such events.
DP-203: Exam Q&A Series – Part 10
124
You have an Azure Data Factory instance that contains two pipelines named Pipeline1 & Pipeline2.
Pipeline1 has the activities shown in the following exhibit. Pipeline2 has the activities shown in the following exhibit.
You execute Pipeline2, and Stored procedure1 in Pipeline1 fails. What is the status of the pipeline
runs?
a) Pipeline1 and Pipeline2 succeeded.
b) Pipeline1 and Pipeline2 failed.
c) Pipeline1 succeeded and Pipeline2 failed.
d) Pipeline1 failed and Pipeline2 succeeded.
DP-203: Exam Q&A Series – Part 10
125
You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The
table will have a clustered columnstore index and will include the following columns:
• TransactionType: 40 million rows per transaction type
• CustomerSegment: 4 million rows per customer segment
• TransactionMonth: 65 million rows per month
• AccountType: 500 million rows per account type
You have the following query requirements:
• Analysts will most commonly analyze transactions for a given month.
• Transactions analysis will typically summarize transactions by transaction type, customer segment,
and/or account type
You need to recommend a partition strategy for the table to minimize query times. On which column should
you recommend partitioning the table?
a) CustomerSegment
b) AccountType
c) TransactionType
d) TransactionMonth
DP-203: Exam Q&A Series – Part 10
126
Your company wants to route data rows to different streams based on matching conditions. Which
transformation in the Mapping Data Flow should you use?
a) Conditional Split
b) Select
c) Lookup
A Conditional Split transformation routes data rows to different streams based on matching conditions. The conditional split transformation is like a CASE decision structure in a programming language.
A Lookup transformation is used to add reference data from another source to your Data Flow.
DP-203: Exam Q&A Series – Part 10
127
Which transformation is used to load data into a data store or compute resource?
a) Source
b) Destination
c) Sink
d) Window
A Sink transformation allows you to choose a dataset definition for the destination output data. You can have as many sink transformations as your data flow requires.
A Window transformation is where you will define window-based aggregations of columns in your data streams.
DP-203: Exam Q&A Series – Part 10
128
A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution
uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data.
The cloud job is configured to use 120 Streaming Units (SU). You need to optimize performance for
the Azure Stream Analytics job. Which two actions should you perform? Each correct answer
presents part of the solution. NOTE: Each correct selection is worth one point.
a) Implement event ordering.
b) Implement Azure Stream Analytics user-defined functions (UDF).
c) Implement query parallelization by partitioning the data output.
d) Scale the SU count for the job up.
e) Scale the SU count for the job down.
f) Implement query parallelization by partitioning the data input.
DP-203: Exam Q&A Series – Part 10
129
By default, how are corrupt records dealt with using spark.read.json()?
a) They appear in a column called "_corrupt_record"
b) They get deleted automatically
c) They throw an exception and exit the read operation
DP-203: Exam Q&A Series – Part 10
130
How do you specify parameters when reading data?
a) Using .option() during your read allows you to pass key/value pairs specifying aspects of your
read
b) Using .parameter() during your read allows you to pass key/value pairs specifying aspects of
your read
c) Using .keys() during your read allows you to pass key/value pairs specifying aspects of your
read
DP-203: Exam Q&A Series – Part 11
131
You create an Azure Databricks cluster and specify an additional library to install. When you
attempt to load the library to a notebook, the library is not found. You need to identify the cause of
the issue. What should you review?
a) notebook logs
b) cluster event logs
c) global init scripts logs
d) workspace logs
DP-203: Exam Q&A Series – Part 11
132
Your company analyzes images from security cameras and sends alerts to security teams that
respond to unusual activity. The solution uses Azure Databricks. You need to send Apache Spark
level events, Spark Structured Streaming metrics, and application metrics to Azure Monitor. Which
three actions should you perform in sequence?
Actions:
• Create a data source in Azure Monitor.
• Configure the Databricks cluster to use the Databricks monitoring library.
• Deploy Grafana to an Azure virtual machine.
• Build a spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file.
• Create Dropwizard counters in the application code.
Answer (in order):
1. Configure the Databricks cluster to use the Databricks monitoring library.
2. Build a spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file.
3. Create Dropwizard counters in the application code.
DP-203: Exam Q&A Series – Part 11
133
You have an Azure Data Lake Storage Gen2 account that contains JSON files for customers. The
files contain two attributes named FirstName and LastName. You need to copy the data from the
JSON files to an Azure Synapse Analytics table by using Azure Databricks. A new column must be
created that concatenates the FirstName and LastName values. You create the following
components:
- A destination table in Azure Synapse
- An Azure Blob storage container
- A service principal
Which five actions should you perform in sequence next in a Databricks notebook?
Actions:
• Specify a temporary folder to stage the data
• Write the results to Data Lake Storage
• Drop the data frame
• Read the file into a data frame
• Write the results to a table in Azure Synapse
• Perform transformations on the data frame
• Mount the Data Lake Storage onto DBFS
• Perform transformations on the file
Answer (in order):
1. Mount the Data Lake Storage onto DBFS
2. Read the file into a data frame
3. Perform transformations on the data frame
4. Specify a temporary folder to stage the data
5. Write the results to a table in Azure Synapse
DP-203: Exam Q&A Series – Part 11
134
You are designing an Azure Databricks interactive cluster. You need to ensure that the cluster
meets the following requirements:
- Enable auto-termination
- Retain cluster configuration indefinitely after cluster termination.
What should you recommend?
a) Start the cluster after it is terminated.
b) Pin the cluster
c) Clone the cluster after it is terminated.
d) Terminate the cluster manually at process completion.
DP-203: Exam Q&A Series – Part 11
135
You are designing an Azure Databricks table. The table will ingest an average of 20 million
streaming events per day. You need to persist the events in the table for use in incremental load
pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load
times. What should you include in the solution?
a) Partition by DateTime fields.
b) Sink to Azure Queue storage.
c) Include a watermark column.
d) Use a JSON format for physical data storage.
DP-203: Exam Q&A Series – Part 11
136
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.
You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must
meet the following requirements:
- Automatically scale down workers when the cluster is underutilized for three minutes.
- Minimize the time it takes to scale to the maximum number of workers.
- Minimize costs.
What should you do first?
a) Enable container services for workspace1.
b) Upgrade workspace1 to the Premium pricing tier.
c) Set Cluster Mode to High Concurrency.
d) Create a cluster policy in workspace1.
DP-203: Exam Q&A Series – Part 11
137
You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The
size of the files will vary based on the number of events that occur per hour. File sizes range from
4 KB to 5 GB. You need to ensure that the files stored in the container are optimized for batch
processing. What should you do?
a) Convert the files to JSON
b) Convert the files to Avro
c) Compress the files
d) Merge the files
DP-203: Exam Q&A Series – Part 11
138
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is
output to Azure Data Lake Storage Gen2. The developers who will implement the stream
processing solution use Java. Which service should you recommend using to process the
streaming data?
a) Azure Event Hubs
b) Azure Data Factory
c) Azure Stream Analytics
d) Azure Databricks
DP-203: Exam Q&A Series – Part 11
139
You need to implement an Azure Databricks cluster that automatically connects to Azure Data
Lake Storage Gen2 by using Azure Active Directory (Azure AD) integration. How should you
configure the new cluster?
Tier: Premium / Standard
Advanced option to enable: Azure Data Lake Storage Credential Passthrough / Table access control
Credential passthrough requires an Azure Databricks Premium plan. You can access Azure Data Lake Storage using Azure Active Directory credential passthrough. When you enable your cluster for Azure Data Lake Storage credential passthrough, commands that you run on that cluster can read and write data in Azure Data Lake Storage without requiring you to configure service principal credentials for access to storage.
DP-203: Exam Q&A Series – Part 11
140
Which Azure Data Factory process involves using compute services to produce data to feed
production environments with cleansed data?
a) Connect and collect
b) Transform and enrich
c) Publish
d) Monitor
DP-203: Exam Q&A Series – Part 11
141
You have a new Azure Data Factory environment. You need to periodically analyze pipeline
executions from the last 60 days to identify trends in execution durations. The solution must use
Azure Log Analytics to query the data and create charts. Which diagnostic settings should you
configure in Data Factory? To answer, select the appropriate options in the answer area.
Log type: ActivityRuns / AllMetrics / PipelineRuns / TriggerRuns
Storage location: An Azure event hub / An Azure storage account / Azure Log Analytics
DP-203: Exam Q&A Series – Part 11
142
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL
pool. You create a table by using the Transact-SQL statement shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
DimProduct is a [answer choice] slowly changing dimension (SCD):
Type 0
Type 1
Type 2
The ProductKey column is [answer choice]:
A surrogate key
A business key
An audit column
DP-203: Exam Q&A Series – Part 11
142 Explanation
Type 2 SCD: supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version.
Business key: A business key or natural key is an index which identifies the uniqueness of a row based on columns that exist naturally in a table according to business rules. For example, business keys are a customer code in a customer table, or the composite of the sales order header number and sales order item line number within a sales order details table.
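To make the explanation concrete, here is a hedged T-SQL sketch of a Type 2 dimension that combines a surrogate key, a business key, and validity columns (all table and column names are illustrative):

CREATE TABLE dbo.DimCustomer
(
    CustomerKey   INT IDENTITY(1,1) NOT NULL,   -- surrogate key: one value per version
    CustomerCode  NVARCHAR(20)      NOT NULL,   -- business (natural) key from the source system
    CustomerName  NVARCHAR(100)     NOT NULL,
    StartDate     DATETIME2         NOT NULL,   -- start of this version's validity range
    EndDate       DATETIME2         NULL,       -- end of this version's validity range
    IsCurrent     BIT               NOT NULL    -- convenience flag for the current version
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);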
DP-203: Exam Q&A Series – Part 11
143
You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure
Data Lake Storage Gen2 container. Which type of trigger should you use?
a) on-demand
b) tumbling window
c) schedule
d) event
DP-203: Exam Q&A Series – Part 11
144
You have two Azure Data Factory instances named ADFdev and ADFprod. ADFdev connects to an
Azure DevOps Git repository. You publish changes from the main branch of the Git repository to
ADFdev. You need to deploy the artifacts from ADFdev to ADFprod. What should you do first?
a) From ADFdev, modify the Git configuration.
b) From ADFdev, create a linked service.
c) From Azure DevOps, create a release pipeline.
d) From Azure DevOps, update the main branch.
DP-203: Exam Q&A Series – Part 11
145
You have an Azure data factory. You need to examine the pipeline failures from the last 60 days.
What should you use?
a) the Activity log blade for the Data Factory resource
b) the Monitor & Manage app in Data Factory
c) the Resource health blade for the Data Factory resource
d) Azure Monitor
DP-203: Exam Q&A Series – Part 12
146
Your company is building a data warehouse where they want to keep track of changes in customer
mailing address. You want to keep the current mailing address and the previous one. Which SCD
type should you use?
a) Type 1 SCD
b) Type 2 SCD
c) Type 3 SCD
d) Type 6 SCD
DP-203: Exam Q&A Series – Part 12
147
Your company is building a data warehouse where they want to keep only the latest vendor’s
company name from whom your company purchases raw materials. Which SCD type should you
use?
a) Type 1 SCD
b) Type 2 SCD
c) Type 3 SCD
d) Type 6 SCD
DP-203: Exam Q&A Series – Part 12
148
Your company is building a data warehouse where they want to keep track of changes in customer
mailing address. You want to keep the current mailing address and the previous one. Both the new
and old mailing addresses should be stored as different rows. Which SCD type should you use?
a) Type 1 SCD
b) Type 2 SCD
c) Type 3 SCD
d) Type 6 SCD
DP-203: Exam Q&A Series – Part 12
Cheat Sheet
Type 1 SCD: Use when you want to maintain only the latest value of a record. Each record will always have one row.
Type 2 SCD: Maintain versions of the record as separate rows, using columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
Type 3 SCD: Use when you maintain two versions of a dimension member as separate columns. It uses additional columns to track one key instance of history, rather than storing additional rows to track each change as in a Type 2 SCD.
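For the Type 3 pattern in particular, a minimal T-SQL sketch (column names are illustrative) keeps the history in extra columns rather than extra rows:

CREATE TABLE dbo.DimProduct
(
    ProductKey               INT IDENTITY(1,1) NOT NULL,
    ProductName              NVARCHAR(100)     NOT NULL,
    CurrentProductCategory   NVARCHAR(100)     NOT NULL,  -- latest value
    OriginalProductCategory  NVARCHAR(100)     NOT NULL   -- the one tracked historical value
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);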
DP-203: Exam Q&A Series – Part 12
149
You are building an Azure Analytics query that will receive input data from Azure IoT Hub and write
the results to Azure Blob storage. You need to calculate the difference in readings per sensor per
hour. How should you complete the query?
SELECT sensorId,
       growth = reading - [answer choice](reading) OVER (PARTITION BY sensorId [answer choice](hour, 1))
FROM input
First answer choices: LAG, LAST, LEAD
Second answer choices: LIMIT DURATION, OFFSET, WHEN
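One way to complete the query is with the LAG analytic function and a LIMIT DURATION clause, which looks back one hour within each sensor's partition:

SELECT sensorId,
       growth = reading - LAG(reading) OVER (PARTITION BY sensorId LIMIT DURATION(hour, 1))
FROM input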
DP-203: Exam Q&A Series – Part 12
150
You have an Azure Synapse Analytics dedicated SQL pool. You need to ensure that data in the pool
is encrypted at rest. The solution must NOT require modifying applications that query the data.
What should you do?
a) Enable encryption at rest for the Azure Data Lake Storage Gen2 account.
b) Enable Transparent Data Encryption (TDE) for the pool.
c) Use a customer-managed key to enable double encryption for the Azure Synapse workspace.
d) Create an Azure key vault in the Azure subscription and grant access to the pool.
Transparent Data Encryption (TDE) helps protect against the threat of malicious activity by
encrypting and decrypting your data at rest. When you encrypt your database, associated
backups and transaction log files are encrypted without requiring any changes to your
applications. TDE encrypts the storage of an entire database by using a symmetric key called the
database encryption key.
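As a minimal sketch, assuming a dedicated SQL pool named Pool1, TDE can be turned on with a single T-SQL statement run against the master database of the logical server (it can also be enabled from the Azure portal):

-- Turn on Transparent Data Encryption for the dedicated SQL pool
ALTER DATABASE [Pool1] SET ENCRYPTION ON;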
DP-203: Exam Q&A Series – Part 12
151
You have an Azure subscription that contains a logical Microsoft SQL server named Server1.
Server1 hosts an Azure Synapse Analytics SQL dedicated pool named Pool1. You need to
recommend a Transparent Data Encryption (TDE) solution for Server1. The solution must meet the
following requirements:
- Track the usage of encryption keys.
- Maintain the access of client apps to Pool1 in the event of an Azure datacenter outage that
affects the availability of the encryption keys.
What should you include in the recommendation?
To track encryption key usage: Always Encrypted / TDE with customer-managed keys / TDE with platform-managed keys / Enable Advanced Data Security on Server1
To maintain client app access in the event of a datacenter outage: Create and configure Azure Key Vaults in two Azure regions / Implement the client apps by using a Microsoft .NET Framework data provider
DP-203: Exam Q&A Series – Part 12
152
You plan to create an Azure Synapse Analytics dedicated SQL pool. You need to minimize the time
it takes to identify queries that return confidential information as defined by the company's data
privacy regulations and the users who executed the queries. Which two components should you
include in the solution?
a) sensitivity-classification labels applied to columns that contain confidential information
b) resource tags for databases that contain confidential information
c) audit logs sent to a Log Analytics workspace
d) dynamic data masking for columns that contain confidential information
DP-203: Exam Q&A Series – Part 12
153
While using Azure Data Factory you want to parameterize a linked service and pass dynamic values
at run time. Which supported connector should you use?
a) Azure Data Lake Storage Gen2
b) Azure Data Factory variables
c) Azure Synapse Analytics
d) Azure Key Vault
DP-203: Exam Q&A Series – Part 12
154
Which file formats Azure Data Factory support?
a) Avro format
b) Binary format
c) Delimited text format
d) Excel format
e) JSON format
f) ORC format
g) Parquet format
h) XML format
i) ALL OF THE ABOVE
DP-203: Exam Q&A Series – Part 12
155
Which property indicates the parallelism, you want the copy activity to use?
a) parallelCopies
b) stagedCopies
c) multiCopies
DP-203: Exam Q&A Series – Part 12
156
Using the Azure Data Factory user interface (UX) you want to create a pipeline that copies and
transforms data from an Azure Data Lake Storage (ADLS) Gen2 source to an ADLS Gen2 sink using
mapping data flow. Choose the correct steps in the right order.
a) Create a data factory account
b) Create a data factory.
c) Create a copy activity
d) Create a pipeline with a Data Flow activity.
e) Validate copy activity
f) Build a mapping data flow with four transformations.
g) Test run the pipeline.
h) Monitor a Data Flow activity
Answer (in order): b, d, f, g, h
DP-203: Exam Q&A Series – Part 12
157
In Azure Data Factory: What is an example of a branching activity used in control flows?
a) The If-condition
b) Until-condition
c) Lookup-condition
DP-203: Exam Q&A Series – Part 12
158
Which activity can retrieve a dataset from any of the data sources supported by data factory and
Synapse pipelines?
a) Find activity
b) Lookup activity
c) Validate activity
DP-203: Exam Q&A Series – Part 12
159
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a
complex SELECT query that contains multiple JOIN and CASE statements to transform data for use
in inventory reports. The inventory reports will use the data and additional WHERE parameters
depending on the report. The reports will be produced once daily. You need to implement a
solution to make the dataset available for the reports. The solution must minimize query times.
What should you implement?
a) an ordered clustered columnstore index
b) a materialized view
c) result set caching
d) a replicated table
DP-203: Exam Q&A Series – Part 12
160
Which Azure service should you use to provide customer-facing reports, dashboards, and analytics
in your own applications?
a) Azure reports
b) Azure Power BI
c) Azure Monitor
DP-203: Exam Q&A Series – Part 13
161
You have an Azure subscription that contains an Azure Storage account. You plan to implement
changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100
days.
Solution: You apply an expired tag to the blobs in the storage account. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
162
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text
and numerical values. 75% of the rows contain description data that has an average length of 1.1
MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics. You need to prepare the files to ensure that the data copies quickly.
Solution: You copy the files to a table that has a columnstore index. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
163
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text
and numerical values. 75% of the rows contain description data that has an average length of 1.1
MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics. You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
164
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text
and numerical values. 75% of the rows contain description data that has an average length of 1.1
MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics. You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
165
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool
named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that when
tables are created in DB1, the tables are available automatically as external tables to the built-in
serverless SQL pool. Which format should you use for the tables in DB1?
a) CSV
b) ORC
c) JSON
d) Parquet
DP-203: Exam Q&A Series – Part 13
166
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is
output to Azure Data Lake Storage Gen2. The developers who will implement the stream
processing solution use Java. Which service should you recommend using to process the
streaming data?
a) Azure Event Hubs
b) Azure Data Factory
c) Azure Stream Analytics
d) Azure Databricks
DP-203: Exam Q&A Series – Part 13
167
You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse
Analytics dedicated SQL pool. You plan to keep a record of changes to the available fields.
The supplier data contains the following columns: SupplierSystemID, SupplierName, SupplierDescription, SupplierCategory, SupplierAddress1, SupplierAddress2, SupplierCity, SupplierCountry, SupplierPostalCode.
Which three additional columns should you add to the data to create a Type 2 SCD?
a) surrogate primary key
b) effective start date
c) business key
d) last modified date
e) effective end date
f) foreign key
DP-203: Exam Q&A Series – Part 13
168
You have a Microsoft SQL Server database that uses a third normal form schema. You plan to
migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQL
pool. You need to design the dimension tables. The solution must optimize read operations. What
should you include in the solution?
Transform data for dimension tables by: Maintaining to a third normal form / Normalizing to a fourth normal form / Denormalizing to a second normal form
For primary key columns in dimension tables, use: New IDENTITY columns / A new computed column / The business key column from the source system
Denormalization is the process of transforming higher normal forms to lower normal forms by storing the join of higher normal form relations as a base relation. Denormalization increases performance in data retrieval at the cost of introducing update anomalies to a database. The collapsing relations strategy can be used in this step to collapse classification entities into component entities to obtain flat dimension tables with single-part keys that connect directly to the fact table. The single-part key is a surrogate key generated to ensure it remains unique over time.
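Putting the two answers together, here is a hedged T-SQL sketch of a denormalized (flattened) dimension table that uses a new IDENTITY column as its single-part surrogate key; the table and column names are illustrative:

CREATE TABLE dbo.DimProduct
(
    ProductKey        INT IDENTITY(1,1) NOT NULL,  -- new surrogate key, not the source business key
    ProductBusinessID NVARCHAR(20)      NOT NULL,  -- kept for traceability to the source system
    ProductName       NVARCHAR(100)     NOT NULL,
    SubcategoryName   NVARCHAR(100)     NOT NULL,  -- collapsed from a separate table in the 3NF source
    CategoryName      NVARCHAR(100)     NOT NULL   -- collapsed from a separate table in the 3NF source
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);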
DP-203: Exam Q&A Series – Part 13
169
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL
pool. You create a table by using the Transact-SQL statement shown in the following exhibit.
Use the drop-down menus to select the answer
choice that completes each statement based on
the information presented in the graphic.
DimProduct is a ---- slowly changing dimension (SCD)
Type 1
Type 2
Type 3
The ProductKey column is ----
a surrogate key
A business key
An audit column
DP-203: Exam Q&A Series – Part 13
170
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL
pool. You create a table by using the Transact-SQL statement shown in the following exhibit.
Which two columns should you add to the table
so that the table supports storing two versions
of a dimension member as separate columns?
Each correct answer presents part of the
solution?
a) [EffectiveStartDate] [datetime] NOT NULL,
b) [CurrentProductCategory] [nvarchar] (100) NOT
NULL,
c) [EffectiveEndDate] [datetime] NULL,
d) [ProductCategory] [nvarchar] (100) NOT NULL,
e) [OriginalProductCategory] [nvarchar] (100) NOT
NULL,
DP-203: Exam Q&A Series – Part 13
171
You are designing a data mart for the human resources (HR) department at your company. The
data mart will contain employee information and employee transactions. From a source system,
you have a flat extract that has the following fields:
● EmployeeID
● FirstName
● LastName
● Recipient
● GrossAmount
● TransactionID
● GovernmentID
● NetAmountPaid
● TransactionDate
You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for the data mart. Which two tables should you create?
a) a dimension table for Transaction
b) a dimension table for EmployeeTransaction
c) a dimension table for Employee
d) a fact table for Employee
e) a fact table for Transaction
DP-203: Exam Q&A Series – Part 13
172
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL
pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the
following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily.
SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101 AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized
Which table distribution will minimize query times?
a) replicated
b) hash-distributed on PurchaseKey
c) round-robin
d) hash-distributed on IsOrderFinalized
DP-203: Exam Q&A Series – Part 13
173
You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool. You need
to create a surrogate key for the table. The solution must provide the fastest query performance.
What should you use for the surrogate key?
a) a GUID column
b) a sequence object
c) an IDENTITY column
DP-203: Exam Q&A Series – Part 13
174
You are implementing a batch dataset in the Parquet format. Data files will be produced by using
Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an
Azure Synapse Analytics serverless SQL pool. You need to minimize storage costs for the solution.
What should you do?
a) Use Snappy compression for the files.
b) Use OPENROWSET to query the Parquet files.
c) Create an external table that contains a subset of columns from the Parquet files.
d) Store all data as string in the Parquet files.
DP-203: Exam Q&A Series – Part 13
175
Which Azure Data Factory component contains the transformation logic or the analysis commands
of the Azure Data Factory’s work?
a) Linked Services
b) Datasets
c) Activities
d) Pipelines
• Linked Services are objects that are used to define the connection to data stores or
compute resources in Azure.
• Datasets represent data structures within the data store that is being referenced
by the Linked Service object.
• Activities contains the transformation logic or the analysis commands of the Azure
Data Factory’s work.
• Pipelines are a logical grouping of activities.
DP-203: Exam Q&A Series – Part 14
176
You have an Azure subscription that contains an Azure Blob Storage account named storage1 and
an Azure Synapse Analytics dedicated SQL pool named Pool1. You need to store data in storage1.
The data will be read by Pool1. The solution must meet the following requirements:
• Enable Pool1 to skip columns and rows that are unnecessary in a query.
• Automatically create column statistics.
• Minimize the size of files.
Which type of file should you use?
a) JSON
b) Parquet
c) Avro
d) CSV
DP-203: Exam Q&A Series – Part 14
177
You plan to create a table in an Azure Synapse Analytics dedicated SQL pool. Data in the table will
be retained for five years. Once a year, data that is older than five years will be deleted. You need
to ensure that the data is distributed evenly across partitions. The solution must minimize the
amount of time required to delete old data. How should you complete the Transact-SQL statement?
a) CustomerKey
b) Hash
c) Round_Robin
d) Replicate
e) OrderDateKey
f) SalesOrderNumber
Answer: Hash, OrderDateKey
DP-203: Exam Q&A Series – Part 14
178
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one
container and has the hierarchical namespace enabled. The system has files that contain data
stored in the Apache Parquet format. You need to copy folders and files from Storage1 to Storage2
by using a Data Factory copy activity. The solution must meet the following requirements:
• No transformations must be performed.
• The original folder structure must be retained.
• Minimize time required to perform the copy activity.
How should you configure the copy activity?
Source dataset type: Binary / Parquet / Delimited Text
Copy activity copy behavior: FlattenHierarchy / Merge Files / PreserveHierarchy
DP-203: Exam Q&A Series – Part 14
179
You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use
in an analytical workload. You need to recommend a format for the transformed files. The solution
must meet the following requirements:
• Contain information about the data types of each column in the files.
• Support querying a subset of columns in the files.
• Support read-heavy analytical workloads.
• Minimize the file size.
What should you recommend?
a) JSON
b) CSV
c) Apache Avro
d) Apache Parquet
DP-203: Exam Q&A Series – Part 14
180
From a website analytics system, you receive data extracts about user interactions such as
downloads, link clicks, form submissions, and video plays. Data contains the following columns.
You need to design a star schema to support analytical
queries of the data. The star schema will contain four tables
including a date dimension.
To which table should you add each column? To answer, select the appropriate options in the answer area.
EventCategory: DimChannel / DimDate / DimEvents / FactEvents
ChannelGrouping: DimChannel / DimDate / DimEvents / FactEvents
TotalEvents: DimChannel / DimDate / DimEvents / FactEvents
DP-203: Exam Q&A Series – Part 14
181
A company purchases IoT devices to monitor manufacturing machinery. The company uses an IoT
appliance to communicate with the IoT devices. The company must be able to monitor the devices
in real-time. You need to design the solution. What should you recommend?
a) Azure Data Factory instance using Azure PowerShell
b) Azure Analysis Services using Microsoft Visual Studio
c) Azure Stream Analytics cloud job using Azure PowerShell
d) Azure Data Factory instance using Microsoft Visual Studio
Stream Analytics is a cost-effective event processing engine that helps uncover real-time insights from devices, sensors, infrastructure, applications, and data quickly and easily. You can monitor and manage Stream Analytics resources with Azure PowerShell cmdlets and PowerShell scripts that execute basic Stream Analytics tasks.
DP-203: Exam Q&A Series – Part 14
182
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts.
Contacts contains a column named Phone. You need to ensure that users in a specific role only
see the last four digits of a phone number when querying the Phone column. What should you
include in the solution?
a) column encryption
b) dynamic data masking
c) a default value
d) table partitions
e) row level security (RLS)
DP-203: Exam Q&A Series – Part 14
183
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be
stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and
PolyBase in Azure SQL Data Warehouse. You need to recommend a Stream Analytics data output
format to ensure that the queries from Databricks and PolyBase against the files encounter the
fewest possible errors. The solution must ensure that the files can be queried quickly and that the
data type information is retained. What should you recommend?
a) Avro
b) CSV
c) Parquet
d) JSON
The Avro format is great for data and message preservation. Avro schema, with its support for evolution, is essential for making the data robust for streaming architectures like Kafka, and with the metadata that the schema provides, you can reason about the data.
DP-203: Exam Q&A Series – Part 14
184
You have an Azure Storage account. You plan to copy one million image files to the storage
account. You plan to share the files with an external partner organization. The partner organization
will analyze the files during the next year. You need to recommend an external access solution for
the storage account. The solution must meet the following requirements:
- Ensure that only the partner organization can access the storage account.
- Ensure that access of the partner organization is removed automatically after 365 days.
What should you include in the recommendation?
a) shared keys
b) Azure Blob storage lifecycle management policies
c) Azure policies
d) shared access signature (SAS)
DP-203: Exam Q&A Series – Part 14
185
You work at ABC company, and as a data engineer you are given the responsibility of managing the jobs
in Azure. You decide to add a new job. While specifying the job constraints, you set the
maxWallClockTime property to 30 minutes. What is the impact of this?
a) The job can be in a ready state for a maximum of 30 minutes
b) The job can be in an inactive state for a maximum of 30 minutes
c) The job can be in the active or running state for a maximum of 30 minutes
d) The job will automatically start in 30 minutes
DP-203: Exam Q&A Series – Part 16
187
You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named
DB1. DB1 contains a fact table named Table1. You need to identify the extent of the data skew in
Table1. What should you do in Synapse Studio?
a) Connect to the built-in pool and run DBCC PDW_SHOWSPACEUSED.
b) Connect to the built-in pool and run DBCC CHECKALLOC.
c) Connect to Pool1 and query sys.dm_pdw_node_status.
d) Connect to Pool1 and query sys.dm_pdw_nodes_db_partition_stats.
DP-203: Exam Q&A Series – Part 16
188
You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes
directly to ADF1. You need to implement version control for the changes made to pipeline artifacts.
The solution must ensure that you can apply version control to the resources currently defined in
the UX Authoring canvas for ADF1. Which two actions should you perform?
a) From the UX Authoring canvas, select Set up code repository.
b) Create a Git repository.
c) Create a GitHub action.
d) Create an Azure Data Factory trigger.
e) From the UX Authoring canvas, select Publish.
f) From the UX Authoring canvas, run Publish All.
DP-203: Exam Q&A Series – Part 16
189
You have an Azure Data Factory instance named ADF1 and two Azure Synapse Analytics
workspaces named WS1 and WS2. ADF1 contains the following pipelines:
• P1: Uses a copy activity to copy data from a nonpartitioned table in a dedicated SQL pool of
WS1 to an Azure Data Lake Storage Gen2 account
• P2: Uses a copy activity to copy data from text-delimited files in an Azure Data Lake Storage
Gen2 account to a nonpartitioned table in a dedicated SQL pool of WS2
You need to configure P1 and P2 to maximize parallelism and performance. Which dataset
settings should you configure for the copy activity for each pipeline?
P1: Set the Copy method to Bulk insert / Set the Copy method to PolyBase / Set the Isolation level to Repeatable read / Set the Partition option to Dynamic range
P2: Set the Copy method to Bulk insert / Set the Copy method to PolyBase / Set the Isolation level to Repeatable read / Set the Partition option to Dynamic range
DP-203: Exam Q&A Series – Part 16
190
You plan to monitor an Azure data factory by using the Monitor & Manage app. You need to identify
the status and duration of activities that reference a table in a source database. Which three
actions should you perform in sequence? To answer, move the actions from the list of actions to
the answer area and arrange them in the correct order.
a) From the Data Factory monitoring app, add the Source user property to the Activity Runs table
b) From the Data Factory monitoring app, add the Source user property to the Pipeline Runs table
c) From the Data Factory authoring UI, publish the pipelines
d) From the Data Factory monitoring app, add a linked service to the Pipeline Runs table
e) From the Data Factory authoring UI, generate a user property for Source on all activities
f) From the Data Factory authoring UI, generate a user property for Source on all datasets
Answer (in order): e, a, c
DP-203: Exam Q&A Series – Part 16
191
Your company has two Microsoft Azure SQL databases named db1 and db2. You need to move
data from a table in db1 to a table in db2 by using a pipeline in Azure Data Factory. You create an
Azure Data Factory named ADF1. Which two types Of objects Should you create In ADF1 to
complete the pipeline? Each correct answer presents part of the solution. NOTE: Each correct
selection is worth one point.
a) a linked service
b) an Azure Service Bus
c) sources and targets
d) input and output datasets
e) transformations
DP-203: Exam Q&A Series – Part 16
192
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
Table1 contains the following:
• One billion rows
• A clustered columnstore index
• A hash-distributed column named Product Key
• A column named Sales Date that is of the date data type and cannot be null
Thirty million rows will be added to Table1 each month. You need to partition Table1 based on the
Sales Date column. The solution must optimize query performance and data loading. How often
should you create a partition?

a) once per month


b) once per year
c) once per day
d) once per week
DP-203: Exam Q&A Series – Part 16
193
You have an Azure subscription that contains an Azure Data Lake Storage account named
myaccount1. The myaccount1 account contains two containers named container1 and container2.
The subscription is linked to an Azure Active Directory (Azure AD) tenant that contains a security
group named Group1. You need to grant Group1 read access to container1. The solution must use
the principle of least privilege. Which role should you assign to Group1?
a) Storage Blob Data Reader for container1
b) Storage Table Data Reader for container1
c) Storage Blob Data Reader for myaccount1
d) Storage Table Data Reader for myaccount1
DP-203: Exam Q&A Series – Part 16
194
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You
have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1. You plan to insert data from the files in container1 into Table1 and transform the data.
Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is
stored as an additional column in Table1.
Solution: You use a dedicated SQL pool to create an external table that has an additional DateTime
column.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 16
195
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You
have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1. You plan to insert data from the files in container1 into Table1 and transform the data.
Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is
stored as an additional column in Table1.
Solution: You use an Azure Synapse Analytics serverless SQL pool to create an external table that
has an additional DateTime column.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 16
196
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You
have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1. You plan to insert data from the files in container1 into Table1 and transform the data.
Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is
stored as an additional column in Table1.
Solution: In an Azure Synapse Analytics pipeline, you use a data flow that contains a Derived
Column transformation.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 17
197
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two
outputs. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 17
198
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one query, and two outputs. Does this
meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 17
199
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input,
two queries, and four outputs. Does this meet the goal?
Yes No
• We need one reference data input for LocationIncomes, which rarely changes.
• We need two queries, one for in-store customers and one for online customers.
• For each query, two outputs are needed. That makes a total of four outputs.
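A rough Stream Analytics sketch of that shape: one streaming input (Customers), one reference input (LocationIncomes), two logical queries split by customer type, and four outputs. The output names and the CustomerType, MailingAddress, Location, and MedianIncome columns are assumptions, not taken from the question.

-- Online customers: join to the reference data on the mailing address, write to both sinks.
SELECT c.CustomerId, c.MailingAddress, li.MedianIncome
INTO sqlOnlineOutput
FROM Customers c
JOIN LocationIncomes li ON c.MailingAddress = li.Location
WHERE c.CustomerType = 'Online'

SELECT c.CustomerId, c.MailingAddress, li.MedianIncome
INTO datalakeOnlineOutput
FROM Customers c
JOIN LocationIncomes li ON c.MailingAddress = li.Location
WHERE c.CustomerType = 'Online'

-- In-store customers: no address lookup, write to both sinks.
SELECT c.CustomerId
INTO sqlInStoreOutput
FROM Customers c
WHERE c.CustomerType = 'InStore'

SELECT c.CustomerId
INTO datalakeInStoreOutput
FROM Customers c
WHERE c.CustomerType = 'InStore'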
DP-203: Exam Q&A Series – Part 17
200
An on-premises data warehouse has the following fact tables, both having the columns DateKey, ProductKey, and RegionKey. There are 120 unique product keys and 65 unique region keys.
Sales: The table is 600 GB in size. DateKey is used extensively in the WHERE clause in queries. ProductKey is used extensively in join operations. RegionKey is used for grouping. Seventy-five percent of records relate to one of 40 regions.
Invoice: The table is 6 GB in size. DateKey and ProductKey are used extensively in the WHERE clause in queries. RegionKey is used for grouping.
Queries that use the data warehouse take a long time to complete. You plan to migrate the solution to use Azure Synapse Analytics. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew. What should you recommend?
Sales table: Distribution type (Hash-distributed / Round robin), Distribution column (DateKey / ProductKey / RegionKey)
Invoice table: Distribution type (Hash-distributed / Round robin), Distribution column (DateKey / ProductKey / RegionKey)
DP-203: Exam Q&A Series – Part 17
201
Applications that publish messages to Azure Event Hub very frequently will get the best
performance using Advanced Message Queuing Protocol (AMQP) because it establishes a
persistent socket.
True False
202
If an Event Hub goes offline before a consumer group can process the events it holds, those
events will be lost
True False
DP-203: Exam Q&A Series – Part 17
203
By default, how many partitions will a new Event Hub have?
1 2 3 4 8
204
What is the maximum number of activities per pipeline in Azure Data Factory?
40 60 80 100 150
DP-203: Exam Q&A Series – Part 17
205
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the
data to an Azure Blob storage account. You need to output the count of tweets during the last five
minutes every five minutes. Which windowing function should you use?
a) a five-minute Sliding window
b) a five-minute Session window
c) a five-minute Tumbling window
d) a five-minute Hopping window that has a one-minute hop
DP-203: Exam Q&A Series – Part 17
206
You are creating a new notebook in Azure Databricks that will support R as the primary language
but will also support Scala and SQL. Which switch should you use to switch between languages?
a) %
b) @
c) []
d) ()
• %python
• %r
• %scala
• %sql
DP-203: Exam Q&A Series – Part 18
207
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data
Warehouse. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL
Data Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure storage account
3. Create an external file format and external table using the external data source
4. Load the data using the INSERT…SELECT statement
Does the solution meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 18
208
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data
Warehouse. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL
Data Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create an external file format and external table using the external data source
3. Load the data using the INSERT…SELECT statement

Does the solution meet the goal?

Yes No
DP-203: Exam Q&A Series – Part 18
209
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data
Warehouse. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL
Data Warehouse.
Solution:
1. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement

Does the solution meet the goal?

Yes No
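A minimal T-SQL sketch of the object chain that questions 207-209 describe, ending with the
CREATE TABLE AS SELECT load from question 209. All object names and the storage URI are
assumptions, and a database-scoped credential may also be required depending on how the pool
authenticates to the storage account.

-- External data source over the Data Lake Gen2 account (TYPE = HADOOP is the PolyBase
-- form used by dedicated SQL pools).
CREATE EXTERNAL DATA SOURCE MyDataLake
WITH
(
    TYPE     = HADOOP,
    LOCATION = 'abfss://data@mystorageaccount.dfs.core.windows.net'
);

-- The source files are Parquet, so no conversion to CSV is required.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH ( FORMAT_TYPE = PARQUET );

-- External table that projects a schema over the files.
CREATE EXTERNAL TABLE dbo.SalesExternal
(
    SaleId   int,
    SaleDate date,
    Amount   decimal(18, 2)
)
WITH
(
    LOCATION    = '/sales/',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
);

-- Question 209 loads with CREATE TABLE AS SELECT, which creates and fills a distributed
-- internal table in a single, fully parallel operation.
CREATE TABLE dbo.Sales
WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX )
AS SELECT * FROM dbo.SalesExternal;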
DP-203: Exam Q&A Series – Part 18
210
You are moving data from an Azure Data Lake Gen2 storage to Azure Synapse Analytics. Which
Azure Data Factory integration runtime would you use in a data copy activity?
a) Azure - SSIS
b) Azure IR
c) Self-hosted
d) Pipelines
DP-203: Exam Q&A Series – Part 18
211
You have an enterprise data warehouse in Azure Synapse Analytics that contains a table named
FactOnlineSales. The table contains data from the start of 2009 to the end of 2012. You need to
improve the performance of queries against FactOnlineSales by using table partitions. The solution
must meet the following requirements:
- Create four partitions based on the order date.
- Ensure that each partition contains all the orders placed during a given calendar year.
How should you complete the T-SQL command?
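The T-SQL command to complete is not reproduced above. As a hedged illustration of the partitioning
the requirements describe, all column names and data types other than the table name are assumptions:

CREATE TABLE dbo.FactOnlineSales
(
    OnlineSalesKey int   NOT NULL,
    OrderDateKey   int   NOT NULL,   -- yyyymmdd-style surrogate key (assumed)
    SalesAmount    money NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OnlineSalesKey),
    CLUSTERED COLUMNSTORE INDEX,
    -- Three RANGE RIGHT boundaries yield exactly four partitions,
    -- one per calendar year: 2009, 2010, 2011, and 2012.
    PARTITION ( OrderDateKey RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101) )
);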
DP-203: Exam Q&A Series – Part 18
212
You are performing exploratory analysis of bus fare data in an Azure Data Lake Storage Gen2
account by using an Azure Synapse Analytics serverless SQL pool. You execute the Transact-SQL
query shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on
the information presented in the graphic.
DP-203: Exam Q&A Series – Part 18
213
You have an Azure subscription that is linked to a hybrid Azure Active Directory (Azure AD) tenant.
The subscription contains an Azure Synapse Analytics SQL pool named Pool1. You need to
recommend an authentication solution for Pool1. The solution must support multi-factor
authentication (MFA) and database-level authentication. Which authentication solution or
solutions should you include in the recommendation? To answer, select the appropriate options in
the answer area.
MFA:
• Azure AD authentication
• Microsoft SQL Server authentication
• Password-less authentication
• Windows authentication

Database-level authentication:
• Application roles
• Contained database users
• Database roles
• Microsoft SQL Server logins
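For context, Azure AD authentication is the listed option that supports MFA, and contained database
users provide database-level authentication. A minimal T-SQL sketch of creating a contained database
user mapped to an Azure AD identity in the SQL pool; the user principal name and role are hypothetical.

CREATE USER [dataengineer@contoso.com] FROM EXTERNAL PROVIDER;

-- Grant database-level permissions through a database role.
ALTER ROLE db_datareader ADD MEMBER [dataengineer@contoso.com];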
DP-203: Exam Q&A Series – Part 18
214
You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool.
The table will have a clustered columnstore index and will include the following columns:
Column                 Comment
EventDate              One million records are added to the table each day
EventTypeID            The table contains 10 million records for each event type
WarehouseID            The table contains 100 million records for each warehouse
ProductCategoryTypeID  The table contains 25 million records for each product category type

You identify the following usage patterns:
• Analysts will most commonly analyze transactions for a warehouse.
• Queries will summarize by product category type, date, and/or inventory event type.
You need to recommend a partition strategy for the table to minimize query times. On which
column should you partition the table?
a) EventTypeID
b) ProductCategoryTypeID
c) EventDate
d) WarehouseID
DP-203: Exam Q&A Series – Part 18
215
You configure monitoring for an Azure Synapse Analytics implementation. The implementation
uses PolyBase to load data from comma-separated value (CSV) files stored in Azure Data Lake
Storage Gen2 using an external table. Files with an invalid schema cause errors to occur. You need
to monitor for an invalid schema error. For which error should you monitor?
a) EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error
[com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.'
b) Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted-
the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows
processed.
c) EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable
to instantiate LoginClass] occurred while accessing external file.'
d) EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [No
FileSystem for scheme: wasbs] occurred while accessing external file.'
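For context, the reject threshold mentioned in option (b) is set on the external table itself. A
minimal sketch of the relevant settings, reusing the hypothetical MyDataLake data source from the
sketch under question 209 together with a hypothetical CSV file format.

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS ( FIELD_TERMINATOR = ',', FIRST_ROW = 2 )
);

CREATE EXTERNAL TABLE dbo.SalesStagingExternal
(
    SaleId int,
    Amount decimal(18, 2)
)
WITH
(
    LOCATION     = '/staging/sales/',   -- folder of CSV files (assumed)
    DATA_SOURCE  = MyDataLake,
    FILE_FORMAT  = CsvFormat,
    REJECT_TYPE  = VALUE,
    -- With REJECT_VALUE = 0, a single row that does not match the schema aborts the query
    -- with the "maximum reject threshold (0 rows) was reached" error shown in option (b).
    REJECT_VALUE = 0
);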
DP-203: Exam Q&A Series – Part 18
216
You are working with external data stored in Azure Data Lake Store Gen1 and need to determine the
schema of that external data. Which of the following plug-ins would you use to infer the external
data schema?
a) Ipv4_lookup
b) Mysql_request
c) Pivot
d) Narrow
e) infer_storage_schema ✓

• infer_storage_schema is the plug-in that infers the schema from the external file contents when
the external data schema is unknown.