
Topic 1 question 1 discussion

nkav
:: Highly Voted 3 months, 3 weeks ago
ProductKey is a surrogate key, as it is an identity column
upvoted 24 times
111222333
:: 3 months, 1 week ago
Agree on the surrogate key, exactly. "In data warehousing, IDENTITY functionality is particularly important as it makes easier the creation of surrogate keys."
Why ProductKey is certainly not a business key: "The IDENTITY value in Synapse is not guaranteed to be unique if the user explicitly inserts a duplicate value with 'SET IDENTITY_INSERT ON' or reseeds IDENTITY". A business key is an index that identifies the uniqueness of a row, and here Microsoft says that IDENTITY doesn't guarantee uniqueness.
References:
https://azure.microsoft.com/en-us/blog/identity-now-available-with-azure-sql-data-warehouse/
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
upvoted 3 times
...
...
sagga
:: Highly Voted 3 months, 2 weeks ago
Type 2 because there are start and end columns, and ProductKey is a surrogate key. ProductNumber seems to be a business key.
upvoted 14 times
DrC
:: 2 months, 4 weeks ago
The start and end columns indicate from when to when the product was being sold; they are not metadata about the row. That makes it Type 1 (no history): the record is updated directly, so there is no record of historical values, only the current state.
upvoted 13 times
captainbee
:: 2 months, 3 weeks ago
Exactly how I saw it
upvoted 1 times
...
...
...
SatyamKishore
:: Most Recent 22 hours, 40 minutes ago
This is a divided discussion; I'm still confused whether it is SCD Type 1 or 2.
upvoted 1 times
...
YipingRuan
:: 3 days, 3 hours ago
Type 2 and surrogate key.
"The table must also define a surrogate key because the business key (in this instance, employee ID) won't be unique."
https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
So IDENTITY suggests [ProductKey] is a surrogate key.
upvoted 1 times
...
anarvekar
:: 2 weeks, 2 days ago
I guess the answer Type 2 is valid because RowInsertedDateTime and RowUpdatedDateTime are being used as Type 2 effective dates, where the inserted date is the effective_from date and the updated date is the effective_to date, which will be set to some far-future date or NULL for the currently active records. So I'm convinced that it is Type 2.
However, ProductKey has to be a surrogate key. An identity column can never be a business/natural key, as that's what we import from the source as-is, and the column is supposed to contain duplicates in the case of Type 2.
upvoted 1 times
...
Akki0120
:: 1 month ago
For all questions from contributor access 9403778084
upvoted 1 times
...
noone_a
:: 1 month, 2 weeks ago
SCD Type 1 is correct. There is no start/end date to show when the record is valid from/to; SellStart/SellEnd do not fulfill this role. A product might have a limited sales run, say of one month, and that is what these columns show. They don't show that the row has been replaced.
The key is a surrogate key. Identity fields generate unique values in most cases. Of course, this can be overridden using IDENTITY_INSERT, but that is usually only used to fix issues, not in day-to-day operations.
upvoted 3 times
...
Balaji1003
:: 1 month, 2 weeks ago
Type 1 and surrogate key.
Type 1 because SellStartDate and SellEndDate have business meaning and are not SCD columns.
Surrogate key because the ID is incremented for every insert.
upvoted 1 times
...
Steviyke
:: 2 months ago
The answer is Type 2 SCD and surrogate key. There is an [ETLAuditID] column that's an INT and tracks changes like 1 or 0 for history. Also, you cannot have a Type 1 SCD with a surrogate key.
upvoted 2 times
...
eng1
:: 2 months, 2 weeks ago
Type 2 doesn't need the insert and update fields, so it's Type 1 and a surrogate key.
upvoted 6 times
...
ThiruthuvaRajan
:: 2 months, 2 weeks ago
The SCD is Type 2. It has both start and end information, with which we can easily say which row is the current one. The "current" one refers to Type 2.
https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
And the key is a unique identifier for each row, so it is a surrogate key.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
upvoted 1 times
captainbee
:: 2 months, 1 week ago
It really isn't Type 2. The start and end columns apply to the product being sold, not to the entry in the table. There is no IsActive column either. Type 1 all the way.
upvoted 3 times
...
...
DragonBlake
:: 2 months, 3 weeks ago
ProductKey is a surrogate key.
upvoted 2 times
...
clguy
:: 2 months, 3 weeks ago
ProductKey is the SK, SourceProductId is the BK, and it is a Type 1 SCD.
upvoted 2 times
...
dmnantilla9
:: 3 months ago
It is Type 2 and a surrogate key.
upvoted 1 times
...
wfrf92
:: 3 months, 2 weeks ago
Type 1
Surrogate Key
upvoted 11 times
baobabko
:: 2 months, 4 weeks ago
Type 1 as there is no obvious versioning, just latest value and the time of record creation and update.
upvoted 3 times
...
...
bananawu
:: 3 months, 2 weeks ago
Correct Answer, "In Azure Synapse Analytics, the IDENTITY value increases on its own in each distribution and does
not overlap with IDENTITY values in other distributions. The IDENTITY value in Synapse is not guaranteed to be
unique if the user explicitly inserts a duplicate value with “SET IDENTITY_INSERT ON” or reseeds IDENTITY. For
details, see CREATE TABLE (Transact-SQL) IDENTITY (Property)."
upvoted 1 times
baobabko
:: 2 months, 4 weeks ago
IDENTITY is assigned by the system. It has no business meaning. Hence it cannot be a business key.
Automatically generated and assigned keys are called Surrogate Keys.
upvoted 1 times
...
...
neerajkrjain
:: 3 months, 2 weeks ago
It should be a type 1 dimension.
upvoted 3 times
...
malakosan
:: 3 months, 3 weeks ago
I agree
upvoted 1 times
malakosan
:: 3 months, 3 weeks ago
With Arindamb
upvoted 1 times
...
...
Arindamb
:: 3 months, 3 weeks ago
An identity column holds a natural number, which is different from a natural key such as an SSN, mobile number, etc. Hence the answer should be surrogate key.
upvoted 4 times
malakosan
:: 3 months, 3 weeks ago
I Agree
upvoted 2 times
...
...
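For reference, a minimal sketch of the pattern discussed above, with hypothetical table and column names (not taken from the question): an IDENTITY column generates the surrogate key, the business key is carried over from the source system, and the sell dates remain ordinary business attributes rather than SCD validity columns.
CREATE TABLE dbo.DimProduct
(
    ProductKey    INT IDENTITY(1,1) NOT NULL,  -- surrogate key, generated by the system
    ProductNumber NVARCHAR(25)      NOT NULL,  -- business (natural) key from the source
    ProductName   NVARCHAR(100)     NOT NULL,
    SellStartDate DATE              NULL,      -- business attribute, not an SCD validity column
    SellEndDate   DATE              NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);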
Topic 1 question 2 discussion
AugustineUba
:: Highly Voted 2 weeks, 2 days ago
From the documentation the answer is clear enough. B is the right answer. When choosing a distribution column, select a distribution column that: "Is not a date column. All data for the same date lands in the same distribution. If several users are all filtering on the same date, then only 1 of the 60 distributions do all the processing work."
upvoted 6 times
...
waterbender19
:: Most Recent 2 weeks, 3 days ago
I think the answer should be D for that specific query. If you look at the datatypes, DateKey is an INT datatype, not a DATE datatype.
upvoted 2 times
waterbender19
:: 2 weeks, 2 days ago
And the statement that the fact table will have 1 million rows added daily means that each DateKey value has an equal number of rows associated with it.
upvoted 1 times
...
...
andimohr
:: 3 weeks, 6 days ago
The reference given in the answer is precise: choose a distribution column with data that a) distributes evenly, b) has many unique values, c) does not have NULLs or has few NULLs, and d) IS NOT A DATE COLUMN... Definitely the best choice for the hash distribution is the identity column.
upvoted 4 times
...
noone_a
:: 1 month, 2 weeks ago
Although it's a fact table, replicated is the correct distribution in this case. Each row is 141 bytes in size x 1,000,000 records = 135 MB total size. Microsoft recommends replicated distribution for anything under 2 GB. We have no further information regarding table growth, so this answer is based only on the info provided.
upvoted 1 times
noone_a
:: 1 month, 2 weeks ago
Edit: this is incorrect, as it will have 1 million records added daily for 3 years, putting it over 2 GB.
upvoted 2 times
...
...
vlad888
:: 1 month, 3 weeks ago
Yes, do not use a date column; there is such a recommendation in the Synapse docs. But here we have a range search, so potentially several nodes will be used.
upvoted 1 times
...
vlad888
:: 1 month, 3 weeks ago
Actually it is clear that it should be hash-distributed. But ProductKey brings no benefit for this query; it doesn't participate in it at all. So: DateKey, although it is unusual for Synapse.
upvoted 3 times
...
savin
:: 2 months ago
I don't think there is enough information to decide this, and we cannot decide it by just looking at one query. Considering only this query, and assuming no other dimensions are connected to this fact table, a good answer would be D.
upvoted 2 times
...
ChandrashekharDeshpande
:: 2 months, 2 weeks ago
My answer goes with D. In most cases data is partitioned on a date column that is closely tied to the order in which the data is loaded into the SQL pool. Partitioning improves query performance: a query that applies a filter to partitioned data can limit the scan to only the qualifying partitions, improving performance dramatically, since filtering can avoid a full table scan and only scan a smaller subset of data. It also seems that data partitioned on date will get distributed uniformly across the nodes, thereby avoiding a hot partition.
upvoted 1 times
vlad888
:: 1 month, 3 weeks ago
Avoiding a partition (a compute node, to be precise) is the least desirable thing; it is an MPP system. 60 nodes perform the work faster than 5.
upvoted 1 times
...
...
bc5468521
:: 2 months, 4 weeks ago
Agree with B.
upvoted 3 times
...
Ritab
:: 3 months, 2 weeks ago
Round robin looks to be the best fit
upvoted 1 times
baobabko
:: 2 months, 4 weeks ago
The question is about this exact query. To minimize the time for this query you should distribute the work. But if we do hash distribution on the date column, this will utilize at most 30 distributions. Round robin would be a good choice if this were really the only query we run, but we probably want to join with other tables on the primary key. So hash distribution on the primary key might be the better choice. If we assume a uniform primary key distribution, hashing on the PK will have the effect of round robin; hence B is the correct answer.
upvoted 7 times
DrC
:: 2 months, 4 weeks ago
Also: 1 million rows of data added daily, and the table will contain three years of data. It will have over a billion rows when loaded. That will put it over the 2 GB recommendation for hash-distributed tables.
Consider using a hash-distributed table when:
* The table size on disk is more than 2 GB.
* The table has frequent insert, update, and delete operations.
upvoted 1 times
lsdudi
:: 1 month, 1 week ago
Only round robin will use all 60 distributions; there is no join key.
upvoted 1 times
...
...
...
...
Pradip_valens
:: 3 months, 2 weeks ago
"Not D: Do not use a date column. . All data for the same date lands in the same distribution. If several users are all
filtering on the same date, then only 1 of the 60 distributions do all the processing work." ???
the same implies for
ProductKey, now forgiven query we may need to check every record for the date, so checking all 60 distribution ???
upvoted 2 times
freerider
:: 3 months, 2 weeks ago
According to the reference, there are multiple things that make it inappropriate to use the date column:
"Is not used in WHERE clauses. This could narrow the query to not run on all the distributions."
"Is not a date column. WHERE clauses often filter by date. When this happens, all the processing could run on only a few distributions."
Replicated is unlikely to be correct since it's too much data (a million rows per day for the last 3 years). They also use the product key in the reference example.
upvoted 3 times
...
baobabko
:: 2 months, 4 weeks ago
The question is about this exact query. To minimize the time for this query you should distribute the work. But if we do hash distribution on the date column, this will utilize at most 30 distributions. Round robin would be a good choice if this were really the only query we run, but we probably want to join with other tables on the primary key. So hash distribution on the primary key might be the better choice. If we assume a uniform primary key distribution, hashing on the PK will have the effect of round robin.
upvoted 1 times
...
...
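To make the trade-off concrete, a sketch (table and column names assumed, not from the question) of the fact table hash-distributed on ProductKey rather than on the date key, so work spreads across all 60 distributions even when queries filter on a date range:
CREATE TABLE dbo.FactSales
(
    ProductKey  INT           NOT NULL,
    DateKey     INT           NOT NULL,   -- integer date surrogate, e.g. 20210315
    SalesAmount DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),   -- many unique values, not a date column
    CLUSTERED COLUMNSTORE INDEX
);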
Topic 1 question 3 discussion
uther
:: Highly Voted 3 months, 3 weeks ago
It should be ManagerEmployeeKey; in dimensions we use surrogates to create the hierarchy, so the answer IMO is C.
upvoted 26 times
baobabko
:: 2 months, 4 weeks ago
Agree. The purpose of a surrogate key is to encapsulate the business key, which might change unexpectedly or can have duplicates if data comes from different systems. The business key is preserved only for lineage/traceability purposes and should not be used for linking inside the data warehouse. In addition, as the table is defined, it is not a unique key.
upvoted 4 times
...
malakosan
:: 3 months, 2 weeks ago
I agree, it is C.
upvoted 5 times
...
...
TorbenS
:: Highly Voted 3 months, 1 week ago
I think the correct answer is [ManagerEmployeeID] (A), because at the time of the insert we can't guarantee that the manager has already been inserted, and thus we can't resolve the EmployeeKey of the manager, because it is an identity.
upvoted 7 times
DragonBlake
:: 2 months, 3 weeks ago
If you use ManagerEmployeeID, it is not unique. Correct answer is C
upvoted 3 times
...
...
YipingRuan
:: Most Recent 3 days, 2 hours ago
"Provide fast lookup of the manager" and surrogate key [ManagerEmployeeKey] is unique.
upvoted 1 times
...
angelato
:: 2 weeks ago
Explanation from Udemy: [ManagerEmployeeKey] [int] NULL is the correct line to add to the table. In dimensions we use surrogates. If [ManagerEmployeeID] [int] NULL is used to create a hierarchy, at the time of the insert we can't guarantee that the manager is already inserted and thus we can't resolve the EmployeeKey of the manager, because it is an identity.
Hierarchies, in tabular models, are metadata that define relationships between two or more columns in a table. Hierarchies can appear separate from other columns in a reporting client field list, making them easier for client users to navigate and include in a report.
upvoted 1 times
...
andimohr
:: 3 weeks, 6 days ago
The correct answer is A: [ManagerEmployeeID] [int] NULL.
Follow the given reference: "Hierarchies are... meant to be... used as a tool for providing a better user experience."
We are data engineers. The key point is that we should create a new column to "support creating an employee reporting hierarchy for your entire company". The entire company (data analysts, report consumers) will not be aware of the technically created surrogate "EmployeeKey". Naming the column with a reference to EmployeeId, and using the business value EmployeeId for this reference, will give most individuals in the company the best experience building data models, looking at sample data, etc.
My impression is that most discussions here have possible performance issues in mind. Both EmployeeId and EmployeeKey are integers and will perform similar if the .
upvoted 2 times
...
Akki0120
:: 1 month ago
For all questions from contributor access 9403778084
upvoted 1 times
...
EddyRoboto
:: 1 month, 2 weeks ago
What if we had an update in the manager table? The surrogate key would be incremented and we would lose the current manager information (if the manager table is an SCD Type 2). So I think the correct answer is A.
upvoted 5 times
EddyRoboto
:: 17 hours, 40 minutes ago
Please disregard; I misunderstood the question. The correct answer is C, as stated above.
upvoted 1 times
...
...
meswapnilspal
:: 2 months ago
What's the difference between ManagerEmployeeKey and ManagerEmployeeID? I am new to data warehousing concepts.
upvoted 2 times
...
Steviyke
:: 2 months ago
If you use [ManagerEmployeeKey] [int] NULL, how are you going to implement the hierarchy in your design? That is why A is the only logical option.
upvoted 2 times
...
bc5468521
:: 2 months, 4 weeks ago
Agree with C.
upvoted 1 times
...
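A small sketch of why the surrogate reference works (table and column names assumed): the new column stores the manager's surrogate key, and the reporting hierarchy is resolved with a self-join on EmployeeKey.
SELECT  e.EmployeeName AS Employee,
        m.EmployeeName AS Manager
FROM    dbo.DimEmployee AS e
LEFT JOIN dbo.DimEmployee AS m
        ON e.ManagerEmployeeKey = m.EmployeeKey;   -- a NULL ManagerEmployeeKey marks the top of the hierarchy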
Topic 1 question 4 discussion
kruukp
:: Highly Voted 3 months, 2 weeks ago
B is the correct answer. There is a column 'name' in the WHERE clause which doesn't exist in the table.
upvoted 47 times
knarf
:: 2 months ago
I agree B is correct, not because the column 'name' in the query is invalid, but because the table reference itself is invalid, as the table was created as CREATE TABLE mytestdb.myParquetTable and not mytestdb.dbo.myParquetTable.
upvoted 3 times
anarvekar
:: 2 weeks, 2 days ago
Isn't dbo the default schema the objects are created in, if the schema name is not explicitly specified in the DDL?
upvoted 1 times
...
AugustineUba
:: 4 weeks, 1 day ago
I agree with this.
upvoted 1 times
...
...
baobabko
:: 2 months, 4 weeks ago
Even if the column name were correct: when I tried the example, it threw an error that the table doesn't exist (as expected - after all, it is a Spark table, not SQL; there is no external or any other table which could be queried in the SQL pool).
upvoted 2 times
knarf
:: 2 months ago
See my post above and comment?
upvoted 1 times
...
Alekx42
:: 2 months, 3 weeks ago
https://docs.microsoft.com/en-us/azure/synapse-analytics/metadata/table
"Once a database has been created by a Spark job, you can create tables in it with Spark that use Parquet as the storage format. Table names will be converted to lower case and need to be queried using the lower case name. These tables will immediately become available for querying by any of the Azure Synapse workspace Spark pools. The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in serverless SQL pool."
I think the reason you got the error was because the query had to use the lower case names. See the example in the same link; they create a similar table and use the lowercase letters to query it from the serverless SQL pool. Anyway, this confirms that B is the correct answer here.
upvoted 2 times
...
...
...
ast3roid
:: Most Recent 2 weeks, 2 days ago
The question is wrong. It looks like it was created referring to this example: https://docs.microsoft.com/en-us/azure/synapse-analytics/metadata/table#examples
The table creation query is updated according to the question, but the SELECT query looks the same. The answer is B with `name` in the WHERE clause, and the answer is A with `EmployeeId` in the WHERE clause.
upvoted 1 times
...
knarf
:: 2 months ago
I vote for B. The table was inadvertently created with the schema 'mytestdb' and not the intended 'dbo' schema. The query refers to the three-part name mytestdb.dbo.myParquetTable, which is invalid.
upvoted 2 times
...
Steviyke
:: 2 months ago
The query will throw an ERROR, as name != EmployeeName. There is no column "Name" or "name" in the Spark pool table. If the table were queried with "employeename", it would return the right answer.
upvoted 1 times
...
savin
:: 2 months ago
The answer is B, since the column name is not "name".
upvoted 1 times
...
terajuana
:: 2 months, 2 weeks ago
From the documentation: "Azure Synapse Analytics allows the different workspace computational engines to share databases and Parquet-backed tables between its Apache Spark pools and serverless SQL pool."
upvoted 1 times
...
dmnantilla9
:: 3 months ago
The response would be A only if the column name were "EmployeeName", not "name".
upvoted 2 times
AndrewThePandrew
:: 2 months, 3 weeks ago
Agree. This is what threw me off.
upvoted 1 times
...
...
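For context, a sketch of the shared-metadata behaviour described in the linked article, with assumed column names: the table is created from a Spark pool and then queried from the serverless SQL pool using the lower-case name in the synchronized dbo schema.
-- Spark SQL, run on a Spark pool:
CREATE TABLE mytestdb.myParquetTable (EmployeeID INT, EmployeeName STRING) USING PARQUET;

-- Serverless SQL pool: the synchronized table appears lower-cased under dbo
SELECT EmployeeID
FROM   mytestdb.dbo.myparquettable
WHERE  EmployeeName = 'Alice';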
Topic 1 question 5 discussion
AvithK
:: 2 weeks ago
Truncating the partition is even quicker; why isn't that the answer, if the data is dropped anyway?
upvoted 1 times
BlackMal
:: 1 week, 3 days ago
This. I think it should be the answer.
upvoted 1 times
...
...
poornipv
:: 3 weeks, 5 days ago
what is the correct answer for this?
upvoted 2 times
...
AnonAzureDataEngineer
:: 4 weeks, 1 day ago
Seems like it should be:
1. E
2. A
3. C
upvoted 1 times
...
dragos_dragos62000
:: 1 month, 3 weeks ago
Correct!
upvoted 1 times
...
Dileepvikram
:: 2 months, 3 weeks ago
The data copy to the backup table is not mentioned in the answer.
upvoted 1 times
savin
:: 2 months ago
The partition switching part covers it, so it's correct I think.
upvoted 1 times
...
...
wfrf92
:: 3 months, 2 weeks ago
Is this correct ????
upvoted 1 times
alain2
:: 3 months, 1 week ago
Yes, it is.
https://www.cathrinewilhelmsen.net/table-partitioning-in-sql-server-partition-switching/
upvoted 3 times
...
TorbenS
:: 3 months, 1 week ago
yes, I think so
upvoted 4 times
...
...
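A minimal sketch of the partition-switching step being discussed (table names and partition number are assumed): the switch is a metadata-only operation, which is why it is fast, and the switched-out data can then be truncated or archived separately.
-- Move the oldest partition out of the fact table into an empty table with the same structure
ALTER TABLE dbo.FactSales SWITCH PARTITION 1 TO dbo.FactSales_Old PARTITION 1;

-- The switched-out data can now be archived or removed without touching the fact table
TRUNCATE TABLE dbo.FactSales_Old;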
Topic 1 question 6 discussion
Chillem1900
:: Highly Voted 3 months, 3 weeks ago
I believe the answer should be B. In the case of a serverless pool, a wildcard should be added to the location.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=hadoop#arguments-create-external-table
upvoted 31 times
...
alain2
:: Highly Voted 3 months, 1 week ago
"Serverless SQL pool can recursively traverse folders only if you specify /** at the end of path."
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-folders-multiple-csv-files
upvoted 9 times
Preben
:: 2 months, 2 weeks ago
When you are quoting from Microsoft documentation, do not ADD in words to the sentence. 'Only' is not used.
upvoted 5 times
...
...
Akki0120
:: Most Recent 1 month ago
For all questions from contributor access 9403778084
upvoted 2 times
...
elimey
:: 1 month ago
The answer is B
upvoted 2 times
...
AKC11
:: 1 month, 1 week ago
Answer is B. C can be the answer only if there are wildcards in the path: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-folders-multiple-csv-files
upvoted 1 times
...
InvisibleShadow
:: 1 month, 3 weeks ago
Answer should be B. Please fix in the exam question.
upvoted 2 times
...
bc5468521
:: 2 months, 4 weeks ago
Go for B
upvoted 4 times
...
wfrf92
:: 3 months, 2 weeks ago
Unlike Hadoop external tables, native external tables don't return subfolders unless you specify /** at the end of the path. In this example, if LOCATION='/webdata/', a serverless SQL pool query will return rows from mydata.txt. It won't return mydata2.txt and mydata3.txt because they're located in a subfolder. Hadoop tables will return all files within any sub-folder.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=hadoop
upvoted 4 times
...
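A sketch of the LOCATION wildcard the thread refers to, assuming a pre-created external data source and file format (the names below are placeholders):
CREATE EXTERNAL TABLE dbo.WebData
(
    UserId    INT,
    VisitedOn DATETIME2
)
WITH
(
    LOCATION    = '/webdata/**',        -- /** lets a native (serverless) external table traverse subfolders
    DATA_SOURCE = MyDataLake,           -- placeholder external data source
    FILE_FORMAT = MyParquetFormat       -- placeholder external file format
);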
Topic 1 question 7 discussion
alain2
:: Highly Voted 3 months, 1 week ago
1: Parquet - column-oriented binary file format
2: AVRO - Row based format, and has logical type timestamp
https://youtu.be/UrWthx8T3UY
upvoted 27 times
terajuana
:: 2 months, 2 weeks ago
The web is full of old information; timestamp support has been added to Parquet.
upvoted 3 times
vlad888
:: 1 month, 3 weeks ago
OK, but in the first case we need only 3 of 50 columns, and Parquet is a columnar format. In the second case Avro, because it is ideal for reading full rows.
upvoted 4 times
...
...
...
Himlo24
:: Highly Voted 3 months, 2 weeks ago
Shouldn't the answer for Report 1 be Parquet? The Parquet format is columnar and should be best for reading only a few columns.
upvoted 7 times
...
elimey
:: Most Recent 1 month ago
https://luminousmen.com/post/big-data-file-formats
upvoted 1 times
...
elimey
:: 1 month ago
Report 1 definitely Parquet
upvoted 1 times
...
noone_a
:: 1 month, 2 weeks ago
Report 1 - Parquet, as it is columnar.
Report 2 - Avro, as it is row-based and can be compressed further than CSV.
upvoted 1 times
...
bsa_2021
:: 2 months ago
The answer provided and the answer from the discussion differ. Which one should we follow for the actual exam?
upvoted 1 times
...
bc5468521
:: 2 months, 4 weeks ago
1 - Parquet
2 - Parquet
Since they are all querying: Avro is good for writing (OLTP), Parquet is good for querying/reading.
upvoted 4 times
...
szpinat
:: 3 months, 1 week ago
For Report 2 - why not csv?
upvoted 1 times
...
ehnw
:: 3 months, 2 weeks ago
There is no mention of Avro in the learning materials provided by Microsoft; not sure about it.
upvoted 1 times
...
Topic 1 question 8 discussion
sagga
:: Highly Voted 3 months, 2 weeks ago
D is correct
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#batch-jobs-structure
upvoted 17 times
...
Sunnyb
:: Most Recent 2 months, 2 weeks ago
D is absolutely correct
upvoted 2 times
...
Topic 1 question 9 discussion
elimey
:: 1 month ago
correct
upvoted 2 times
...
Krishna_Kumar__
:: 2 months ago
The answer seems correct: 1: Parquet, 2: Avro.
upvoted 2 times
...
Topic 1 question 10 discussion
alain2
:: Highly Voted 3 months, 1 week ago
1. Merge Files
2. Parquet
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
upvoted 29 times
Ameenymous
:: 3 months ago
The smaller the files, the worse the performance, so Merge and Parquet seem to be the right answer.
upvoted 7 times
...
...
captainbee
:: Highly Voted 1 month, 3 weeks ago
It's frustrating just how many questions ExamTopics gets wrong. It can't be helpful.
upvoted 11 times
RyuHayabusa
:: 1 month ago
At least it helps in learning, as you have to research and think for yourself. On another note, having these questions in the first place is immensely helpful.
upvoted 5 times
...
...
elimey
:: Most Recent 1 month ago
1. Merge Files: because the question says the data is initially ingested as 10 small JSON files
2. Parquet
upvoted 3 times
...
Erte
:: 1 month, 3 weeks ago
Box 1: Preserve hierarchy
Compared to the flat namespace on Blob storage, the hierarchical namespace greatly improves the performance of directory management operations, which improves overall job performance.
Box 2: Parquet
The Azure Data Factory parquet format is supported for Azure Data Lake Storage Gen2. Parquet supports the schema property.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
upvoted 1 times
...
ThiruthuvaRajan
:: 2 months, 3 weeks ago
It should be 1) Merge Files - the question clearly says "initially ingested as 10 small json files". There is no hint on hierarchy or partition information, so clearly we need to merge these files for better performance.
2) Parquet - always gives better performance for columnar-based data.
upvoted 5 times
...
Topic 1 question 12 discussion
yobllip
:: Highly Voted 2 months, 3 weeks ago
The answer should be:
1 - Cool
2 - Archive
The comparison table shows that the access time (time to first byte) for the cool tier is milliseconds.
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers#comparing-block-blob-storage-options
upvoted 16 times
...
ssitb
:: Most Recent 2 months, 3 weeks ago
The answer should be: 1 - Hot, 2 - Archive.
https://www.bmc.com/blogs/cold-vs-hot-data-storage/
Cold storage data retrieval can take much longer than hot storage. It can take minutes to hours to access cold storage data.
upvoted 2 times
captainbee
:: 2 months, 2 weeks ago
Cold storage takes milliseconds to retrieve
upvoted 3 times
...
syamkumar
:: 2 months, 2 weeks ago
I also suspect it's hot storage and archive, because it's mentioned that 5-year-old data has to be retrieved within seconds, which is not possible via cold storage.
upvoted 1 times
savin
:: 2 months ago
But the cost factor is also there: keeping the data in the hot tier for 5 years vs. the cool tier for 5 years would add a significant amount.
upvoted 1 times
...
...
...
DrC
:: 2 months, 4 weeks ago
Answer is correct
upvoted 4 times
...
Topic 1 question 13 discussion
Sunnyb
:: Highly Voted 2 months, 3 weeks ago
Answer is correct
upvoted 12 times
...
Topic 1 question 14 discussion
bc5468521
:: Highly Voted 2 months, 4 weeks ago
Answer D; a temporal table is better than SCD2, but it is not supported in Synapse yet.
upvoted 8 times
Preben
:: 2 months, 2 weeks ago
Here's the documentation for how to implement temporal tables in Synapse from 2019.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-temporary
upvoted 1 times
mbravo
:: 2 months, 2 weeks ago
Temporal tables and temporary tables are two very distinct concepts. Your link has absolutely nothing to do with this question.
upvoted 5 times
Vaishnav
:: 1 month, 2 weeks ago
https://docs.microsoft.com/en-us/azure/azure-sql/temporal-tables
Answer : A Temporal Tables
upvoted 1 times
Vaishnav
:: 1 month, 2 weeks ago
Sorry, the answer is D: SCD 2. According to the Microsoft docs, "Temporal tables keep data closely related to time context so that stored facts can be interpreted as valid only within the specific period." Since the question mentions "from a given point in time", D seems to be correct.
upvoted 1 times
...
...
...
...
...
dd1122
:: Most Recent 1 week, 5 days ago
Answer D is correct. The temporal tables mentioned in the link below are supported in Azure SQL Database (PaaS) and Azure SQL Managed Instance, whereas this question mentions dedicated SQL pools, so temporal tables cannot be used. SCD Type 2 is the answer.
https://docs.microsoft.com/en-us/azure/azure-sql/temporal-tables
upvoted 2 times
...
escoins
:: 1 month, 4 weeks ago
Definitely answer D.
upvoted 1 times
...
[Removed]
:: 2 months, 1 week ago
The answer is A - temporal tables.
"Temporal tables enable you to restore row versions from any point in time."
https://docs.microsoft.com/en-us/azure/azure-sql/database/business-continuity-high-availability-disaster-recover-hadr-overview
upvoted 1 times
...
Dileepvikram
:: 2 months, 3 weeks ago
The requirement says that the table should store the latest information, so the answer should be temporal table, right? Because SCD Type 2 will store the complete history.
upvoted 1 times
captainbee
:: 2 months, 2 weeks ago
Also needs to return employee information from a given point in time? Full history needed for that.
upvoted 6 times
...
...
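To illustrate the point-in-time requirement that drives answer D, a sketch (column names assumed, not from the question) of how a Type 2 dimension answers an "as of" lookup from its validity columns:
SELECT  EmployeeKey, EmployeeID, Department
FROM    dbo.DimEmployee
WHERE   EmployeeID = 1234
  AND   ValidFrom <= '2020-06-01'
  AND   (ValidTo > '2020-06-01' OR ValidTo IS NULL);   -- the open-ended row is the current version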
Topic 1 question 15 discussion
Diane
:: Highly Voted 3 months, 2 weeks ago
The correct answer is ABF: https://www.examtopics.com/discussions/microsoft/view/41207-exam-dp-200-topic-1-question-56-discussion/
upvoted 22 times
AvithK
:: 1 week, 6 days ago
Yes, but the order is different; it is FAB.
upvoted 1 times
KingIlo
:: 1 week, 2 days ago
The question didn't specify order or sequence
upvoted 1 times
...
...
...
AvithK
:: Most Recent 2 weeks ago
I don't get why it doesn't start with F. The managed identity should be created first, right?
upvoted 2 times
...
IDKol
:: 1 month ago
The correct answer should be:
F. Create a managed identity.
A. Add the managed identity to the Sales group.
B. Use the managed identity as the credentials for the data load process.
...
MonemSnow
:: 1 month, 2 weeks ago
A, C, F is the correct answer
upvoted 1 times
...
savin
:: 2 months ago
We need to configure Synapse to be able to access the data lake, so we need to create a managed identity and add it to the Sales group, since that group can already access the data lake. Adding our AD credentials to the Sales group would allow us to access the storage using those credentials, but we would not be able to load the data into Synapse.
upvoted 1 times
...
Krishna_Kumar__
:: 2 months ago
The correct answer should be:
A. Add the managed identity to the Sales group.
B. Use the managed identity as the credentials for the data load process.
F. Create a managed identity.
upvoted 2 times
...
jikilim858
:: 2 months, 1 week ago
ADF = Azure Data Factory
upvoted 4 times
...
savin
:: 2 months, 2 weeks ago
ABF should be correct
upvoted 3 times
...
AndrewThePandrew
:: 2 months, 3 weeks ago
The answer should be F: create a managed identity, A: add the managed identity to the group, D: use the managed identity for the load process via Azure Active Directory. How can you add a managed identity to something if it is not created first? Maybe others are seeing this in a different order?
upvoted 4 times
...
wfrf92
:: 3 months, 2 weeks ago
it should be A,B,F
upvoted 4 times
...
Topic 1 question 19 discussion
steeee
:: 16 hours, 35 minutes ago
The correct answer should be A.
upvoted 2 times
...
Topic 1 question 20 discussion
JohnMasipa
:: Highly Voted 1 day, 3 hours ago
This can't be correct. Should be D.
upvoted 5 times
...
Topic 1 question 21 discussion
Blueko
:: Highly Voted 1 day, 4 hours ago
Request: "The solution must minimize how long it takes to load the data to the staging table" The distribution should be
Round-Robin, not Hash, as in the answer's motivations: "Round-robin tables are useful for improving loading speed"
upvoted 5 times
...
A1000
:: Most Recent 18 hours, 39 minutes ago
Round-Robin
Heap
None
upvoted 2 times
...
viper16752
:: 21 hours, 43 minutes ago
Answers should be:
Distribution - Round Robin (see https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute)
Indexing - Heap (see https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-index)
Partitioning - None (it's a staging table, no sense in partitioning here)
upvoted 2 times
...
Gopinath601
:: 23 hours, 18 minutes ago
I feel that the answer is: Distribution = Hash, Indexing = Heap, Partitioning = Date.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-index
upvoted 1 times
...
Nilay95
:: 1 day, 2 hours ago
I think the answer should be:
1. Round Robin
2. Clustered Columnstore
3. None
Is partitioning allowed with round-robin distribution? Please someone confirm and modify the answer accordingly if needed.
upvoted 2 times
steeee
:: 16 hours, 29 minutes ago
Totally agree with you. Thanks.
upvoted 1 times
...
...
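A sketch of the staging-table definition the comments converge on (table and column names assumed): round-robin distribution and a heap, with no partitioning, to keep the load as fast as possible.
CREATE TABLE dbo.Stage_Sales
(
    ProductKey  INT,
    DateKey     INT,
    SalesAmount DECIMAL(18,2)
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- even spread, no hash computation during ingest
    HEAP                          -- no index maintenance while loading
);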
Topic 2 question 1 discussion
Miris
:: Highly Voted 2 months, 2 weeks ago
correct
upvoted 5 times
...
mdalorso
:: Most Recent 3 weeks, 6 days ago
This is Stream Analytics Query Language, which is a little different from T-SQL: https://docs.microsoft.com/en-us/stream-analytics-query/last-azure-stream-analytics
upvoted 2 times
AvithK
:: 1 week, 3 days ago
so is the answer DATEDIFF+LAST incorrect then?
upvoted 1 times
...
...
vlad888
:: 1 month, 3 weeks ago
The query makes no sense, at least if it is T-SQL. Look: each row is either an end event or a start event. How can the window function (LAST() over a partition) get the start event if there is a WHERE condition that filters to end events only???
upvoted 2 times
...
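On vlad888's question: in Stream Analytics the WHEN clause inside LAST looks back over the incoming stream within the LIMIT DURATION window, so the outer WHERE that keeps only end events does not stop it from finding the matching start event. A sketch of that documented pattern, with event and column names assumed:
SELECT
    DeviceId,
    DATEDIFF(second,
             LAST(EventTime) OVER (PARTITION BY DeviceId
                                   LIMIT DURATION(hour, 1)
                                   WHEN EventType = 'start'),   -- last start event seen for this device
             EventTime) AS DurationSeconds
INTO Output
FROM Input TIMESTAMP BY EventTime
WHERE EventType = 'end'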
Topic 2 question 2 discussion
Francesco1985
:: Highly Voted 2 months, 1 week ago
correct
upvoted 8 times
...
AvithK
:: Most Recent 1 week, 6 days ago
Bad rows go to 'folder out' and the good rows to the junk table? How come?
upvoted 1 times
...
Topic 2 question 3 discussion
mayank
:: Highly Voted 2 months, 3 weeks ago
As per the link provided in the explanation, disjoint: false looks correct. I believe you should go through the link https://docs.microsoft.com/en-us/azure/data-factory/data-flow-conditional-split and choose your answer for disjoint wisely. I will go with "False".
upvoted 14 times
...
Alekx42
:: Highly Voted 2 months, 3 weeks ago
I think "disjoint" should be True, so that data can be sent to all matching conditions. In this way the "all" output can get
the data from every department, which ensures that "data can also be processed by the entire company".
upvoted 9 times
Steviyke
:: 2 months ago
I concur with @Alekx42's thought. Since we want to process for each dept (3 streams), we must ensure we can still process for ALL depts at the same time (4th or default stream), hence DISJOINT: TRUE. Otherwise, DISJOINT: FALSE.
upvoted 1 times
...
...
brendy
:: Most Recent 1 week, 2 days ago
The top votes are split, any consensus?
upvoted 1 times
...
Vaishnav
:: 1 month, 2 weeks ago
The answer is correct. Refer to the Microsoft doc below:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-conditional-split
upvoted 1 times
...
escoins
:: 1 month, 4 weeks ago
The provided link deals with "all other"; here we have the situation with "all". Therefore I think disjoint: true should be correct.
upvoted 1 times
...
Topic 2 question 4 discussion
sagga
:: Highly Voted 3 months, 1 week ago
I think the correct order is:
1) Mount onto DBFS
2) Read into a data frame
3) Transform the data frame
4) Specify a temporary folder
5) Write to a table in SQL data warehouse
About the temporary folder, there is a note explaining this:
https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse#load-data-into-azure-synapse
Discussions about this question:
https://www.examtopics.com/discussions/microsoft/view/11653-exam-dp-200-topic-2-question-30-discussion/
upvoted 41 times
andylop04
:: 1 month, 3 weeks ago
Today I received this question in my exam. Only appeared the 5 options of this response. I only had to order, not
choice. This solutions is the correct. Thanks sagga.
upvoted 9 times
...
labasmuse
:: 3 months, 1 week ago
Hi sagga! Thank you. I do agree....
upvoted 2 times
InvisibleShadow
:: 2 months ago
fix solution on site
upvoted 2 times
...
...
...
Miris
:: Highly Voted 2 months, 2 weeks ago
1) Mount the data onto DBFS
2) Read the file into a data frame
3) Perform transformations on the file
4) Specify a temporary folder to stage the data
5) Write the results to a table in Azure Synapse
upvoted 8 times
...
steeee
:: Most Recent 13 hours, 56 minutes ago
The given answer is correct, after read the link provided carefully several times. There's already a service principal. With
that, it's no need to mount. You do need to drop the dataframe as the last step.
upvoted 1 times
...
labasmuse
:: 3 months, 1 week ago
Correct solution: Read the file into a data frame
Perform transformations on the file
Specify a temporary folder to stage
the data
Write the results to a table in Azure synapse
Drop the data frame
upvoted 4 times
ThiruthuvaRajan
:: 2 months, 3 weeks ago
You should not perform transformations on the file, and you don't need to drop the data frame. sagga's options are correct.
upvoted 2 times
...
Wisenut
:: 3 months, 1 week ago
I believe you perform transformation on the data frame and not on the file
upvoted 5 times
...
...
Topic 2 question 5 discussion
Puneetgupta003
:: Highly Voted 2 months, 1 week ago
Answers are correct.
upvoted 8 times
...
belha
:: Most Recent 1 month, 3 weeks ago
Not a schedule trigger?
upvoted 1 times
captainbee
:: 1 month, 2 weeks ago
As the solution says, you cannot use the Delay with Schedule.
upvoted 1 times
...
...
escoins
:: 1 month, 4 weeks ago
why not schedule trigger?
upvoted 1 times
...
Topic 2 question 6 discussion
Sunnyb
:: Highly Voted 2 months, 2 weeks ago
Answer is correct
upvoted 9 times
captainbee
:: 2 months, 2 weeks ago
Agreed. So easy that even ExamTopics got it right.
upvoted 17 times
...
...
Palee
:: Most Recent 1 month, 1 week ago
Right Answer. Answer to 3rd drop down is already in the question.
upvoted 1 times
...
Topic 2 question 7 discussion
zarga
:: Highly Voted 1 month, 2 weeks ago
The third one is wrong because the Stream Analytics application already exists in the project. The goal is to modify the current Stream Analytics application in order to read protobuf data. I think the right answer is the first one in the list (update the input.json file and reference the DLL).
upvoted 6 times
...
steeee
:: Most Recent 13 hours, 5 minutes ago
Third one should be the first action listed: Change file format in input.json
upvoted 1 times
...
Gowthamr02
:: 2 months, 2 weeks ago
Correct!
upvoted 1 times
...
Topic 2 question 8 discussion
zarga
:: 1 month, 2 weeks ago
A is the right answer (don't use autoresolve region)
upvoted 4 times
...
kishorenayak
:: 2 months ago
Shouldn't this be option A?
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
"If you have strict data compliance requirements and need ensure that data do not leave a certain geography, you can explicitly create an Azure IR in a certain region and point the Linked Service to this IR using ConnectVia property. For example, if you want to copy data from Blob in UK South to Azure Synapse Analytics in UK South and want to ensure data do not leave UK, create an Azure IR in UK South and link both Linked Services to this IR."
upvoted 1 times
Dicupillo
:: 1 month, 3 weeks ago
Yes it's option A
upvoted 1 times
...
...
saty_nl
:: 2 months ago
Correct answer.
upvoted 2 times
...
damaldon
:: 2 months, 1 week ago
fully agree
upvoted 1 times
...
Sunnyb
:: 2 months, 3 weeks ago
A is correct
upvoted 2 times
...
Topic 2 question 9 discussion
Sunnyb
:: Highly Voted 2 months, 3 weeks ago
Answer is correct
upvoted 10 times
...
Topic 2 question 10 discussion
saty_nl
:: 2 months ago
Correct answer.
upvoted 3 times
...
damaldon
:: 2 months, 1 week ago
Correct, Tumbling Window is needed to use periodic time intervals
upvoted 2 times
...
Gowthamr02
:: 2 months, 2 weeks ago
Correct!
upvoted 2 times
...
Topic 2 question 11 discussion
Travel_freak
:: 1 week, 4 days ago
correct answer
upvoted 1 times
...
trungngonptit
:: 1 month, 4 weeks ago
correct answer
upvoted 3 times
...
Topic 2 question 12 discussion
Miris
:: Highly Voted 2 months, 2 weeks ago
correct
upvoted 5 times
...
damaldon
:: Most Recent 2 months, 1 week ago
Fully agree
upvoted 2 times
...
Topic 2 question 13 discussion
damaldon
:: Highly Voted 2 months, 1 week ago
Correct!
upvoted 7 times
...
Gowthamr02
:: Highly Voted 2 months, 2 weeks ago
Answer is correct!
upvoted 5 times
...
Topic 2 question 14 discussion
trungngonptit
:: 1 month, 4 weeks ago
Correct: Blob storage or Azure SQL Database.
upvoted 3 times
...
saty_nl
:: 2 months, 1 week ago
This is correct.
upvoted 4 times
...
Topic 2 question 15 discussion
Whiz_01
:: Highly Voted 3 months ago
This is hopping. It is overlapping
upvoted 32 times
AugustineUba
:: 2 weeks, 1 day ago
100% Hopping
upvoted 3 times
...
...
saty_nl
:: Highly Voted 2 months, 1 week ago
The correct answer is hopping, as we need to calculate a running average, which means the windows will overlap.
upvoted 12 times
...
Kbruv
:: Most Recent 4 days, 2 hours ago
It's hopping.
upvoted 1 times
...
arvind05
:: 1 month, 1 week ago
Hopping
upvoted 2 times
...
NithyaSara
:: 1 month, 1 week ago
I think the correct answer is hopping, because of the overlapping time periods.
upvoted 3 times
...
escoins
:: 1 month, 4 weeks ago
Go for hopping
upvoted 1 times
...
damaldon
:: 2 months, 1 week ago
Why is it overlapping?
upvoted 1 times
captainbee
:: 2 months ago
Because it wants to calculate the average costs for the last 15 minutes, every 5 minutes. The diagram is massively unhelpful.
upvoted 1 times
...
...
xig
:: 2 months, 2 weeks ago
The correct answer is hopping. Reference: https://docs.microsoft.com/en-us/stream-analytics-query/hopping-window-azure-stream-analytics
upvoted 2 times
...
Miris
:: 2 months, 2 weeks ago
hopping - https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
upvoted 3 times
...
nas28
:: 2 months, 2 weeks ago
Hopping bro
upvoted 1 times
...
captainbee
:: 2 months, 2 weeks ago
Hopping mad with this one
upvoted 1 times
...
Ameenymous
:: 2 months, 3 weeks ago
Should be Hopping !
upvoted 4 times
...
ThiruthuvaRajan
:: 2 months, 3 weeks ago
It is hopping window
upvoted 2 times
...
S5e
:: 2 months, 3 weeks ago
It should be Hopping
upvoted 3 times
...
Himlo24
:: 3 months, 2 weeks ago
Agree, this should be hopping
upvoted 4 times
...
stefanos
:: 3 months, 2 weeks ago
I am pretty sure it should be hopping.
upvoted 2 times
...
newuser995
:: 3 months, 2 weeks ago
Shouldn't it be hopping?
upvoted 2 times
...
Diane
:: 3 months, 2 weeks ago
Shouldn't this be hopping?
upvoted 2 times
...
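A sketch of the hopping-window query the thread converges on (stream and column names assumed): a 15-minute average recomputed every 5 minutes, so consecutive windows overlap.
SELECT
    DeviceId,
    AVG(Cost) AS AverageCost
INTO Output
FROM Input TIMESTAMP BY EventTime
GROUP BY DeviceId, HoppingWindow(minute, 15, 5)   -- window size 15 minutes, hop 5 minutes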
Topic 2 question 16 discussion
Alekx42
:: Highly Voted 2 months, 3 weeks ago
You do not need a window function. You just process the data and perform the geospatial check as it arrives. See the same example here:
https://docs.microsoft.com/en-us/azure/stream-analytics/geospatial-scenarios
upvoted 19 times
captainbee
:: 2 months, 2 weeks ago
That's what I thought, there's no reporting over time periods. It's just a case of when this happens, ping it off.
upvoted 2 times
...
...
JackArmitage
:: Highly Voted 2 months, 1 week ago
1. Azure Stream Analytics
2. No Window
3. Point within Polygon
upvoted 13 times
...
Amalbenrebai
:: Most Recent 5 days, 5 hours ago
The answers are correct; hopping is correct.
SELECT count(*) as NumberOfRequests, RegionsRefDataInput.RegionName
FROM UserRequestStreamDataInput
JOIN RegionsRefDataInput ON st_within(UserRequestStreamDataInput.FromLocation, RegionsRefDataInput.Geofence) = 1
GROUP BY RegionsRefDataInput.RegionName, hoppingwindow(minute, 1, 15)
upvoted 1 times
...
hs28974
:: 1 month, 2 weeks ago
I would say tumbling window, as minimizing cost is a requirement as well. No window means you will recalculate whether the point is inside the polygon every time a car moves; a tumbling window will only perform the calculation once every 30 seconds.
upvoted 2 times
GeneralZhukov
:: 3 days, 7 hours ago
The question says data from the vehicles is sent to Azure Event Hubs only once every minute, so this isn't valid reasoning.
upvoted 1 times
...
...
Newfton
:: 1 month, 2 weeks ago
The explanation for the hopping window only states what a hopping window is, not why it is the correct answer here. It does not make sense in this question; I think it should be No Window.
upvoted 1 times
...
Peterlustig2049
:: 1 month, 3 weeks ago
How will the CSV file be read, though? I thought Azure Stream Analytics can only load reference data from Blob storage or Azure SQL?
upvoted 1 times
...
eng1
:: 2 months ago
1. Azure Stream Analytics
2. No Window
3. Point within Polygon
No Window, because you can write a query that joins the device stream with the geofence reference data and generates an alert every time a device is outside of an allowed building.
SELECT DeviceStreamInput.DeviceID, SiteReferenceInput.SiteID, SiteReferenceInput.SiteName INTO Output
FROM DeviceStreamInput JOIN SiteReferenceInput
ON st_within(DeviceStreamInput.GeoPosition, SiteReferenceInput.Geofence) = 0
WHERE DeviceStreamInput.DeviceID = SiteReferenceInput.AllowedDeviceID
https://docs.microsoft.com/en-us/azure/stream-analytics/geospatial-scenarios#generate-alerts-with-geofence
upvoted 5 times
...
nas28
:: 2 months, 2 weeks ago
I would say No Window, because Azure Stream Analytics will have to respond when a vehicle is outside an area (by event). No window, since we don't want it to calculate a metric here: no mean, no sum.
upvoted 2 times
...
ThiruthuvaRajan
:: 2 months, 2 weeks ago
Answers:
1) Azure Stream Analytics
2) Hopping window
3) Point within Polygon
Geofencing is explained clearly here: https://docs.microsoft.com/en-us/azure/stream-analytics/geospatial-scenarios
upvoted 5 times
...
Whiz_01
:: 3 months ago
Hopping is in the answer. The event is only triggered when a condition is met, which means we will have overlapping events.
upvoted 6 times
captainbee
:: 2 months ago
But hopping is for reporting at set intervals? Not for when an event happens.
upvoted 1 times
...
...
sagga
:: 3 months, 1 week ago
isn't it tumbling window?
upvoted 8 times
alain2
:: 3 months, 1 week ago
yes, tumbling window makes more sense
upvoted 2 times
...
...
Topic 2 question 17 discussion
bc5468521
:: Highly Voted 2 months, 4 weeks ago
The ABS-AQS source is deprecated. For new streams, we recommend using Auto Loader instead.
upvoted 5 times
...
belha
:: Most Recent 1 month, 3 weeks ago
TRUE ???
upvoted 1 times
...
Topic 2 question 18 discussion
Sunnyb
:: Highly Voted 3 months ago
1/14 = 0.07
6% = 0.06
should be lowered.
upvoted 8 times
...
MirandaL
:: Highly Voted 2 months, 1 week ago
"We recommend that you increase the concurrent jobs limit only when you see low resource usage with the default
values on each node."
https://docs.microsoft.com/en-us/azure/data-factory/monitor-integration-runtime
upvoted 5 times
...
Jacob_Wang
:: Most Recent 1 month, 3 weeks ago
It might be about the ratio. For instance, 2/14 might need to be lowered to 2/20.
upvoted 1 times
...
saty_nl
:: 2 months ago
Concurrent jobs limit must be raised, as we are under-utilizing the provisioned capacity.
upvoted 2 times
...
damaldon
:: 2 months, 1 week ago
A) is correct because HA is set to FALSE.
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#high-availability-and-scalability
upvoted 1 times
...
terajuana
:: 2 months, 2 weeks ago
The limit should be left as-is to allow capacity for more jobs; a single job could use 20% CPU if it is running intensive work. The pricing model isn't by concurrency, so there is no budget rationale to lower it.
upvoted 1 times
...
bc5468521
:: 2 months, 4 weeks ago
2 jobs/node, but the CPU is not fully utilized; based on the workload, we don't need that many concurrent jobs, so lower it to 1 job/node.
upvoted 1 times
...
dfdsfdsfsd
:: 3 months, 1 week ago
I might be misunderstanding this, but the way I look at it is that if 2 concurrent jobs use 6% of the CPU, then 1 job requires 3% CPU and you could have approximately 100/3 = 33 concurrent jobs. So you can raise the limit. What makes me unsure is that I imagine not every job would be equal in CPU load.
upvoted 3 times
Alekx42
:: 2 months, 3 weeks ago
I agree with your explanation. I think lowering the limit makes no sense: the system is underloaded, so why should you limit the parallelism that you could have when many jobs eventually get executed at the same time? Maintaining the current value could be an option: there are no issues with the current configuration with respect to the maximum concurrent jobs value. Increasing the value is good if we take as true your hypothesis that every job requires the same CPU %.
upvoted 2 times
...
...
AssilAbdulrahim
:: 3 months, 1 week ago
✑ CPU Utilization: 6%
✑ Concurrent Jobs (Running/Limit): 2/14
I am also confused, but I tend to accept the explanation because the system still has very low utilization (6%) and only 2 out of 14 concurrent jobs are running... Hence I think it should be lowered...
Can you please explain why both of you think it should be raised?
upvoted 1 times
AssilAbdulrahim
:: 3 months, 1 week ago
I meant the scalability of nodes should be lowered...
upvoted 1 times
...
...
tanza
:: 3 months, 1 week ago
The concurrent jobs limit should be raised, no?
upvoted 5 times
Preben
:: 2 months, 2 weeks ago
If you eat 1 ice cream a day, but you buy 5 new ones every day, should you increase the amount of ice cream you buy, or lower it? This is the same. You are paying for 14 concurrent jobs, but you are only using 2. You are only using 6% of the CPU you have purchased, so you are paying for 94% that you do not use.
upvoted 5 times
bsa_2021
:: 2 months ago
The question is about the action w.r.t. the concurrent jobs value. Concurrent jobs should be raised to make full use of the resources. Also, (if possible) the resources should be lowered so that they are not wasted. I think the choice of raised/lowered should be based on the context, and the context here is about the concurrent jobs, not the resources. Hence, I think raised would be correct.
upvoted 2 times
Banach
:: 1 month, 2 weeks ago
I understand your point of view, and I understood the question the same way you did at first. But after reading the sentence carefully, it asks (as you said) about the limit value (or the settings) of concurrent jobs, knowing that you only use 6% of your CPU with only 2 concurrent jobs. Therefore, considering the waste of resources, "lowered" is, IMO, the correct answer here (although the formulation of the question is a bit confusing, I admit).
upvoted 1 times
...
...
terajuana
:: 2 months, 2 weeks ago
data factory pricing is based on activity runs and not concurrency
upvoted 2 times
...
...
alain2
:: 3 months, 1 week ago
IMO, it should be lowered because:
. Concurrent Jobs (Running/Limit): 2/14
. CPU Utilization: 6%
upvoted 1 times
...
MacronfromFrance
:: 3 months, 1 week ago
for me, it should be raised. I don't find explanation in the given link... :(
upvoted 2 times
...
...
Topic 2 question 19 discussion
brendy
:: 1 week, 2 days ago
Is this correct?
upvoted 1 times
...
husseyn
:: 2 months, 2 weeks ago
Concurrent jobs should be raised; there is low CPU utilization.
upvoted 1 times
husseyn
:: 2 months, 2 weeks ago
please ignore this, it was meant for the question before
upvoted 6 times
...
...
Topic 2 question 20 discussion
Prabagar
:: Highly Voted 2 months, 2 weeks ago
correct answer
upvoted 11 times
...
damaldon
:: Most Recent 2 months, 1 week ago
Fully agree
upvoted 2 times
...
Topic 2 question 21 discussion
Ati1362
:: Highly Voted 2 months, 2 weeks ago
answer correct
upvoted 6 times
...
dragos_dragos62000
:: Most Recent 1 month, 3 weeks ago
I think you can use a session window with a 10-second timeout... it is like a tumbling window with a 10-second window size.
upvoted 2 times
TedoG
:: 1 month ago
I Disagree. The session could be extended if the maximum duration is set longer than the timeout.
upvoted 2 times
...
RyuHayabusa
:: 1 month ago
The important thing to remember with a session window is the maximum duration. So theoretically a 10-second timeout can still result in a window of 20 minutes, for example (if every 9 seconds a new event comes in and the window never "closes"). If the maximum duration were 10 seconds, I would agree. But as the question is worded right now, the answer is NO.
https://docs.microsoft.com/en-us/stream-analytics-query/session-window-azure-stream-analytics
upvoted 3 times
...
EddyRoboto
:: 1 month, 1 week ago
Agree, because it doesn't overlap any events; it just groups them in a given time that we can define.
upvoted 1 times
...
...
Topic 2 question 22 discussion
Ati1362
:: Highly Voted 2 months, 2 weeks ago
answer is correct
upvoted 7 times
...
saty_nl
:: Most Recent 2 months, 1 week ago
The answer is A; the same result can be achieved via a hopping window, see below:
https://docs.microsoft.com/en-us/stream-analytics-query/hopping-window-azure-stream-analytics
upvoted 2 times
captainbee
:: 2 months ago
As eng1 says, it "can" be used to achieve the same affect as a tumbling window, but as they've set it to 5 and 10,
it won't be.
upvoted 3 times
...
eng1
:: 2 months ago
No, the hop size is not equal to the window size; to make a hopping window the same as a tumbling window, specify the hop size to be the same as the window size.
upvoted 8 times
...
...
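eng1's point above in sketch form (stream names assumed): a hopping window only behaves like a tumbling window when the hop size equals the window size.
-- Fixed, non-overlapping 10-second windows
SELECT COUNT(*) AS EventCount INTO Output1 FROM Input GROUP BY TumblingWindow(second, 10)

-- Equivalent hopping window: hop size equals window size, so windows do not overlap
SELECT COUNT(*) AS EventCount INTO Output2 FROM Input GROUP BY HoppingWindow(second, 10, 10)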
Topic 2 question 23 discussion
111222333
:: Highly Voted 3 months, 1 week ago
Correct is A
upvoted 13 times
dfdsfdsfsd
:: 3 months, 1 week ago
Agree. Jobs cannot use a high-concurrency cluster because it does not support Scala.
upvoted 3 times
...
...
Wisenut
:: Highly Voted 3 months, 1 week ago
I too agree with the comment by 111222333. As per the requirement, "A workload for jobs that will run notebooks that use Python, Scala, and SQL", Scala is only supported by Standard clusters.
upvoted 5 times
...
damaldon
:: Most Recent 2 months, 1 week ago
Answer: A
- Data scientists should have their own cluster and it should terminate after 120 minutes - STANDARD
- The cluster for jobs should support Scala - STANDARD
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 1 times
...
Sunnyb
:: 2 months, 2 weeks ago
A is the right answer because a Standard cluster supports Scala.
upvoted 1 times
...
Topic 2 question 24 discussion
alain2
:: Highly Voted 3 months, 1 week ago
B because: "High Concurrency clusters work only for SQL, Python, and R. The performance and security of High
Concurrency clusters is provided by running user code in separate processes, which is not possible in Scala."
upvoted 10 times
...
111222333
:: Highly Voted 3 months, 1 week ago
The correct answer is B. Jobs use Scala, which is not supported in a High Concurrency cluster.
upvoted 6 times
...
damaldon
:: Most Recent 2 months, 1 week ago
Answer: B
- Data scientists should have their own cluster and it should terminate after 120 minutes - STANDARD
- The cluster for jobs should support Scala - STANDARD
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 4 times
...
Sunnyb
:: 2 months, 2 weeks ago
B is the correct answer
Link below:
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 3 times
...
Topic 2 question 25 discussion
dfdsfdsfsd
:: Highly Voted 3 months, 1 week ago
High-concurrency clusters do not support Scala. So the answer is still 'No' but the reasoning is wrong.
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 8 times
Preben
:: 2 months, 2 weeks ago
I agree that High Concurrency does not support Scala. But they specified using a Standard cluster for the jobs, which does support Scala. Why is the answer 'No'?
upvoted 2 times
eng1
:: 2 months, 1 week ago
Because a High Concurrency cluster for each data scientist is not correct; it should be Standard for a single user!
upvoted 2 times
...
...
...
FRAN__CO_HO
:: Most Recent 2 months, 1 week ago
The answer should be NO:
Data scientists: STANDARD, as they need to run Scala
Jobs: STANDARD, as they need to run Scala
Data engineers: High Concurrency clusters, for better resource sharing
upvoted 4 times
...
damaldon
:: 2 months, 1 week ago
Answer: NO
- Data scientists should have their own cluster and it should terminate after 120 minutes - STANDARD
- The cluster for jobs should support Scala - STANDARD
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 1 times
...
nas28
:: 2 months, 2 weeks ago
The answer is correct: No, but the reason given is wrong. They want the data scientists' clusters to shut down automatically after 120
minutes, so Standard clusters, not High Concurrency.
upvoted 2 times
...
Sunnyb
:: 2 months, 2 weeks ago
Answer is correct - NO
upvoted 1 times
...
Topic 2 question 31 discussion
JohnMasipa
:: 1 day, 1 hour ago
Can someone please explain why the answer is A?
upvoted 1 times
...
Topic 2 question 37 discussion
fbraza
:: 1 day, 2 hours ago
Delta Lake is only available from Scala 2.12 onward, but the JSON data shows a Scala version of 2.11.
upvoted 1 times
...
Topic 3 question 1 discussion
Sunnyb
:: Highly Voted 2 months, 2 weeks ago
Step 1: Create a Log Analytics workspace that has Data Retention set to 120 days.
Step 2: From Azure Portal, add a diagnostic setting.
Step 3: Select the PipelineRuns category.
Step 4: Send the data to a Log Analytics workspace.
upvoted 22 times
...
Amalbenrebai
:: Most Recent 1 week ago
In this case we will not save the diagnostic logs to a storage account; we will send them to Log Analytics:
1: Create a Log Analytics workspace that has Data Retention set to 120 days.
2: From Azure Portal, add a diagnostic setting.
3: Select the PipelineRuns category.
4: Send the data to a Log Analytics workspace.
upvoted 2 times
...
mss1
:: 2 weeks, 5 days ago
If you create a diagnostic setting from the Data Factory, you will notice that you can only set the retention days when you select a
storage account for the PipelineRuns category. So you need a storage account first. You do not have an option in the selection to
create the diagnostic setting from the Data Factory, and thus "Select the PipelineRuns category" is not an option. I agree with the
current selection.
upvoted 2 times
mss1
:: 2 weeks, 3 days ago
To complete my answer: I also agree with Sunnyb. There is more than one valid solution to this question.
upvoted 2 times
...
...
herculian_effort
:: 1 month, 1 week ago
Step 1. From Azure Portal, add a diagnostic setting.
Step 2. Send data to a Log Analytics workspace.
Step 3. Create a Log Analytics workspace that has Data Retention set to 120 days.
Step 4. Select the PipelineRuns category.
The video in the link below walks you through the process step by step; start watching at the 2 min 30 sec mark:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor#keeping-azure-data-factory-metrics-and-pipeline-run-data
upvoted 2 times
Armandoo
:: 3 weeks, 1 day ago
This is the correct answer
upvoted 1 times
...
...
mric
:: 2 months ago
According to the linked article, it's: first Storage Account, then Event Hub, and finally Log Analytics.
So I would say:
1 - Create an Azure Storage account with a lifecycle policy
2 - Stream to an Azure Event Hub
3 - Create a Log Analytics workspace that has Data Retention set to 120 days
4 - Send the data to a Log Analytics workspace
Source:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor#keeping-azure-data-factory-metrics-and-pipeline-run-data
upvoted 3 times
...
det_wizard
:: 2 months, 4 weeks ago
Take out the storage account; after "add a diagnostic setting" it would be "Select the PipelineRuns category", then "Send the data to a Log Analytics workspace".
upvoted 2 times
...
teofz
:: 3 months, 1 week ago
regarding the storage account, what is it for?!
upvoted 1 times
sagga
:: 3 months, 1 week ago
I don't know if you need to, see this discussion: https://www.examtopics.com/discussions/microsoft/view/49811-
exam-dp-200-topic-3-question-19-discussion/
upvoted 2 times
...
...
Topic 3 question 2 discussion
damaldon
:: Highly Voted 2 months, 1 week ago
Correct!
upvoted 7 times
...
Topic 3 question 3 discussion
Rob77
:: Highly Voted 3 months, 1 week ago
1. Create a user from the external provider for Group1.
2. Create Role1 with SELECT on schema1.
3. Add the user to Role1.
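A minimal T-SQL sketch of those steps (Group1, Role1, and schema1 come from the question; run it against dw1):
-- 1. Map the Azure AD group to a database user
CREATE USER [Group1] FROM EXTERNAL PROVIDER;
-- 2. Create the role and scope SELECT to the schema only
CREATE ROLE Role1;
GRANT SELECT ON SCHEMA::schema1 TO Role1;
-- 3. Add the group's database user to the role
ALTER ROLE Role1 ADD MEMBER [Group1];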
upvoted 24 times
...
patricka95
:: Most Recent 1 month, 1 week ago
The suggested answer is wrong. As others have identified, the correct steps are:
1. Create user <> from external provider
2. Create role <> with SELECT permission on the schema
3. Add the user to the role
upvoted 2 times
...
eng1
:: 2 months, 1 week ago
It should be D-E-A
upvoted 1 times
eng1
:: 2 months ago
Please ignore my previous answer; it should be
D: Create a database user in dw1 that represents Group1 and uses the FROM EXTERNAL PROVIDER clause
A: Create a database role named Role1 and grant Role1 SELECT permissions to schema1
E: Assign Role1 to the Group1 database user
upvoted 4 times
...
...
eng1
:: 2 months, 1 week ago
It should be C-A-E
upvoted 1 times
...
SG1705
:: 2 months, 1 week ago
Is the answer correct ??
upvoted 1 times
Marcello83
:: 1 month, 3 weeks ago
No, in my opinion it is D, A, E. If you give a reader role to the group, the users will be able to query
all the tables, not only the selected schema.
upvoted 4 times
...
...
Topic 3 question 4 discussion
Francesco1985
:: Highly Voted 2 months, 1 week ago
Guys, the answers are correct: https://docs.microsoft.com/en-us/azure/azure-sql/database/transparent-data-encryption-byok-overview
upvoted 8 times
...
terajuana
:: Most Recent 2 months, 2 weeks ago
TDE doesn't use customer-managed keys, therefore the answer is:
1) Always Encrypted
2) Key Vault in 2 regions
upvoted 1 times
Alekx42
:: 2 months, 1 week ago
TDE can be configured with customer-managed keys:
https://docs.microsoft.com/en-us/azure/azure-sql/database/transparent-data-encryption-tde-overview?tabs=azure-portal#customer-managed-transparent-data-encryption---bring-your-own-key
Key Vault is replicated across regions by Microsoft itself. I also double-checked by creating a key vault and there are no
geo-redundancy options. Also see here:
https://docs.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance
upvoted 3 times
...
Alekx42
:: 2 months, 1 week ago
Moreover, Always Encrypted is NOT a TDE option. The question asks to enable TDE.
upvoted 1 times
...
...
Alekx42
:: 2 months, 2 weeks ago
The first answer is correct. You need to enable TDE with customer-managed keys in order to track key usage in Azure Key
Vault. The second answer seems wrong, as pointed out by Rob77: AKV does have replication to 2 additional regions by
default. So I guess it makes more sense to use a Microsoft .NET Framework data provider.
https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/data-providers
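As a small, hedged T-SQL sketch (database name hypothetical; the customer-managed TDE protector, i.e. the Key Vault key, is configured at the server level through the portal, PowerShell, or CLI rather than T-SQL):
-- Enable TDE on the database; encryption uses the server's current TDE protector
ALTER DATABASE [dw1] SET ENCRYPTION ON;
-- Verify: encryption_state = 3 means encrypted
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys;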
upvoted 1 times
terajuana
:: 2 months, 2 weeks ago
TDE doesn't operate with customer keys, but Always Encrypted does.
upvoted 1 times
...
...
Rob77
:: 3 months, 1 week ago
The second answer does not seem to be correct - AKV is already replicated within the region locally (and also to paired
regions). Therefore, if the datacentre fails (or even the whole region), the traffic will be redirected.
https://docs.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance
upvoted 2 times
...
Topic 3 question 5 discussion
damaldon
:: 2 months, 1 week ago
Correct!
upvoted 4 times
...
saty_nl
:: 2 months, 1 week ago
Answer is correct. Dynamic data masking will limit the exposure of sensitive data.
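A small T-SQL sketch of dynamic data masking, assuming a hypothetical dbo.Customers table with an Email column and a hypothetical role name:
-- Non-privileged users see masked values such as aXXX@XXXX.com
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
-- Grant unmasked access only to principals that need it
GRANT UNMASK TO DataStewardRole;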
upvoted 2 times
...
Topic 3 question 6 discussion
Alekx42
:: Highly Voted 2 months, 1 week ago
C is the right answer. Check the discussion here:
https://www.examtopics.com/discussions/microsoft/view/18788-exam-
dp-201-topic-3-question-12-discussion/
upvoted 5 times
Tracy_Anderson
:: 1 month ago
The link below shows how you can infer a column that is data masked. It is also referenced in the DP-201 topic:
https://docs.microsoft.com/nl-nl/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver15
upvoted 1 times
...
mikerss
:: 2 months ago
The key word is 'infer'. As stated in the documentation, data masking is not intended to protect against malicious
attempts to infer the underlying data. I would therefore choose C.
upvoted 1 times
...
...
patricka95
:: Most Recent 1 month, 1 week ago
Column level security is the correct answer. It is obvious based on "The solution must prevent all the salespeople from
viewing or inferring the credit card information.". If masking was used, they could still view or infer the credit card data.
Also, I interpret "Entries" to imply rows.
upvoted 1 times
...
Himlo24
:: 3 months, 1 week ago
Shouldn't the answer be C? Because the salesperson will get an error when trying to query credit card info.
upvoted 3 times
mvisca
:: 3 months, 1 week ago
Nope, the salesperson generally uses the last 4 digits of the card to validate, during a pickup for example. They don't
need to know all the other numbers, so data masking is correct.
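A sketch of that idea with a partial() mask (table and column names are hypothetical); only the last four digits stay visible:
-- Non-privileged users see values such as XXXX-XXXX-XXXX-1234
ALTER TABLE dbo.Customers
ALTER COLUMN CreditCard ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');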
upvoted 10 times
mbravo
:: 2 months, 2 weeks ago
It is not because there is a requirement that the data should be protected not only from viewing but also
inferring. Masked data can still be inferred using brute force techniques. The only option in this case is C
(Column level encryption).
upvoted 4 times
terajuana
:: 2 months, 2 weeks ago
Nope - the question contains "You need to recommend a solution to provide salespeople with the
ability to view all the entries in Customers". If you implement column-level security, then they
cannot view all items, i.e. SELECT * from the table will give them an error. The only way
to fulfil the requirement is therefore masking.
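For contrast, a column-level security sketch (all names hypothetical): SELECT is granted only on permitted columns, so SELECT * fails for the role:
-- CreditCard is deliberately excluded from the grant
GRANT SELECT ON dbo.Customers (CustomerId, Name, City) TO SalesRole;
-- For members of SalesRole:
--   SELECT CustomerId, Name, City FROM dbo.Customers;  -- succeeds
--   SELECT * FROM dbo.Customers;                       -- fails with a permission error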
upvoted 6 times
captainbee
:: 1 month, 2 weeks ago
Ironically DP-200 has the exact same question and everyone was leaning toward Column
Level Security. I think being able to look at all entries means looking at all ROWS, rather
than columns. They're able to do that still with CLS, just can't see all columns. You can
still infer when there's data masking.
upvoted 1 times
...
escoins
:: 1 month, 4 weeks ago
absolutely right. The key word is "all the entries"
upvoted 1 times
...
...
Preben
:: 2 months, 2 weeks ago
"You need to recommend a solution to provide salespeople with the ability to view all the entries
in Customers."
Credit card data is an entry in the Customers table. How can they view that entry
if you use column level encryption?
upvoted 2 times
...
...
...
...
Topic 4 question 1 discussion
Preben
:: Highly Voted 2 months, 2 weeks ago
Correct.
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
See "Embarrassingly parallel jobs", steps 3 and 4.
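A minimal sketch of such an embarrassingly parallel query (input/output names are hypothetical; with compatibility level 1.2 the input PARTITION BY is largely implicit):
-- Input and output share the same partition key and partition count,
-- so every partition flows through the job without repartitioning.
SELECT *
INTO EventHubOutput
FROM EventHubInput PARTITION BY PartitionId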
upvoted 5 times
...
nichag
:: Most Recent 4 weeks ago
Shouldn't the number of partitions only be 8, since the question only asks about the output?
upvoted 1 times
...
rumosgf
:: 2 months, 3 weeks ago
Why 16? Don't understand...
upvoted 2 times
mbravo
:: 2 months, 2 weeks ago
Embarrassingly parallel jobs
upvoted 6 times
captainbee
:: 2 months ago
It's not THAT embarrassing
upvoted 2 times
...
...
...
Topic 4 question 2 discussion
lara_mia1
:: Highly Voted 2 months, 3 weeks ago
1. Hash distributed on ProductKey, because the table is > 2 GB and ProductKey is extensively used in joins.
2. Hash distributed on RegionKey, because "The table size on disk is more than 2 GB." and you have to choose a distribution
column which "Is not used in WHERE clauses. This could narrow the query to not run on all the distributions."
Source: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choosing-a-distribution-column
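A hedged sketch of the resulting DDL for the Sales table (columns beyond the keys are placeholders):
CREATE TABLE dbo.Sales
(
    ProductKey  INT            NOT NULL,
    RegionKey   INT            NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),  -- large fact table, joined on ProductKey
    CLUSTERED COLUMNSTORE INDEX
);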
upvoted 18 times
niceguy0371
:: 1 week ago
Disagree on nr. 1 because of the reason you give for nr. 2 (choose a distribution column that is not used in WHERE
clauses; a join is also a WHERE clause).
upvoted 1 times
...
vblessings
:: 3 weeks, 6 days ago
i agree
upvoted 1 times
...
Marcello83
:: 1 month, 3 weeks ago
I agree with lara_mia1
upvoted 1 times
...
...
Rob77
:: Highly Voted 3 months, 1 week ago
Both hash, as both are > 2 GB. In the second table, RegionKey cannot be used with round-robin distribution, as round-robin
does not take a distribution key...
upvoted 15 times
...
DarioEtna
:: Most Recent 1 week, 6 days ago
As for me, I guess this is the right choice:
1. Hash distributed on RegionKey
2. Hash distributed on RegionKey
because "When two large fact tables have frequent joins, query performance improves when you distribute both tables on
one of the join columns" [Microsoft documentation].
If we use ProductKey for one and RegionKey for the other, maybe data movement would increase... or not?
upvoted 1 times
DarioEtna
:: 1 week, 6 days ago
But we cannot use ProductKey in both because in Invoice table it is used in WHERE condition
upvoted 1 times
...
...
Amalbenrebai
:: 4 weeks, 1 day ago
Regarding the Invoices table, we can use round-robin distribution because there is no obvious joining key in the table.
upvoted 1 times
...
zarga
:: 1 month, 2 weeks ago
1. Hash on ProductKey
2. Hash on RegionKey (used in GROUP BY and has 65 unique values)
upvoted 2 times
...
BrennaFrenna
:: 2 months, 2 weeks ago
The Sales table makes sense with hash distribution on ProductKey, and since there is no obvious joining key for
Invoices, you should use round-robin distribution. If it were a smaller table, you should use
replicated.
upvoted 3 times
...
tubis
:: 2 months, 2 weeks ago
When it says 75% of records relate to one of the 40 regions, if we partition Sales by region, wouldn't that improve
read performance drastically compared to ProductKey?
upvoted 1 times
patricka95
:: 1 month, 1 week ago
No, if 75% relate to one region and we hash on region, that means that those will all be on one node and there
will be skew. Correct answers are Hash, Product, Hash, Region.
upvoted 1 times
...
Preben
:: 2 months, 2 weeks ago
That's 75 % of 61 % of the regions that will be done effectively. That's only efficient for 45 % of the queries. Not
a whole lot.
upvoted 2 times
...
...
bc5468521
:: 2 months, 4 weeks ago
I AGREE WITH BOTH HASH WITH PRODUCT KEY
upvoted 5 times
...
Topic 4 question 3 discussion
SG1705
:: Highly Voted 2 months, 1 week ago
Why ??
upvoted 6 times
okechi
:: 2 months ago
Why? Because when you add the WHERE clause to your T-SQL query, it allows the query optimizer to access only the
relevant partitions to satisfy the filter criteria of the query - which is what partition elimination
is all about.
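A generic T-SQL illustration of partition elimination in a dedicated SQL pool (all names and boundary values are hypothetical):
CREATE TABLE dbo.FactSales
(
    OrderDateKey INT            NOT NULL,
    ProductKey   INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20210101, 20210201, 20210301))
);
-- Filtering on the partitioning column lets the optimizer read just one partition.
SELECT SUM(SalesAmount)
FROM dbo.FactSales
WHERE OrderDateKey >= 20210201 AND OrderDateKey < 20210301;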
upvoted 5 times
...
IgorLacik
:: 2 months ago
Maybe this? https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
I think I
read somewhere in the docs that you cannot apply complex queries on partition filtering, cannot find it though
(not much help I guess, but hopefully better than nothing)
upvoted 1 times
...
...
elimey
:: Most Recent 1 month ago
correct
upvoted 1 times
...
Topic 4 question 4 discussion
rjile
:: Highly Voted 1 month, 2 weeks ago
correct B
upvoted 5 times
...
Avinash75
:: Most Recent 1 month, 2 weeks ago
Incoming queries use the primary key SaleKey column to retrieve data as displayed in the following table. Doesn't this
mean SaleKey will be used in the WHERE clause, which makes SaleKey not suitable for hash distribution?
Choosing a distribution column that helps minimize data movement is one of the most important strategies for optimizing
performance of your dedicated SQL pool: it should not be used in WHERE clauses, as this could narrow the query to not run on all
the distributions.
With no obvious choice, I feel it should be round robin with a clustered columnstore index, i.e. D.
upvoted 1 times
...
erssiws
:: 2 months, 1 week ago
I understand that hash distribution is mainly for improving joins and GROUP BYs to reduce data shuffling. In this case,
no join or GROUP BY is mentioned. I think round robin would be a better option.
upvoted 1 times
...
Yatoom
:: 2 months, 2 weeks ago
If the answer is hash distributed, then what would be the key? If there is no obvious joining key, round-robin should be
chosen (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-
distribute#round-robin-distributed)
upvoted 1 times
Preben
:: 2 months, 2 weeks ago
It says it uses the SaleKey. Round robin is generally not effective for tables at this scale. The 10 TB was a very
important hint here.
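Under that reading, a hedged sketch of a hash-distributed, clustered columnstore table (columns other than SaleKey are placeholders):
CREATE TABLE dbo.FactSale
(
    SaleKey     BIGINT         NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SaleKey),   -- spreads the ~10 TB fact table evenly across distributions
    CLUSTERED COLUMNSTORE INDEX     -- strong compression and scan performance at this size
);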
upvoted 9 times
...
...
Topic 4 question 5 discussion
Marcello83
:: 1 month, 3 weeks ago
Why not a non-clustered columnstore index? I'm not clear on the different use cases of clustered and non-clustered
columnstore indexes...
upvoted 1 times
lsdudi
:: 1 month, 1 week ago
A non-clustered columnstore index isn't supported in a dedicated SQL pool (the options are clustered columnstore index, clustered index, or heap).
upvoted 3 times
...
...
damaldon
:: 2 months, 1 week ago
correct!
upvoted 3 times
...
Miris
:: 2 months, 2 weeks ago
correct
upvoted 3 times
...
Topic 4 question 6 discussion
dragos_dragos62000
:: 1 month, 3 weeks ago
Correct
upvoted 2 times
...
Topic 4 question 7 discussion
erssiws
:: 2 months, 1 week ago
Activity logs show only activities, e.g., triggering the pipeline, stopping the pipeline, ...
Resource health checks show only the health of the resource.
The Monitor app does contain the pipeline run failure information, but it keeps the data only for 45 days.
upvoted 3 times
...
damaldon
:: 2 months, 1 week ago
Correct!
upvoted 2 times
...
Topic 4 question 8 discussion
MinionVII
:: 1 month, 2 weeks ago
Correct.
"Backlogged Input Events Number of input events that are backlogged. A non-zero value for this metric implies
that your job isn't able to keep up with the number of incoming events. If this value is slowly increasing or consistently
non-zero, you should scale out your job."
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-
monitoring
upvoted 2 times
...
