
1.

Question 1
Your task is to predict if a person suffers from a disease by setting up a binary classification model.
Your solution needs to be able to detect the classification errors that may appear.

Considering the below description, which of the following would be the best error type?

“A person does not suffer from a disease. Your model classifies the case as having no disease”.

1 / 1 point

True negatives

False negatives

False positives

True positives

Correct
A true negative is an outcome where the model correctly predicts the negative class.

2.
Question 2
Your company asks you to analyze a dataset that contains historical data obtained from a local
car-sharing company. For this task, you decide to develop a regression model that predicts the
price of a trip. To evaluate the regression model correctly, you have to use performance metrics.

In this scenario, what are the best two metrics?

1 / 1 point

A Root Mean Square Error value that is low

Correct
RMSE and R2 are both metrics for regression models. Root mean squared error (RMSE) creates a
single value that summarizes the error in the model.
An R-Squared value close to 0

An F1 score that is low

An R-Squared value close to 1

Correct
RMSE and R2 are both metrics for regression models. Coefficient of determination, often referred to
as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the
model is random (explains nothing); 1 means there is a perfect fit.
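As a quick illustration of both metrics, here is a minimal sketch using scikit-learn with made-up actual and predicted trip prices (the arrays are hypothetical):

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_test = np.array([12.0, 15.5, 9.8, 20.1])        # actual trip prices (illustrative)
predictions = np.array([11.4, 16.0, 10.2, 19.5])  # model predictions (illustrative)

rmse = np.sqrt(mean_squared_error(y_test, predictions))  # lower is better
r2 = r2_score(y_test, predictions)                       # closer to 1 is better
print('RMSE:', rmse, 'R2:', r2)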

3.
Question 3
In order to predict the price for a student’s craftwork, you have to rely on the following variables: the
student’s length of education, degree type, and art form. You decide to set up a linear regression
model that you will have to evaluate. Solution: Apply the following metrics: Mean Absolute Error,
Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC:

Is this solution effective?

1 / 1 point

Yes

No

Correct
Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models;
Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error are OK for the linear
regression model.

4.
Question 4
Your task is to create and evaluate a model. You decide to use a specific metric whose value is
directly proportional to how well the model fits.

Which evaluation metric is described above?

1 / 1 point

Mean Square Error (MSE)


Coefficient of Determination (known as R-squared or R2)

Root Mean Square Error (RMSE)

Correct
This is the evaluation metric described. In essence, this metric represents how much of the variance
between predicted and actual label values the model is able to explain.

5.
Question 5
How should the following sentence be completed?

Decision tree algorithms are one example of the machine learning […] type of model.

0 / 1 point

Classification

Clustering

Regression

Incorrect
Try going back and reviewing Train and Evaluate Regression Models.

6.
Question 6
You have a Pandas DataFrame named df_sales that contains the sales data from each day. Your
DataFrame contains these columns: year, month, day_of_month, sales_total. Which of the following
code options should you choose if your goal is to return the average sales_total value?

0 / 1 point

df_sales['sales_total'].mean()

df_sales['sales_total'].avg()

mean(df_sales['sales_total'])
Incorrect
Try going back and reviewing Exercise - Explore data.
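For reference, a minimal sketch of the correct option, using a small hypothetical df_sales:

import pandas as pd

df_sales = pd.DataFrame({'year': [2023, 2023], 'month': [1, 1],
                         'day_of_month': [1, 2], 'sales_total': [100.0, 150.0]})
print(df_sales['sales_total'].mean())  # Series.mean() returns the average: 125.0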

7.
Question 7
Choose from the list below the evaluation metric that provides you with an absolute metric in the
same unit as the label.

0 / 1 point

Mean Square Error (MSE)

Coefficient of Determination (known as R-squared or R2)

Root Mean Square Error (RMSE)

Incorrect
Try going back and reviewing Exercise - Train and evaluate a regression model.

8.
Question 8
Which are two appropriate ways to approach a problem when using multiclass classification?

1 / 1 point

Rest minus One

One vs Rest

Correct
One vs Rest (OVR), in which a classifier is created for each possible class value, with a positive
outcome for cases where the prediction is this class, and negative predictions for cases where the
prediction is any other class.

One vs One

Correct
One vs One (OVO), in which a classifier for each possible pair of classes is created.

One and Rest
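Both correct approaches are available as wrappers in scikit-learn; the sketch below contrasts them on the iris dataset (the base estimator choice is illustrative):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)  # one classifier per class
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)   # one classifier per class pair
print(len(ovr.estimators_), len(ovo.estimators_))  # 3 and 3 (three classes form three pairs)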


9.
Question 9
In order to train a K-Means clustering model that groups observations into four clusters, you
decide to use the scikit-learn library. Considering this scenario, which method call should you
choose to create the K-Means object?

0 / 1 point

model = KMeans(n_clusters=4)

model = Kmeans(n_init=4)

model = Kmeans(max_iter=4)

Incorrect
Try going back and reviewing Exercise - Train and evaluate a clustering model.
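For reference, a minimal sketch of the correct option on some hypothetical observations:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                               # hypothetical observations
model = KMeans(n_clusters=4, n_init=10, random_state=0)  # four clusters
clusters = model.fit_predict(X)                          # cluster assignment per observation
print(set(clusters))                                     # {0, 1, 2, 3}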

10.
Question 10
Which of the layer types described below is a principal one that extracts important features from
images and works by applying a filter to them?

1 / 1 point

Convolutional layer

Pooling layer

Flattening layer

Correct
One of the principal layer types is a convolutional layer that extracts important features in images. A
convolutional layer works by applying a filter to images.
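A minimal sketch of such a layer, assuming TensorFlow/Keras and illustrative shapes:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(128, 128, 3)),  # applies 3x3 filters to extract feature maps
    tf.keras.layers.MaxPooling2D((2, 2)),               # pooling layer downsamples the feature maps
    tf.keras.layers.Flatten(),                          # flattening layer prepares data for dense layers
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()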

11.
Question 11
You want to set up a new Azure subscription. The subscription doesn’t contain any resources.

Your goal is to create an Azure Machine Learning workspace.


Considering this scenario, which are three possible ways to obtain this result? Keep in mind that
every correct answer presents a complete solution.

0 / 1 point

Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with
name, subscription_id, and resource_group parameters.

This should not be selected


Try going back and reviewing Introduction to the Azure Machine Learning SDK.

Use an Azure Resource Manager (ARM) template that includes a
Microsoft.MachineLearningServices/workspaces resource and its dependencies.

Correct
This is one way to achieve the goal.

Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the
az group create function with --name and --location parameters, and then the az ml workspace
create function, specifying -w and -g parameters for the workspace name and resource group.

Correct
This is one way to achieve the goal.

Navigate to Azure Machine Learning studio and create a workspace.

This should not be selected


Try going back and reviewing Introduction to the Azure Machine Learning SDK.

Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with
name, subscription_id, resource_group, and location parameters.

Correct
This is one way to achieve the goal.

12.
Question 12
You decide to use GPU-based training to develop a deep learning model on the Azure Machine
Learning service that is able to recognize images.
The environment where you configure the model needs to allow real-time GPU-based inferencing.

Considering that you have to set up compute resources for model inferencing, what is the most
suitable compute type?

0 / 1 point

Field Programmable Gate Array

Azure Container Instance

Azure Kubernetes Service

Machine Learning Compute

Incorrect
Try going back and reviewing Deploy real-time machine learning services with Azure Machine
Learning.

13.
Question 13
You decide to use the code below for the deployment of a model as an Azure Machine Learning
real-time web service:

# ws, model, inference_config, and deployment_config defined previously

service = Model.deploy(ws, 'classification-service', [model], inference_config, deployment_config)

service.wait_for_deployment(True)

Your deployment does not succeed.

You have to troubleshoot the deployment failure in order to determine what actions were taken while
deploying and to identify the one action that encountered a problem and didn’t succeed.

For this scenario, which of the following code snippets should you use?

0 / 1 point

service.state
service.get_logs()

service.serialize()

service.update_deployment_state()

Incorrect
Try going back and reviewing Deploy real-time machine learning services with Azure Machine
Learning.
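For reference, a minimal troubleshooting sketch that uses the first option with the service object from the question:

print(service.state)       # quick status check: Healthy, Unhealthy, Failed, ...
print(service.get_logs())  # full deployment log, showing each action taken and where it failed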

14.
Question 14
You decide to register and train a model in your Azure Machine Learning workspace.

Your pipeline needs to ensure that the client applications are able to use the model for batch
inferencing.

Your single ParallelRunStep step pipeline uses a Python inferencing script in order to obtain
predictions from the input data.

Your task is to configure the inferencing script for the ParallelRunStep pipeline step.

Which are the most suitable two functions that you should use? Keep in mind that every correct
answer presents a part of the solution.

1 / 1 point

main()

init()

Correct
This function is called when the pipeline is initialized.

score(mini_batch)

batch()
run(mini_batch)

Correct
This function is called for each batch of data to be processed.
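A minimal sketch of such an inferencing script, assuming a hypothetical registered model named 'classifier':

import joblib
from azureml.core import Model

def init():
    # Runs once when the step starts up on each worker
    global model
    model_path = Model.get_model_path('classifier')  # hypothetical model name
    model = joblib.load(model_path)

def run(mini_batch):
    # Runs once per batch; mini_batch is a list of input items
    results = []
    for item in mini_batch:
        # ... load the item and score it with `model` ...
        results.append(f'{item}: scored')
    return results  # one result per input item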

15.
Question 15
After installing the Azure Machine Learning Python SDK, you decide to use it to configure a
workspace named "aml-workspace" on your subscription.

What code should you write in Python for this task?

1 / 1 point

azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

create_resource_group=False,

location='eastus'

from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

location='eastus'

)
from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

create_resource_group=True,

location='eastus'

Correct
This is the correct and complete command to run for this scenario.
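Assembled as a runnable sketch (the subscription values are placeholders), the correct option reads:

from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',
                      subscription_id='123456-abc-123...',
                      resource_group='aml-resources',
                      create_resource_group=True,
                      location='eastus')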

16.
Question 16
If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace,
what Python command would be the most appropriate?

0 / 1 point

from azureml.core import Workspace

ws = from.config_Workspace()

from azureml.core import Workspace

ws = Workspace.from.config

from azureml.core import Workspace

ws = Workspace.from_config()

Incorrect
Try going back and reviewing Azure Machine Learning tools and interfaces.

17.
Question 17
If you want to retrieve a dataset after registering it, which are the most suitable methods to
choose from the Dataset class?

0 / 1 point

find_by_name

This should not be selected


Try going back and reviewing Introduction to datasets.

get_by_id

get_by_name

find_by_id

This should not be selected


Try going back and reviewing Introduction to datasets.

18.
Question 18
What are the most appropriate SDK commands you should choose if you want to publish the
pipeline that you created?

0 / 1 point

publishedpipeline = pipeline_publish(name='training_pipeline',

description='Model training pipeline',

version='1.0')

published.pipeline = pipeline.publish(name='training_pipeline',

description='Model training pipeline',

version='1.0')

published.pipeline = pipeline_publish(name='training_pipeline',
description='Model training pipeline',

version='1.0')

published_pipeline = pipeline.publish(name='training_pipeline',

description='Model training pipeline',

version='1.0')

Incorrect
Try going back and reviewing Publish pipelines.
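For reference, the correct option as a minimal sketch, assuming pipeline is an already-assembled Pipeline object:

published_pipeline = pipeline.publish(name='training_pipeline',
                                      description='Model training pipeline',
                                      version='1.0')
print(published_pipeline.endpoint)  # REST endpoint that clients can call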

19.
Question 19
True or False?

Before publishing, a pipeline needs to have its parameters defined.

1 / 1 point

True

False

Correct
You must define parameters for a pipeline before publishing it.
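A minimal sketch of defining such a parameter, assuming a hypothetical training step that accepts a --reg argument:

from azureml.pipeline.core.graph import PipelineParameter

reg_param = PipelineParameter(name='reg_rate', default_value=0.01)
# pass it into a step, e.g. arguments=['--reg', reg_param], and then publish the pipeline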

20.
Question 20
Choose from the options below the one that explains how values for hyperparameters are selected
by random sampling.

0 / 1 point

It tries to select parameter combinations that will result in improved performance from the previous
selection

From a mix of discrete and continuous values


It tries every possible combination of parameters in the search space

Incorrect
Try going back and reviewing Configuring sampling.

21.
Question 21
What Python code should you write if your goal is to implement a median stopping policy?

0 / 1 point

from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,

delay_evaluation=5)

from azureml.train.hyperdrive import MedianStoppinPolicy

early_termination_policy = MedianStoppingPolicy(slack_amount = 0.2,

evaluation_interval=1,

delay_evaluation=5)

from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(truncation_percentage=10,

evaluation_interval=1,

delay_evaluation=5)

Incorrect
Try going back and reviewing Configuring early termination.
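For reference, the correct option is the one whose constructor takes only the evaluation arguments; MedianStoppingPolicy accepts no slack or truncation parameters:

from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
                                                delay_evaluation=5)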

22.
Question 22
What code should you write for a PFIExplainer if you have a model named loan_model?

0 / 1 point
from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

initialization_examples=X_test,

classes=['loan_amount','income','age','marital_status'],

features=['reject', 'approve'])

from interpret.ext.blackbox

pfi_explainer = PFIExplainer(model = loan_model,

initialization_examples=X_test,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

explainable_model= DecisionTreeExplainableModel,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

Incorrect
Try going back and reviewing Using explainers.

23.
Question 23
Your task is to train a binary classification model that targets the correct subjects in a
marketing campaign.

What action should you take if you want to ensure that your model is fair and not prone to
ethnic discrimination?

1 / 1 point

Evaluate each trained model with a validation dataset, and use the model with the highest accuracy
score. An accurate model is inherently fair.

Remove the ethnicity feature from the training dataset.

Compare disparity between selection rates and performance metrics across ethnicities.

Correct
By using ethnicity as a sensitive field, and comparing disparity between selection rates and
performance metrics for each ethnicity value, you can evaluate the fairness of the model.
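A minimal sketch of such a comparison with Fairlearn's MetricFrame (the label, prediction, and ethnicity arrays are hypothetical):

import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
ethnicity = np.array(['A', 'A', 'B', 'B', 'A', 'B'])

mf = MetricFrame(metrics={'selection_rate': selection_rate, 'accuracy': accuracy_score},
                 y_true=y_true, y_pred=y_pred, sensitive_features=ethnicity)
print(mf.by_group)      # metrics broken down per ethnicity value
print(mf.difference())  # disparity between the groups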

24.
Question 24
You decided to preprocess and filter down only the relevant columns for your AirBnB housing
dataframe.

The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.

In order to obtain the first initial from the host_name column, you have written the following function
that you named firstInitialFunction:

def firstInitialFunction(name):

return name[0]

firstInitialFunction("George")

Your goal is to use spark.udf.register to create a UDF from the function above, because
you want to ensure that the UDF will be created in the SQL namespace.
Considering this scenario, what code should you write?

0 / 1 point

airbnbDF.createAndReplaceTempView("airbnbDF")

spark.udf.register(sql_udf.firstInitialFunction)

airbnbDF.replaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

airbnbDF.createTempView("airbnbDF")

spark.udf.register(sql_udf = firstInitialFunction)

airbnbDF.createOrReplaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

Incorrect
Try going back and reviewing Work with user-defined functions.
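For reference, the correct option in context, assuming an active SparkSession and the airbnbDF DataFrame from the question (display is available on Databricks):

def firstInitialFunction(name):
    return name[0]

airbnbDF.createOrReplaceTempView("airbnbDF")         # expose the DataFrame to SQL
spark.udf.register("sql_udf", firstInitialFunction)  # register the UDF in the SQL namespace
display(spark.sql("SELECT sql_udf(host_name) AS firstInitial FROM airbnbDF"))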

25.
Question 25
Your Boston Housing dataset contains median values for a number of variables, such as the number
of rooms, per capita crime, and the economic status of residents.

Based on the average number of rooms, you want to be able to predict the median home value by
using Linear Regression.

You decided to use VectorAssembler to import the dataset and to create your column named
features, which includes a single input variable named rm.

At this moment you have to fit the Linear Regression model.

Considering this scenario, what code should you write?

0 / 1 point
from pyspark import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="medv")

lrModel = lr.fit(bostonFeaturizedDF)

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="rm", labelCol="medv")

lrModel = lr_fit(bostonFeaturizedDF)

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="medv")

lrModel = lr.fit(bostonFeaturizedDF)

from pyspark.ml import LinearRegression

lr = LinearRegression(featuresCol="rm ", labelCol="medv")

lrModel = lr_fit(bostonFeaturizedDF)

Incorrect
Try going back and reviewing Train a machine learning model.
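For reference, the correct option as a minimal sketch, assuming bostonFeaturizedDF has the 'features' vector column built with VectorAssembler and a 'medv' label column:

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="medv")
lrModel = lr.fit(bostonFeaturizedDF)
print(lrModel.coefficients, lrModel.intercept)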

26.
Question 26
You want to evaluate a Python NumPy array that has six data points with the following definition:
data = [10, 20, 30, 40, 50, 60]

Your task is to use the k-fold algorithm implementation in the Python scikit-learn machine learning
library to generate the following output:

train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]

In order to generate the output, you have to implement cross-validation.

To give the correct answer, replace the bolded code comments with the suitable code options
from the answer area.
Considering this, what snippet should you choose to complete the code?

from numpy import array

from sklearn.model_selection import #1st option

data = array([10, 20, 30, 40, 50, 60])

kfold = KFold(n_splits=#2nd option, shuffle=True, random_state=1)

for train, test in kfold.split(#3rd option):

print('train: %s, test: %s' % (data[train], data[test]))

0 / 1 point

K-means, 6, array

K-fold, 3, array

CrossValidation, 3, data

K-fold, 3, data

Incorrect
Try going back and reviewing Perform model selection with hyperparameter tuning.
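For reference, the completed snippet as a runnable sketch (the class is spelled KFold in scikit-learn):

from numpy import array
from sklearn.model_selection import KFold

data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
for train, test in kfold.split(data):
    print('train: %s, test: %s' % (data[train], data[test]))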

27.
Question 27
For your experiment in Azure Machine Learning you decide to run the following code:

from azureml.core import Workspace, Experiment, Run

from azureml.core import RunConfig, ScriptRunConfig

ws = Workspace.from_config()

run_config = RunConfiguration()

run_config.target = 'local'
script_config = ScriptRunConfig(source_directory='./script', script='experiment.py',
run_config=run_config)

experiment = Experiment(workspace=ws, name='script experiment')

run = experiment.submit(config=script_config)

run.wait_for_completion()

The experiment run generates several output files that need identification.

In order to retrieve the output file names, you must write some code. Which of the following code
snippets should you choose to complete the script?

0 / 1 point

files = run.get_metrics()

files = run.get_properties()

files = run.get_file_names()

files = run.get_details_with_logs()

Incorrect
Try going back and reviewing Work with Azure Machine Learning to deploy serving models.
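For reference, a minimal sketch of the correct method on the run object from the question:

files = run.get_file_names()  # names of all files generated by the run
for f in files:
    print(f)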

28.
Question 28
One of the categorical variables of your AirBnB dataset is room type.

You have three room types, as follows: private room, entire home/apt, and shared room.

In order for the machine learning model to know how to handle the room types, you first have to
encode every unique string as a number.

What code should you write to achieve this goal?

0 / 1 point
from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.transform(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import Indexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()
indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

Incorrect
Try going back and reviewing Perform featurization of the dataset.

29.
Question 29
Your task is to retrieve the most recent run from the experiment's list of runs.

What code should you write in Python to achieve this?

0 / 1 point

runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=3)

runs[0].data.metrics

runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)

runs[0].data.metrics

runs = client.search_runs(experiment_id, order_by=["attributes.start_time asce"], max_results=1)

runs[0].data.metrics

runs = client.search_runs(experiment_id, order_by=["attributes.start_time"], max_results=1)

runs[0].data.metrics

Incorrect
Try going back and reviewing Use MLflow to track experiments, log metrics, and compare runs.
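For reference, the correct option as a minimal sketch, assuming an MlflowClient and an experiment_id obtained earlier:

from mlflow.tracking import MlflowClient

client = MlflowClient()
runs = client.search_runs(experiment_id,
                          order_by=["attributes.start_time desc"],
                          max_results=1)  # newest run only
print(runs[0].data.metrics)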

30.
Question 30
Choose from the list below the cross-validation technique that belongs to the exhaustive type.
0 / 1 point

K-fold cross-validation

This should not be selected


Try going back and reviewing Describe model selection and hyperparameter tuning.

Leave-one-out cross-validation

Leave-p-out cross-validation

Correct
Leave-p-out cross-validation (LpO CV) is an exhaustive type of cross-validation technique. It
involves using p observations as the validation set and the remaining observations as the training
set. This is repeated on all ways to cut the original sample on a validation set of p observations and
a training set.

Holdout cross-validation

31.
Question 31
You decided to use Azure Machine Learning and your goal is to train a Diabetes Model and build a
container image for it.

You choose to make use of the scikit-learn ElasticNet linear regression model.

You want to use Azure Kubernetes Service (AKS) for the model deployment to production.

You have to create an active AKS cluster by using the Azure ML SDK.

You decide to use the standard configuration.

What code should you write for this task?

1 / 1 point

aks_target = ComputeTarget.workspace = workspace

(name = aks_cluster_name,
provisioning_configuration = prov_config)

aks_target = ComputeTarget.create(workspace = workspace,

name = aks_cluster_name,

provisioning_configuration = prov_config)

aks_target = ComputeTarget.deploy(workspace = workspace,

name = aks_cluster_name,

provisioning_configuration = prov_config)

aks_target = ComputeTarget.create(workspace = workspace,

name = aks_cluster_name,)

Correct
This is the correct code for this task.
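Assembled as a runnable sketch with the standard (default) configuration, and assuming workspace and aks_cluster_name are defined as in the question, the correct option reads:

from azureml.core.compute import AksCompute, ComputeTarget

prov_config = AksCompute.provisioning_configuration()  # standard configuration
aks_target = ComputeTarget.create(workspace=workspace,
                                  name=aks_cluster_name,
                                  provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)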

32.
Question 32
If you want to list the generated files after your experiment run is completed, which method of
the run object is the most suitable choice?

0 / 1 point

list_file_names

download_files

download_file

get_file_names

Incorrect
Try going back and reviewing Registering models.

33.
Question 33
Your hyperparameter tuning needs to have a search space defined. The values of the batch_size
hyperparameter can be 128, 256, or 512, and the learning_rate hyperparameter must follow a
normal distribution with a mean of 10 and a standard deviation of 3.

What Python code should you write in order to achieve this goal?

0 / 1 point

from azureml.train.hyperdrive import choice, normal

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': lognormal(10, 3)

from azureml.train.hyperdrive import choice, normal

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': qnormal(10, 3)

from azureml.train.hyperdrive import choice, normal

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': normal(10, 3)

}
from azureml.train.hyperdrive import choice, uniform

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': uniform(10, 3)

Incorrect
Try going back and reviewing Defining a search space.
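For reference, the correct option completed as a runnable sketch: choice() covers the discrete batch sizes, and normal(10, 3) gives a normally distributed learning rate with mean 10 and standard deviation 3:

from azureml.train.hyperdrive import choice, normal

param_space = {
    '--batch_size': choice(128, 256, 512),
    '--learning_rate': normal(10, 3)
}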

34.
Question 34
You intend to use the Hyperdrive feature of Azure Machine Learning to determine the optimal
hyperparameter values when training a model.

You need to use Hyperdrive to try combinations of the following hyperparameter values:

- learning_rate: any value between 0.001 and 0.1

- batch_size: 16, 32, or 64

You must configure the search space for the Hyperdrive experiment.

Which two parameter expressions should you use? Each correct answer presents part of the
solution.

0 / 1 point

A choice expression for learning_rate

This should not be selected


Try going back and reviewing Deploy batch inference pipelines and tune hyperparameters with
Azure Machine Learning.

A choice expression for batch_size

Correct
Discrete hyperparameters are specified as a choice among discrete values. choice can be: one or
more comma-separated values -- a range object -- any arbitrary list object.

A uniform expression for learning_rate

Correct
Continuous hyperparameters are specified as a distribution over a continuous range of values.
Supported distributions include:

- uniform(low, high): returns a value uniformly distributed between low and high.

A normal expression for batch_size

35.
Question 35
You are evaluating a completed binary classification machine learning model.

You need to use precision as the evaluation metric.

Which visualization should you use?

0 / 1 point

Box plot

A violin plot

Binary classification confusion matrix

Gradient descent

Incorrect
Try going back and reviewing Create a classification model with Azure AI.

1.
Question 1
Your task is to predict if a person suffers from a disease by setting up a binary classification model.
Your solution needs to be able to detect the classification errors that may appear.

Considering the below description, which of the following would be the best error type?

“A person does not suffer from a disease. Your model classifies the case as having a disease”.

1 / 1 point

True positives

False positives

False negatives

True negatives

Correct
A false positive is an outcome where the model incorrectly predicts the positive class.

2.
Question 2
As a senior data scientist, you need to evaluate a binary classification machine learning model.

As the evaluation metric, you have to use precision. Considering this, which is the most
appropriate visualization?

0 / 1 point

Receiver Operating Characteristic (ROC) curve

Violin plot

Scatter plot

Gradient descent

Incorrect
Try going back to Train and evaluate Classification models.

3.
Question 3
In order to predict the price for a student’s craftwork, you have to rely on the following variables: the
student’s length of education, degree type, and art form. You decide to set up a linear regression
model that you will have to evaluate. Solution: Apply the following metrics: Mean Absolute Error,
Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC:

Is this solution effective?

1 / 1 point

Yes

No

Correct
Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models;
Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error are OK for the linear
regression model.

4.
Question 4
Your task is to create and evaluate a model. One of the metrics provides an absolute value in
the same unit as the label.

What is the metric described above?

1 / 1 point

Root Mean Square Error (RMSE)

Coefficient of Determination (known as R-squared or R2)

Mean Square Error (MSE)

Correct
This is the described metric. This means that the smaller the value, the better the model.

5.
Question 5
Python is known for its extensive functionality and its powerful statistical and numerical
libraries. What does TensorFlow provide?

0 / 1 point

Providing attractive data visualizations

Analyzing and manipulating data

Offering simple and effective predictive data analysis

Supplying machine learning and deep learning capabilities

Incorrect
Try going back and reviewing Explore & Analyse Data with Python.

6.
Question 6
If you multiply a list and a NumPy array by 2, what results would you get?

1 / 1 point

Multiplying a list by 2 creates a new list 2 times the length with the original sequence repeated 2
times.

Correct
This is how a list behaves when multiplied.

Multiplying a NumPy array by 2 performs an element-wise calculation on the array, which sees the
array stay the same size, but each element has been multiplied by 2.

Correct
This is how a NumPy array behaves when multiplied.

Multiplying an NumPy array by 2 creates a new array 2 times the length with the original sequence
repeated 2 times.
Multiplying a list by 2 performs an element-wise calculation on the list, which sees the list stay the
same size, but each element has been multiplied by 2.
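A quick demonstration of both behaviours:

import numpy as np

print([1, 2, 3] * 2)            # [1, 2, 3, 1, 2, 3] -- the list is repeated
print(np.array([1, 2, 3]) * 2)  # [2 4 6] -- element-wise multiplication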

7.
Question 7
Choose from the list below the evaluation metric that provides you with an absolute metric in the
same unit as the label.

1 / 1 point

Mean Square Error (MSE)

Root Mean Square Error (RMSE)

Coefficient of Determination (known as R-squared or R2)

Correct
This is the described metric. This means that the smaller the value, the better the model.

8.
Question 8
The Precision and Recall metrics are based on four possible prediction outcomes.

What is the outcome in the scenario where the predicted label is 1, but the actual label is 0?

0 / 1 point

False Positive

True Negative

True Positive

False Negative

Incorrect
Try going back and reviewing Exercise - Train and evaluate a classification model.

9.
Question 9
Your deep neural network is being trained. You decided to configure the training process with
30 epochs.

In this scenario, what would happen to the model’s behavior?

0 / 1 point

The first 30 rows of data are used to train the model, and the remaining rows are used to validate it

The entire training dataset is passed through the network 30 times

The training data is split into 30 subsets, and each subset is passed through the network

Incorrect
Try going back and reviewing Train a deep neural network.

10.
Question 10
Which of the layer types described below is a principal one that extracts important features from
images and works by applying a filter to them?

1 / 1 point

Convolutional layer

Flattening layer

Pooling layer

Correct
One of the principal layer types is a convolutional layer that extracts important features in images. A
convolutional layer works by applying a filter to images.

11.
Question 11
You are using an Azure Machine Learning service for your data science project. In order to deploy
the project, you have to choose a compute target. For this scenario, which of the following Azure
services is the most suitable?

0 / 1 point
Azure Databricks

Azure Data Lake Analytics

Apache Spark for HDInsight

Azure Container Instances

Incorrect
Try going back and reviewing Work with Compute in Azure Machine Learning.

12.
Question 12
You have a set of CSV files that contain sales records. All of the CSV files follow an identical data
schema.

Each month's sales records are held in one CSV file named sales.csv. Each file is stored in a
folder that indicates the month and the year when the data was recorded. A datastore has been
set up in an Azure Machine Learning workspace for these folders, which are kept in an Azure blob
container. The parent folder, named sales, contains the folders organized in the hierarchical
structure below:

/sales

/01-2019

/sales.csv

/02-2019

/sales.csv

/03-2019

/sales.csv


A new folder with a given month's sales is added to the sales folder every time that month ends.
You want to train a machine learning model by using the sales data while complying with the
requirements below:

- A single dataset must load all of the sales data to date into a structure that enables easy
conversion to a dataframe.

- Experiments must be able to use only the data created up to a specific previous month,
disregarding any data added after that month.

- The number of registered datasets must be kept to the minimum possible.

Considering that the sales data have to be registered as a dataset in the Azure Machine Learning
service workspace, what actions should you take?

1 / 1 point

Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month,
replacing the existing dataset and specifying a tag named month indicating the month and year it
was registered. Use this dataset for all experiments.

Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version
and with a tag named month indicating the month and year it was registered. Use this dataset for all
experiments, identifying the version to be used based on the month tag as necessary.

Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each
month with appropriate MM and YYYY values for the month and year. Use the appropriate month-
specific dataset for experiments.

Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv',
register the dataset with the name sales_dataset and a tag named month indicating the month and
year it was registered, and use this dataset for all experiments.

Correct
This is the correct approach to this scenario.
13.
Question 13
You decide to use Azure Machine Learning designer for your real-time service endpoint. You can
make use of only one Azure Machine Learning service compute resource.

You start training the model and preparing the real-time pipeline for deployment.

If you want to obtain a web service by publishing the inference pipeline, what is the most suitable
compute type?

0 / 1 point

HDInsight

Azure Kubernetes Services

a new Machine Learning Compute resource

Azure Databricks

the existing Machine Learning Compute resource

Incorrect
Try going back and reviewing Deploy real-time machine learning services with Azure Machine
Learning.

14.
Question 14
Yes or No?

In order to explain the model’s predictions, you have to calculate the importance of all the features,
taking into account the overall global relative importance value, but also the measure of local
importance for a certain set of predictions.

You decide to obtain the global and local feature importance values that you need by using an
explainer.

Solution: Configure a PFIExplainer. Is this solution effective?


0 / 1 point

Yes

No

Incorrect
Try going back and reviewing Explain machine learning models with Azure Machine Learning.

15.
Question 15
Yes or No?

You use a logistic regression algorithm to train your classification model. In order to explain the
model’s predictions, you have to calculate the importance of all the features, taking into account the
overall global relative importance value, but also the measure of local importance for a certain set of
predictions.

You decide to obtain the global and local feature importance values that you need by using an
explainer.

Solution: Configure a TabularExplainer. Is this solution effective?

0 / 1 point

Yes

No

Incorrect
Try going back and reviewing Explain machine learning models with Azure Machine Learning.

16.
Question 16
If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace,
what Python command would be the most appropriate?

0 / 1 point

from azureml.core import Workspace

ws = from.config_Workspace()
from azureml.core import Workspace

ws = Workspace.from_config()

from azureml.core import Workspace

ws = Workspace.from.config

Incorrect
Try going back and reviewing Azure Machine Learning tools and interfaces.

17.
Question 17
If you want to use the from_delimited_files method of the Dataset.Tabular class to configure and
register a tabular dataset, what are the most appropriate Python commands?

0 / 1 point

from azureml.core import Dataset

blob_ds = ws.get_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/*.csv')]

tab_ds = Dataset.Tabular.from_delimited_files()

tab_ds = tab_ds.register(workspace=ws, name='csv_table')

from azureml.core import Dataset

blob_ds = ws.get_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/*.csv')]

tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
tab_ds = tab_ds.register(workspace=ws, name='csv_table')

from azureml.core import Dataset

blob_ds = ws.get_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/csv')]

tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)

tab_ds = tab_ds.register(workspace=ws, name='csv_table')

from azureml.core import Dataset

blob_ds = ws.change_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/*.csv')]

tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)

tab_ds = tab_ds.register(workspace=ws, name='csv_table')

Incorrect
Try going back and reviewing Introduction to datasets.

18.
Question 18
Your task is to use the SDK in order to define a compute configuration for a managed compute
target.

Which of the following commands will return you the expected result?

0 / 1 point

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=4,
vm_priority='dedicated')

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=0,

vm_priority='dedicated')

compute_config = AmlCompute.provisioning.configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=4,

vm_priority='dedicated')

compute_config = AmlCompute_provisioning_configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=4,

vm_priority='dedicated')

Incorrect
Try going back and reviewing Create compute targets.

19.
Question 19
Your task is to deploy your service on an AKS cluster that is set up as a compute target.

What SDK commands are able to return you the expected result?

0 / 1 point

from azureml.core.webservice import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)

from azureml.core.webservice import ComputeTarget, AksWebservice

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)

from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.deploy (ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)

Incorrect
Try going back and reviewing Deploy a model as a real-time service.

20.
Question 20
If you want to extract the parallel_run_step.txt file from the output of the step after the pipeline run
has ended, what code should you choose?

1 / 1 point
import os
import pandas as pd

prediction_run = next(pipeline_run.get_children())

prediction_output = prediction_run.get_output_data('inferences')

prediction_output.download(local_path='results')

for root, dirs, files in os.walk('results'):
    for file in files:
        if file.endswith('parallel_run_step.txt'):
            result_file = os.path.join(root, file)

df = pd.read_csv(result_file, delimiter=":", header=None)

df.columns = ["File", "Prediction"]

print(df)

Correct
This code will find the parallel_run_step.txt file.

21.
Question 21
What code should you write using SDK if your goal is to extract the best run and its model?

0 / 1 point

best_run, fitted_model = automl_run.get_output()

best_run_metrics = best_run_get_metrics(1)

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

best_run, fitted_model = automl_run.get_input()


best_run_metrics = best_run.get_metrics()

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

best_run, fitted_model = automl_run.get_output()

best_run_metrics = best_run.get_metrics()

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

best_run, fitted_model = automl.run.get_output()

best_run_metrics = best_run.get_metrics()

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

Incorrect
Try going back and reviewing Running automated machine learning experiments.

22.
Question 22
What code should you write for a PFIExplainer if you have a model named loan_model?

0 / 1 point

from interpret.ext.blackbox

pfi_explainer = PFIExplainer(model = loan_model,


initialization_examples=X_test,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

initialization_examples=X_test,

classes=['loan_amount','income','age','marital_status'],

features=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

explainable_model= DecisionTreeExplainableModel,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

Incorrect
Try going back and reviewing Using explainers.

23.
Question 23
If you want to minimize disparity in combined true positive rate and false_positive_rate across
sensitive feature groups, what is the most suitable parity constraint that you should choose to use
with any of the mitigation algorithms?

0 / 1 point

True positive rate parity

Error rate parity

False-positive rate parity

Equalized odds

Incorrect
Try going back and reviewing Mitigate unfairness with Fairlearn.

24.
Question 24
You decided to preprocess and filter down only the relevant columns for your AirBnB housing
dataframe.

The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.

In order to obtain the first initial from the host_name column, you have written the following function
that you named firstInitialFunction:

def firstInitialFunction(name):

return name[0]

firstInitialFunction("George")

Your goal is to use spark.udf.register to create a UDF from the function above, because
you want to ensure that the UDF will be created in the SQL namespace.

Considering this scenario, what code should you write?

0 / 1 point
airbnbDF.createTempView("airbnbDF")

spark.udf.register(sql_udf = firstInitialFunction)

airbnbDF.createAndReplaceTempView("airbnbDF")

spark.udf.register(sql_udf.firstInitialFunction)

airbnbDF.replaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

airbnbDF.createOrReplaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

Incorrect
Try going back and reviewing Work with user-defined functions.

25.
Question 25
You decided to use the Boston Housing dataset and the Linear Regression algorithm, for which you
want to tune the hyperparameters.

At this point, you have executed a test split on the Boston dataset and built a pipeline for the
linear regression.

You now want to test the maximum number of iterations by using ParamGridBuilder(), regardless of
whether you want to fit an intercept with the y axis or standardize the features.

Considering this scenario, what code should you write?

0 / 1 point

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder(lr)

.addGrid(lr.maxIter, [1, 10, 100])


.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.run()

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder(lr)

.addGrid(lr.maxIter, [1, 10, 100])

.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.create()

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder()

.addGrid(lr.maxIter, [1, 10, 100])

.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.build()

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder()
.addGrid(lr.maxIter, [1, 10, 100])

.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.search()

Incorrect
Try going back and reviewing Perform model selection with hyperparameter tuning.
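For reference, the correct option as a minimal sketch: ParamGridBuilder() takes no constructor arguments, and the grid is finalized with .build() (assuming lr is the pipeline's LinearRegression stage):

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder()
             .addGrid(lr.maxIter, [1, 10, 100])
             .addGrid(lr.fitIntercept, [True, False])
             .addGrid(lr.standardization, [True, False])
             .build())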

26.
Question 26
You decided to use Python code interactively in your Conda environment. You have all the required
Azure Machine Learning SDK and MLflow packages in the environment.

In order to log metrics in your Azure Machine Learning experiment named mlflow-experiment, you
have to use MLflow.

To give the correct answer, you have to replace the code comments that are bolded with some
suitable code options that you find in the answer area.

Considering this, what snippet should you choose to complete the code?

import mlflow

from azureml.core import Workspace

ws = Workspace.from_config()

#1 Set the MLflow logging target

#2 Configure the experiment

with #3 Begin the experiment run

#4 Log my_metric with value 1.00 ('my_metric', 1.00)

print("Finished!")

0 / 1 point
#1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment), #3
mlflow.start_run(), #4 run.log()

#1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3


mlflow.active_run(), #4 mlflow.log_metric

#1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment('mlflow-
experiment), #3 mlflow.start_run(), #4 mlflow.log_metric

#1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment), #3
mlflow.start_run(), #4 mlflow.log_metric

Incorrect
Try going back and reviewing Use MLflow to track experiments, log metrics, and compare runs.
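For reference, the completed snippet under the correct option:

import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())  # 1: point MLflow at the workspace
mlflow.set_experiment('mlflow-experiment')             # 2: configure the experiment
with mlflow.start_run():                               # 3: begin the experiment run
    mlflow.log_metric('my_metric', 1.00)               # 4: log my_metric with value 1.00
print('Finished!')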

27.
Question 27
You want to deploy in your Azure Container Instance a deep learning model.

In order to call the model API, you have to use the Azure Machine Learning SDK.

To invoke the deployed model, you have to use native SDK classes and methods.

To give the correct answer, you have to replace the code comments that are bolded with some
suitable code options that you find in the answer area.

Considering this, what snippet should you choose to complete the code?

from azureml.core import Workspace

#1st code option

import json

ws = Workspace.from_config()

service_name = "mlmodel1-service"

service = Webservice(name=service_name, workspace=ws)


x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]

input_json = json.dumps({"data": x_new})

#2nd code option

1 / 1 point

from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)

from azureml.core.webservice import requests, predictions = service.run(input_json)

from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)

from azureml.core.webservice import Webservice, predictions = service.run(input_json)

Correct
These are the correct commands for this task.
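Assembled with the correct option, the completed snippet reads:

import json
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(name='mlmodel1-service', workspace=ws)

x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({'data': x_new})
predictions = service.run(input_json)  # invoke the deployed model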

28.
Question 28
One of the categorical variables of your AirBnB dataset is room type.

You have three room types, as follows: private room, entire home/apt, and shared room.

In order for the machine learning model to know how to handle the room types, you first have to
encode every unique string as a number.

What code should you write to achieve this goal?

0 / 1 point

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.transform(uniqueTypesDF)
indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import Indexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)
Incorrect
Try going back and reviewing Perform featurization of the dataset.

29.
Question 29
You are able to use the MlflowClient object as the pathway to query previous runs
programmatically.

What code should you write in Python to achieve this?

1 / 1 point

from mlflow.tracking import MlflowClient

client = MlflowClient()

client.list_experiments()

from mlflow.tracking import MlflowClient

client = MlflowClient()

list.client_experiments()

from mlflow.pipelines import MlflowClient

client = MlflowClient()

client.list_experiments()

from mlflow.pipelines import MlflowClient

client = MlflowClient()

list.experiments()

Correct
This is the correct code syntax for this job.

30.
Question 30
If you want to explore the hyperparameters on a model, knowing that every algorithm uses
different hyperparameters for tuning, what is the most appropriate method to choose?

0 / 1 point

exploreParams()

explainParams()

showParams()

getParams()

Incorrect
Try going back and reviewing Describe model selection and hyperparameter tuning.

31.
Question 31
Your task is to clean up the deployments and terminate the “dev” ACI webservice by making use of
the Azure ML SDK after your work with Azure Machine Learning has ended.

What is the most suitable method in order to achieve this goal?

0 / 1 point

dev_webservice.delete()

dev_webservice.remove()

dev_webservice.flush()

dev_webservice.terminate()

Incorrect
Try going back and reviewing Use Azure Machine Learning to deploy serving models.

32.
Question 32
The DataFrame you are currently working on contains data regarding the daily sales of ice cream. In
order to compare the avg_temp and units_sold columns you decided to use the corr method which
returned a result of 0.95.

What information can you read from this result?

0 / 1 point

Days with high avg_temp values tend to coincide with days that have high units_sold values

On the day with the maximum units_sold value, the avg_temp value was 0.95

The units_sold value is, on average, 95% of the avg_temp value

Incorrect
Try going back and reviewing Exercise - Explore data.
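A minimal sketch of the corr method on a hypothetical ice-cream DataFrame:

import pandas as pd

df = pd.DataFrame({'avg_temp': [20, 25, 30, 35],
                   'units_sold': [110, 150, 200, 260]})
print(df['avg_temp'].corr(df['units_sold']))  # close to 1 => strong positive correlation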

33.
Question 33
You can enable Application Insights when configuring the service deployment for a new real-time
service.

By using the SDK, what code should you write to achieve this goal?

1 / 1 point

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,

memory_gb = 1,

appinsights=True)

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,

memory_gb = 1,

enable_app_insights=True)

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
memory_gb = 1,

app_insights(True))

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,

memory_gb = 1,

app_insights=True)

Correct
This is the correct code.

34.
Question 34
You usually take the following steps when you use HorovodRunner to develop a distributed
training program:

1. Create a HorovodRunner instance initialized with the number of nodes.

2. Define a Horovod training method using the methods described in Horovod usage, making sure
that any import statements are added inside the method.

What code should you write in Python to achieve this?

0 / 1 point

hr = HorovodRunner(tf)

def train():

import tensorflow as np

hvd.init(2)

hr.run(train)

hr = HorovodRunner()

def train():
import tensorflow as tf

hvd.init(np)

hr.run(train)

hr = HorovodRunner(np)

def train():

import tensorflow as tf

hvd.init()

hr.run(train)

hr = HorovodRunner(np=2)

def train():

import tensorflow as tf

hvd.init()

hr.run(train)

Incorrect
Try going back and reviewing Use Horovod to train a deep learning model.
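For reference, the correct option as a hedged sketch: HorovodRunner(np=2) requests two worker processes, and the imports stay inside the training function (assuming a Databricks ML runtime where sparkdl is available):

from sparkdl import HorovodRunner

hr = HorovodRunner(np=2)

def train():
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... build and fit the model here ...

hr.run(train)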

35.
Question 35
You’re using the Azure Machine Learning Python SDK to define a pipeline to train a model.

The data used to train the model is read from a folder in a datastore.

You need to ensure the pipeline runs automatically whenever the data in the folder changes.

What should you do?

1 / 1 point
Create a PipelineParameter with a default value that references the location where the training data
is stored

Create a ScheduleRecurrence object with a Frequency of auto. Use the object to create a schedule
for the pipeline

Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder
containing the training data in the path_on_datastore property

Set the regenerate_outputs property of the pipeline to True

Correct
To schedule a pipeline to run whenever data changes, you must create a Schedule that monitors a
specified path on a datastore.
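A minimal sketch of such a schedule, assuming a published pipeline and a Datastore object (the names are hypothetical):

from azureml.pipeline.core import Schedule

reactive_schedule = Schedule.create(ws, name='training-schedule',
                                    description='Run on data changes',
                                    pipeline_id=published_pipeline.id,
                                    experiment_name='training-pipeline',
                                    datastore=training_datastore,
                                    path_on_datastore='data/training')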
