
1.

Question 1
Your task is to predict if a person suffers from a disease by setting up a binary classification model.
Your solution needs to be able to detect the classification errors that may appear.

Considering the below description, which of the following would be the best error type?

“A person does not suffer from a disease. Your model classifies the case as having no disease”.

1 / 1 point

True negatives

False negatives

False positives

True positives

Correct
A true negative is an outcome where the model correctly predicts the negative class.

2.
Question 2
Your company asks you to analyze a dataset that contains historical data obtained from a local
car-sharing company. For this task, you decide to develop a regression model that predicts the
price of a trip. To evaluate the regression model correctly, you have to use performance metrics.

In this scenario, what are the best two metrics?

1 / 1 point

A Root Mean Square Error value that is low

Correct
RMSE and R2 are both metrics for regression models. Root mean squared error (RMSE) creates a
single value that summarizes the error in the model.
An R-Squared value close to 0

An F1 score that is low

An R-Squared value close to 1

Correct
RMSE and R2 are both metrics for regression models. Coefficient of determination, often referred to
as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the
model is random (explains nothing); 1 means there is a perfect fit.
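As a quick illustration of both metrics, here is a minimal sketch using scikit-learn with made-up actual and predicted trip prices (the arrays are hypothetical):

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_test = np.array([12.0, 15.5, 9.8, 20.1])        # actual trip prices (illustrative)
predictions = np.array([11.4, 16.0, 10.2, 19.5])  # model predictions (illustrative)

rmse = np.sqrt(mean_squared_error(y_test, predictions))  # lower is better
r2 = r2_score(y_test, predictions)                       # closer to 1 is better
print('RMSE:', rmse, 'R2:', r2)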

3.
Question 3
In order to predict the price for a student’s craftwork, you have to rely on the following variables: the
student’s length of education, degree type, and art form. You decide to set up a linear regression
model that you will have to evaluate. Solution: Apply the following metrics: Mean Absolute Error,
Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC:

Is this solution effective?

1 / 1 point

Yes

No

Correct
Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models;
Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error are OK for the linear
regression model.

4.
Question 4
Your task is to create and evaluate a model. You decide to use a specific metric whose value is
directly proportional to how well the model fits.

Which evaluation metric is described above?

1 / 1 point

Mean Square Error (MSE)


Coefficient of Determination (known as R-squared or R2)

Root Mean Square Error (RMSE)

Correct
This is the evaluation metric described. In essence, this metric represents how much of the variance
between predicted and actual label values the model is able to explain.

5.
Question 5
How should the following sentence be completed?

Decision tree algorithms are one example of the machine learning […] type of model.

0 / 1 point

Classification

Clustering

Regression

Incorrect
Try going back and reviewing Train and Evaluate Regression Models.

6.
Question 6
You have a Pandas DataFrame named df_sales that contains the sales data from each day. Your
DataFrame contains these columns: year, month, day_of_month, sales_total. Which of the following
code options should you choose if your goal is to return the average sales_total value?

0 / 1 point

df_sales['sales_total'].mean()

df_sales['sales_total'].avg()

mean(df_sales['sales_total'])
Incorrect
Try going back and reviewing Exercise - Explore data.
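For reference, a minimal sketch of the correct option, using a small hypothetical df_sales:

import pandas as pd

df_sales = pd.DataFrame({'year': [2023, 2023], 'month': [1, 1],
                         'day_of_month': [1, 2], 'sales_total': [100.0, 150.0]})
print(df_sales['sales_total'].mean())  # Series.mean() returns the average: 125.0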

7.
Question 7
Choose from the list below the evaluation metric that provides you with an absolute metric in the
same unit as the label.

0 / 1 point

Mean Square Error (MSE)

Coefficient of Determination (known as R-squared or R2)

Root Mean Square Error (RMSE)

Incorrect
Try going back and reviewing Exercise - Train and evaluate a regression model.

8.
Question 8
Which are two appropriate ways to approach a problem when using multiclass classification?

1 / 1 point

Rest minus One

One vs Rest

Correct
One vs Rest (OVR), in which a classifier is created for each possible class value, with a positive
outcome for cases where the prediction is this class, and negative predictions for cases where the
prediction is any other class.

One vs One

Correct
One vs One (OVO), in which a classifier for each possible pair of classes is created.

One and Rest
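Both correct approaches are available as wrappers in scikit-learn; the sketch below contrasts them on the iris dataset (the base estimator choice is illustrative):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)  # one classifier per class
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)   # one classifier per class pair
print(len(ovr.estimators_), len(ovo.estimators_))  # 3 and 3 (three classes form three pairs)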


9.
Question 9
In order to train a K-Means clustering model that groups observations into four clusters, you
decide to use the scikit-learn library. Considering this scenario, which method call should you
choose to create the K-Means object?

0 / 1 point

model = KMeans(n_clusters=4)

model = Kmeans(n_init=4)

model = Kmeans(max_iter=4)

Incorrect
Try going back and reviewing Exercise - Train and evaluate a clustering model.
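For reference, a minimal sketch of the correct option on some hypothetical observations:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                               # hypothetical observations
model = KMeans(n_clusters=4, n_init=10, random_state=0)  # four clusters
clusters = model.fit_predict(X)                          # cluster assignment per observation
print(set(clusters))                                     # {0, 1, 2, 3}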

10.
Question 10
Which of the layer types described below is a principal one that extracts important features from
images and works by applying a filter to them?

1 / 1 point

Convolutional layer

Pooling layer

Flattening layer

Correct
One of the principal layer types is a convolutional layer that extracts important features in images. A
convolutional layer works by applying a filter to images.
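A minimal sketch of such a layer, assuming TensorFlow/Keras and illustrative shapes:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(128, 128, 3)),  # applies 3x3 filters to extract feature maps
    tf.keras.layers.MaxPooling2D((2, 2)),               # pooling layer downsamples the feature maps
    tf.keras.layers.Flatten(),                          # flattening layer prepares data for dense layers
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()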

11.
Question 11
You want to set up a new Azure subscription. The subscription doesn’t contain any resources.

Your goal is to create an Azure Machine Learning workspace.


Considering this scenario, which are three possible ways to obtain this result? Keep in mind that
every correct answer presents a complete solution.

0 / 1 point

Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with
name, subscription_id, and resource_group parameters.

This should not be selected


Try going back and reviewing Introduction to the Azure Machine Learning SDK.

Use an Azure Resource Manager (ARM) template that includes a
Microsoft.MachineLearningServices/workspaces resource and its dependencies.

Correct
This is one way to achieve the goal.

Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the
az group create function with --name and --location parameters, and then the az ml workspace
create function, specifying -w and -g parameters for the workspace name and resource group.

Correct
This is one way to achieve the goal.

Navigate to Azure Machine Learning studio and create a workspace.

This should not be selected


Try going back and reviewing Introduction to the Azure Machine Learning SDK.

Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with
name, subscription_id, resource_group, and location parameters.

Correct
This is one way to achieve the goal.

12.
Question 12
You decide to use GPU-based training to develop a deep learning model on the Azure Machine
Learning service that is able to recognize images.
The environment where you configure the model needs to allow real-time GPU-based inferencing.

Considering that you have to set up compute resources for model inferencing, what is the most
suitable compute type?

0 / 1 point

Field Programmable Gate Array

Azure Container Instance

Azure Kubernetes Service

Machine Learning Compute

Incorrect
Try going back and reviewing Deploy real-time machine learning services with Azure Machine
Learning.

13.
Question 13
You decide to use the code below for the deployment of a model as an Azure Machine Learning
real-time web service:

# ws, model, inference_config, and deployment_config defined previously

service = Model.deploy(ws, 'classification-service', [model], inference_config, deployment_config)

service.wait_for_deployment(True)

Your deployment does not succeed.

You have to troubleshoot the deployment failure in order to determine what actions were taken while
deploying and to identify the one action that encountered a problem and didn’t succeed.

For this scenario, which of the following code snippets should you use?

0 / 1 point

service.state
service.get_logs()

service.serialize()

service.update_deployment_state()

Incorrect
Try going back and reviewing Deploy real-time machine learning services with Azure Machine
Learning.
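For reference, a minimal troubleshooting sketch that uses the first option with the service object from the question:

print(service.state)       # quick status check: Healthy, Unhealthy, Failed, ...
print(service.get_logs())  # full deployment log, showing each action taken and where it failed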

14.
Question 14
You decide to register and train a model in your Azure Machine Learning workspace.

Your pipeline needs to ensure that the client applications are able to use the model for batch
inferencing.

Your single ParallelRunStep step pipeline uses a Python inferencing script in order to obtain
predictions from the input data.

Your task is to configure the inferencing script for the ParallelRunStep pipeline step.

Which are the most suitable two functions that you should use? Keep in mind that every correct
answer presents a part of the solution.

1 / 1 point

main()

init()

Correct
This function is called when the pipeline is initialized.

score(mini_batch)

batch()
run(mini_batch)

Correct
This function is called for each batch of data to be processed.
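A minimal sketch of such an inferencing script, assuming a hypothetical registered model named 'classifier':

import joblib
from azureml.core import Model

def init():
    # Runs once when the step starts up on each worker
    global model
    model_path = Model.get_model_path('classifier')  # hypothetical model name
    model = joblib.load(model_path)

def run(mini_batch):
    # Runs once per batch; mini_batch is a list of input items
    results = []
    for item in mini_batch:
        # ... load the item and score it with `model` ...
        results.append(f'{item}: scored')
    return results  # one result per input item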

15.
Question 15
After installing the Azure Machine Learning Python SDK, you decide to use it to configure a
workspace named "aml-workspace" on your subscription.

What code should you write in Python for this task?

1 / 1 point

azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

create_resource_group=False,

location='eastus'

from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

location='eastus'

)
from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

create_resource_group=True,

location='eastus'

Correct
This is the correct and complete command to run for this scenario.
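Assembled as a runnable sketch (the subscription values are placeholders), the correct option reads:

from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',
                      subscription_id='123456-abc-123...',
                      resource_group='aml-resources',
                      create_resource_group=True,
                      location='eastus')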

16.
Question 16
If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace,
what Python command would be the most appropriate?

0 / 1 point

from azureml.core import Workspace

ws = from.config_Workspace()

from azureml.core import Workspace

ws = Workspace.from.config

from azureml.core import Workspace

ws = Workspace.from_config()

Incorrect
Try going back and reviewing Azure Machine Learning tools and interfaces.

17.
Question 17
If you want to retrieve a dataset after registering it, which are the most suitable methods to
choose from the Dataset class?

0 / 1 point

find_by_name

This should not be selected


Try going back and reviewing Introduction to datasets.

get_by_id

get_by_name

find_by_id

This should not be selected


Try going back and reviewing Introduction to datasets.

18.
Question 18
What are the most appropriate SDK commands you should choose if you want to publish the
pipeline that you created?

0 / 1 point

publishedpipeline = pipeline_publish(name='training_pipeline',

description='Model training pipeline',

version='1.0')

published.pipeline = pipeline.publish(name='training_pipeline',

description='Model training pipeline',

version='1.0')

published.pipeline = pipeline_publish(name='training_pipeline',
description='Model training pipeline',

version='1.0')

published_pipeline = pipeline.publish(name='training_pipeline',

description='Model training pipeline',

version='1.0')

Incorrect
Try going back and reviewing Publish pipelines.
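For reference, the correct option as a minimal sketch, assuming pipeline is an already-assembled Pipeline object:

published_pipeline = pipeline.publish(name='training_pipeline',
                                      description='Model training pipeline',
                                      version='1.0')
print(published_pipeline.endpoint)  # REST endpoint that clients can call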

19.
Question 19
True or False?

Before publishing, a pipeline needs to have its parameters defined.

1 / 1 point

True

False

Correct
You must define parameters for a pipeline before publishing it.
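A minimal sketch of defining such a parameter, assuming a hypothetical training step that accepts a --reg argument:

from azureml.pipeline.core.graph import PipelineParameter

reg_param = PipelineParameter(name='reg_rate', default_value=0.01)
# pass it into a step, e.g. arguments=['--reg', reg_param], and then publish the pipeline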

20.
Question 20
Choose from the options below the one that explains how values for hyperparameters are selected
by random sampling.

0 / 1 point

It tries to select parameter combinations that will result in improved performance from the previous
selection

From a mix of discrete and continuous values


It tries every possible combination of parameters in the search space

Incorrect
Try going back and reviewing Configuring sampling.

21.
Question 21
What Python code should you write if your goal is to implement a median stopping policy?

0 / 1 point

from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,

delay_evaluation=5)

from azureml.train.hyperdrive import MedianStoppinPolicy

early_termination_policy = MedianStoppingPolicy(slack_amount = 0.2,

evaluation_interval=1,

delay_evaluation=5)

from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(truncation_percentage=10,

evaluation_interval=1,

delay_evaluation=5)

Incorrect
Try going back and reviewing Configuring early termination.
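For reference, the correct option is the one whose constructor takes only the evaluation arguments; MedianStoppingPolicy accepts no slack or truncation parameters:

from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
                                                delay_evaluation=5)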

22.
Question 22
What code should you write for a PFIExplainer if you have a model named loan_model?

0 / 1 point
from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

initialization_examples=X_test,

classes=['loan_amount','income','age','marital_status'],

features=['reject', 'approve'])

from interpret.ext.blackbox

pfi_explainer = PFIExplainer(model = loan_model,

initialization_examples=X_test,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

explainable_model= DecisionTreeExplainableModel,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

Incorrect
Try going back and reviewing Using explainers.

23.
Question 23
Your task is to train a binary classification model that targets the correct subjects in a
marketing campaign.

What action should you take if you want to ensure that your model is fair and not prone to
ethnic discrimination?

1 / 1 point

Evaluate each trained model with a validation dataset, and use the model with the highest accuracy
score. An accurate model is inherently fair.

Remove the ethnicity feature from the training dataset.

Compare disparity between selection rates and performance metrics across ethnicities.

Correct
By using ethnicity as a sensitive field, and comparing disparity between selection rates and
performance metrics for each ethnicity value, you can evaluate the fairness of the model.
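A minimal sketch of such a comparison with Fairlearn's MetricFrame (the label, prediction, and ethnicity arrays are hypothetical):

import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
ethnicity = np.array(['A', 'A', 'B', 'B', 'A', 'B'])

mf = MetricFrame(metrics={'selection_rate': selection_rate, 'accuracy': accuracy_score},
                 y_true=y_true, y_pred=y_pred, sensitive_features=ethnicity)
print(mf.by_group)      # metrics broken down per ethnicity value
print(mf.difference())  # disparity between the groups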

24.
Question 24
You decided to preprocess and filter down only the relevant columns for your AirBnB housing
dataframe.

The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.

In order to obtain the first initial from the host_name column, you have written the following function
that you named firstInitialFunction:

def firstInitialFunction(name):

return name[0]

firstInitialFunction("George")

Your goal is to use spark.udf.register to create a UDF from the function above, because
you want to ensure that the UDF will be created in the SQL namespace.
Considering this scenario, what code should you write?

0 / 1 point

airbnbDF.createAndReplaceTempView("airbnbDF")

spark.udf.register(sql_udf.firstInitialFunction)

airbnbDF.replaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

airbnbDF.createTempView("airbnbDF")

spark.udf.register(sql_udf = firstInitialFunction)

airbnbDF.createOrReplaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

Incorrect
Try going back and reviewing Work with user-defined functions.
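For reference, the correct option in context, assuming an active SparkSession and the airbnbDF DataFrame from the question (display is available on Databricks):

def firstInitialFunction(name):
    return name[0]

airbnbDF.createOrReplaceTempView("airbnbDF")         # expose the DataFrame to SQL
spark.udf.register("sql_udf", firstInitialFunction)  # register the UDF in the SQL namespace
display(spark.sql("SELECT sql_udf(host_name) AS firstInitial FROM airbnbDF"))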

25.
Question 25
Your Boston Housing dataset contains median values for a number of variables, such as the number
of rooms, per capita crime, and the economic status of residents.

Based on the average number of rooms, you want to be able to predict the median home value by
using Linear Regression.

You decided to use VectorAssembler to import the dataset and to create your column named
features, which includes a single input variable named rm.

At this moment you have to fit the Linear Regression model.

Considering this scenario, what code should you write?

0 / 1 point
from pyspark import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="medv")

lrModel = lr.fit(bostonFeaturizedDF)

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="rm", labelCol="medv")

lrModel = lr_fit(bostonFeaturizedDF)

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="medv")

lrModel = lr.fit(bostonFeaturizedDF)

from pyspark.ml import LinearRegression

lr = LinearRegression(featuresCol="rm ", labelCol="medv")

lrModel = lr_fit(bostonFeaturizedDF)

Incorrect
Try going back and reviewing Train a machine learning model.
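For reference, the correct option as a minimal sketch, assuming bostonFeaturizedDF has the 'features' vector column built with VectorAssembler and a 'medv' label column:

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(featuresCol="features", labelCol="medv")
lrModel = lr.fit(bostonFeaturizedDF)
print(lrModel.coefficients, lrModel.intercept)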

26.
Question 26
You want to evaluate a Python NumPy array that has six data points with the following definition:
data = [10, 20, 30, 40, 50, 60]

Your task is to use the k-fold algorithm implementation in the Python scikit-learn machine learning
library to generate the following output:

train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]

In order to generate the output, you have to implement cross-validation.

To give the correct answer, replace the bolded code comments with the suitable code options
from the answer area.
Considering this, what snippet should you choose to complete the code?

from numpy import array

from sklearn.model_selection import #1st option

data = array([10, 20, 30, 40, 50, 60])

kfold = KFold(n_splits=#2nd option, shuffle=True, random_state=1)

for train, test in kfold.split(#3rd option):

print('train: %s, test: %s' % (data[train], data[test]))

0 / 1 point

K-means, 6, array

K-fold, 3, array

CrossValidation, 3, data

K-fold, 3, data

Incorrect
Try going back and reviewing Perform model selection with hyperparameter tuning.
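For reference, the completed snippet as a runnable sketch (the class is spelled KFold in scikit-learn):

from numpy import array
from sklearn.model_selection import KFold

data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
for train, test in kfold.split(data):
    print('train: %s, test: %s' % (data[train], data[test]))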

27.
Question 27
For your experiment in Azure Machine Learning you decide to run the following code:

from azureml.core import Workspace, Experiment, Run

from azureml.core import RunConfig, ScriptRunConfig

ws = Workspace.from_config()

run_config = RunConfiguration()

run_config.target = 'local'
script_config = ScriptRunConfig(source_directory='./script', script='experiment.py',
run_config=run_config)

experiment = Experiment(workspace=ws, name='script experiment')

run = experiment.submit(config=script_config)

run.wait_for_completion()

The experiment run generates several output files that need identification.

In order to retrieve the output file names, you must write some code. Which of the following code
snippets should you choose to complete the script?

0 / 1 point

files = run.get_metrics()

files = run.get_properties()

files = run.get_file_names()

files = run.get_details_with_logs()

Incorrect
Try going back and reviewing Work with Azure Machine Learning to deploy serving models.
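For reference, a minimal sketch of the correct method on the run object from the question:

files = run.get_file_names()  # names of all files generated by the run
for f in files:
    print(f)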

28.
Question 28
One of the categorical variables of your AirBnB dataset is room type.

You have three room types, as follows: private room, entire home/apt, and shared room.

In order for the machine learning model to know how to handle the room types, you first have to
encode every unique string as a number.

What code should you write to achieve this goal?

0 / 1 point
from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.transform(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import Indexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()
indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

Incorrect
Try going back and reviewing Perform featurization of the dataset.

29.
Question 29
Your task is to retrieve the most recent run from the experiment's list of runs.

What code should you write in Python to achieve this?

0 / 1 point

runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=3)

runs[0].data.metrics

runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)

runs[0].data.metrics

runs = client.search_runs(experiment_id, order_by=["attributes.start_time asce"], max_results=1)

runs[0].data.metrics

runs = client.search_runs(experiment_id, order_by=["attributes.start_time"], max_results=1)

runs[0].data.metrics

Incorrect
Try going back and reviewing Use MLflow to track experiments, log metrics, and compare runs.
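For reference, the correct option as a minimal sketch, assuming an MlflowClient and an experiment_id obtained earlier:

from mlflow.tracking import MlflowClient

client = MlflowClient()
runs = client.search_runs(experiment_id,
                          order_by=["attributes.start_time desc"],
                          max_results=1)  # newest run only
print(runs[0].data.metrics)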

30.
Question 30
Choose from the list below the cross-validation technique that belongs to the exhaustive type.
0 / 1 point

K-fold cross-validation

This should not be selected


Try going back and reviewing Describe model selection and hyperparameter tuning.

Leave-one-out cross-validation

Leave-p-out cross-validation

Correct
Leave-p-out cross-validation (LpO CV) is an exhaustive type of cross-validation technique. It
involves using p observations as the validation set and the remaining observations as the training
set. This is repeated on all ways to cut the original sample on a validation set of p observations and
a training set.

Holdout cross-validation

31.
Question 31
You decided to use Azure Machine Learning and your goal is to train a Diabetes Model and build a
container image for it.

You choose to make use of the scikit-learn ElasticNet linear regression model.

You want to use Azure Kubernetes Service (AKS) for the model deployment to production.

You have to create an active AKS cluster by using the Azure ML SDK.

You decide to use the standard configuration.

What code should you write for this task?

1 / 1 point

aks_target = ComputeTarget.workspace = workspace

(name = aks_cluster_name,
provisioning_configuration = prov_config)

aks_target = ComputeTarget.create(workspace = workspace,

name = aks_cluster_name,

provisioning_configuration = prov_config)

aks_target = ComputeTarget.deploy(workspace = workspace,

name = aks_cluster_name,

provisioning_configuration = prov_config)

aks_target = ComputeTarget.create(workspace = workspace,

name = aks_cluster_name,)

Correct
This is the correct code for this task.
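Assembled as a runnable sketch with the standard (default) configuration, and assuming workspace and aks_cluster_name are defined as in the question, the correct option reads:

from azureml.core.compute import AksCompute, ComputeTarget

prov_config = AksCompute.provisioning_configuration()  # standard configuration
aks_target = ComputeTarget.create(workspace=workspace,
                                  name=aks_cluster_name,
                                  provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)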

32.
Question 32
If you want to list the generated files after your experiment run is completed, which method of
the run object is the most suitable choice?

0 / 1 point

list_file_names

download_files

download_file

get_file_names

Incorrect
Try going back and reviewing Registering models.

33.
Question 33
Your hyperparameter tuning needs to have a search space defined. The values of the batch_size
hyperparameter can be 128, 256, or 512, and the learning_rate hyperparameter must follow a
normal distribution with a mean of 10 and a standard deviation of 3.

What Python code should you write in order to achieve this goal?

0 / 1 point

from azureml.train.hyperdrive import choice, normal

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': lognormal(10, 3)

from azureml.train.hyperdrive import choice, normal

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': qnormal(10, 3)

from azureml.train.hyperdrive import choice, normal

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': normal(10, 3)

}
from azureml.train.hyperdrive import choice, uniform

param_space = {

'--batch_size': choice(128, 256, 512),

'--learning_rate': uniform(10, 3)

Incorrect
Try going back and reviewing Defining a search space.
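For reference, the correct option completed as a runnable sketch: choice() covers the discrete batch sizes, and normal(10, 3) gives a normally distributed learning rate with mean 10 and standard deviation 3:

from azureml.train.hyperdrive import choice, normal

param_space = {
    '--batch_size': choice(128, 256, 512),
    '--learning_rate': normal(10, 3)
}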

34.
Question 34
You intend to use the Hyperdrive feature of Azure Machine Learning to determine the optimal
hyperparameter values when training a model.

You need to use Hyperdrive to try combinations of the following hyperparameter values:

- learning_rate: any value between 0.001 and 0.1

- batch_size: 16, 32, or 64

You must configure the search space for the Hyperdrive experiment.

Which two parameter expressions should you use? Each correct answer presents part of the
solution.

0 / 1 point

A choice expression for learning_rate

This should not be selected


Try going back and reviewing Deploy batch inference pipelines and tune hyperparameters with
Azure Machine Learning.

A choice expression for batch_size

Correct
Discrete hyperparameters are specified as a choice among discrete values. choice can be: one or
more comma-separated values -- a range object -- any arbitrary list object.

A uniform expression for learning_rate

Correct
Continuous hyperparameters are specified as a distribution over a continuous range of values.
Supported distributions include:

- uniform(low, high): returns a value uniformly distributed between low and high.

A normal expression for batch_size

35.
Question 35
You are evaluating a completed binary classification machine learning model.

You need to use precision as the evaluation metric.

Which visualization should you use?

0 / 1 point

Box plot

A violin plot

Binary classification confusion matrix

Gradient descent

Incorrect
Try going back and reviewing Create a classification model with Azure AI.

1.
Question 1
Your task is to predict if a person suffers from a disease by setting up a binary classification model.
Your solution needs to be able to detect the classification errors that may appear.

Considering the below description, which of the following would be the best error type?

“A person does not suffer from a disease. Your model classifies the case as having a disease”.

1 / 1 point

True positives

False positives

False negatives

True negatives

Correct
A false positive is an outcome where the model incorrectly predicts the positive class.

2.
Question 2
As a senior data scientist, you need to evaluate a binary classification machine learning model.

As the evaluation metric, you have to use precision. Considering this, which is the most
appropriate visualization?

0 / 1 point

Receiver Operating Characteristic (ROC) curve

Violin plot

Scatter plot

Gradient descent

Incorrect
Try going back to Train and evaluate Classification models.

3.
Question 3
In order to predict the price for a student’s craftwork, you have to rely on the following variables: the
student’s length of education, degree type, and art form. You decide to set up a linear regression
model that you will have to evaluate. Solution: Apply the following metrics: Mean Absolute Error,
Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC:

Is this solution effective?

1 / 1 point

Yes

No

Correct
Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models;
Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error are OK for the linear
regression model.

4.
Question 4
Your task is to create and evaluate a model. One of the metrics provides an absolute value in
the same unit as the label.

What is the metric described above?

1 / 1 point

Root Mean Square Error (RMSE)

Coefficient of Determination (known as R-squared or R2)

Mean Square Error (MSE)

Correct
This is the described metric. This means that the smaller the value, the better the model.

5.
Question 5
Python is known for its extensive functionality and its powerful statistical and numerical
libraries. What does TensorFlow provide?

0 / 1 point

Providing attractive data visualizations

Analyzing and manipulating data

Offering simple and effective predictive data analysis

Supplying machine learning and deep learning capabilities

Incorrect
Try going back and reviewing Explore & Analyse Data with Python.

6.
Question 6
If you multiply a list and a NumPy array by 2, what results would you get?

1 / 1 point

Multiplying a list by 2 creates a new list 2 times the length with the original sequence repeated 2
times.

Correct
This is how a list behaves when multiplied.

Multiplying a NumPy array by 2 performs an element-wise calculation on the array, which sees the
array stay the same size, but each element has been multiplied by 2.

Correct
This is how a NumPy array behaves when multiplied.

Multiplying an NumPy array by 2 creates a new array 2 times the length with the original sequence
repeated 2 times.
Multiplying a list by 2 performs an element-wise calculation on the list, which sees the list stay the
same size, but each element has been multiplied by 2.
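A quick demonstration of both behaviours:

import numpy as np

print([1, 2, 3] * 2)            # [1, 2, 3, 1, 2, 3] -- the list is repeated
print(np.array([1, 2, 3]) * 2)  # [2 4 6] -- element-wise multiplication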

7.
Question 7
Choose from the list below the evaluation metric that provides you with an absolute metric in the
same unit as the label.

1 / 1 point

Mean Square Error (MSE)

Root Mean Square Error (RMSE)

Coefficient of Determination (known as R-squared or R2)

Correct
This is the described metric. This means that the smaller the value, the better the model.

8.
Question 8
The Precision and Recall metrics are based on four possible prediction outcomes.

What is the outcome in the scenario where the predicted label is 1, but the actual label is 0?

0 / 1 point

False Positive

True Negative

True Positive

False Negative

Incorrect
Try going back and reviewing Exercise - Train and evaluate a classification model.

9.
Question 9
Your deep neural network is being trained. You decided to configure the training process with
30 epochs.

In this scenario, what would happen to the model’s behavior?

0 / 1 point

The first 30 rows of data are used to train the model, and the remaining rows are used to validate it

The entire training dataset is passed through the network 30 times

The training data is split into 30 subsets, and each subset is passed through the network

Incorrect
Try going back and reviewing Train a deep neural network.

10.
Question 10
Which of the layer types described below is a principal one that extracts important features from
images and works by applying a filter to them?

1 / 1 point

Convolutional layer

Flattening layer

Pooling layer

Correct
One of the principal layer types is a convolutional layer that extracts important features in images. A
convolutional layer works by applying a filter to images.

11.
Question 11
You are using an Azure Machine Learning service for your data science project. In order to deploy
the project, you have to choose a compute target. For this scenario, which of the following Azure
services is the most suitable?

0 / 1 point
Azure Databricks

Azure Data Lake Analytics

Apache Spark for HDInsight

Azure Container Instances

Incorrect
Try going back and reviewing Work with Compute in Azure Machine Learning.

12.
Question 12
You have a set of CSV files that contain sales records. All of the CSV files follow an identical data
schema.

Each month's sales records are held in one CSV file named sales.csv. Each file is stored in a
folder that indicates the month and the year when the data was recorded. A datastore has been
set up in an Azure Machine Learning workspace for these folders, which are kept in an Azure blob
container. The parent folder, named sales, contains the folders organized in the hierarchical
structure below:

/sales

/01-2019

/sales.csv

/02-2019

/sales.csv

/03-2019

/sales.csv


A new folder with a given month's sales is added to the sales folder every time that month ends.
You want to train a machine learning model by using the sales data while complying with the
requirements below:

- A single dataset must load all of the sales data to date into a structure that enables easy
conversion to a dataframe.

- Experiments must be able to use only the data created up to a specific previous month,
disregarding any data added after that month.

- The number of registered datasets must be kept to the minimum possible.

Considering that the sales data have to be registered as a dataset in the Azure Machine Learning
service workspace, what actions should you take?

1 / 1 point

Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month,
replacing the existing dataset and specifying a tag named month indicating the month and year it
was registered. Use this dataset for all experiments.

Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version
and with a tag named month indicating the month and year it was registered. Use this dataset for all
experiments, identifying the version to be used based on the month tag as necessary.

Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-
yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each
month with appropriate MM and YYYY values for the month and year. Use the appropriate month-
specific dataset for experiments.

Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv',
register the dataset with the name sales_dataset and a tag named month indicating the month and
year it was registered, and use this dataset for all experiments.

Correct
This is the correct approach to this scenario.
13.
Question 13
You decide to use Azure Machine Learning designer for your real-time service endpoint. You can
make use of only one Azure Machine Learning service compute resource.

You start training the model and preparing the real-time pipeline for deployment.

If you want to obtain a web service by publishing the inference pipeline, what is the most suitable
compute type?

0 / 1 point

HDInsight

Azure Kubernetes Services

a new Machine Learning Compute resource

Azure Databricks

the existing Machine Learning Compute resource

Incorrect
Try going back and reviewing Deploy real-time machine learning services with Azure Machine
Learning.

14.
Question 14
Yes or No?

In order to explain the model’s predictions, you have to calculate the importance of all the features,
taking into account the overall global relative importance value, but also the measure of local
importance for a certain set of predictions.

You decide to obtain the global and local feature importance values that you need by using an
explainer.

Solution: Configure a PFIExplainer. Is this solution effective?


0 / 1 point

Yes

No

Incorrect
Try going back and reviewing Explain machine learning models with Azure Machine Learning.

15.
Question 15
Yes or No?

You use a logistic regression algorithm to train your classification model. In order to explain the
model’s predictions, you have to calculate the importance of all the features, taking into account the
overall global relative importance value, but also the measure of local importance for a certain set of
predictions.

You decide to obtain the global and local feature importance values that you need by using an
explainer.

Solution: Configure a TabularExplainer. Is this solution effective?

0 / 1 point

Yes

No

Incorrect
Try going back and reviewing Explain machine learning models with Azure Machine Learning.

16.
Question 16
If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace,
what Python command would be the most appropriate?

0 / 1 point

from azureml.core import Workspace

ws = from.config_Workspace()
from azureml.core import Workspace

ws = Workspace.from_config()

from azureml.core import Workspace

ws = Workspace.from.config

Incorrect
Try going back and reviewing Azure Machine Learning tools and interfaces.

17.
Question 17
If you want to use the from_delimited_files method of the Dataset.Tabular class to configure and
register a tabular dataset, what are the most appropriate Python commands?

0 / 1 point

from azureml.core import Dataset

blob_ds = ws.get_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/*.csv')]

tab_ds = Dataset.Tabular.from_delimited_files()

tab_ds = tab_ds.register(workspace=ws, name='csv_table')

from azureml.core import Dataset

blob_ds = ws.get_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/*.csv')]

tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
tab_ds = tab_ds.register(workspace=ws, name='csv_table')

from azureml.core import Dataset

blob_ds = ws.get_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/csv')]

tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)

tab_ds = tab_ds.register(workspace=ws, name='csv_table')

from azureml.core import Dataset

blob_ds = ws.change_default_datastore()

csv_paths = [(blob_ds, 'data/files/current_data.csv'),

(blob_ds, 'data/files/archive/*.csv')]

tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)

tab_ds = tab_ds.register(workspace=ws, name='csv_table')

Incorrect
Try going back and reviewing Introduction to datasets.

18.
Question 18
Your task is to use the SDK in order to define a compute configuration for a managed compute
target.

Which of the following commands will return you the expected result?

0 / 1 point

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=4,
vm_priority='dedicated')

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=0,

vm_priority='dedicated')

compute_config = AmlCompute.provisioning.configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=4,

vm_priority='dedicated')

compute_config = AmlCompute_provisioning_configuration(vm_size='STANDARD_DS11_V2',

min_nodes=0, max_nodes=4,

vm_priority='dedicated')

Incorrect
Try going back and reviewing Create compute targets.

19.
Question 19
Your task is to deploy your service on an AKS cluster that is set up as a compute target.

What SDK commands are able to return you the expected result?

0 / 1 point

from azureml.core.webservice import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)

from azureml.core.webservice import ComputeTarget, AksWebservice

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)

from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'

compute_config = AksCompute.provisioning_configuration(location='eastus')

production_cluster = ComputeTarget.deploy (ws, cluster_name, compute_config)

production_cluster.wait_for_completion(show_output=True)

Incorrect
Try going back and reviewing Deploy a model as a real-time service.

20.
Question 20
If you want to extract the parallel_run_step.txt file from the output of the step after the pipeline run
has ended, what code should you choose?

1 / 1 point
import os
import pandas as pd

prediction_run = next(pipeline_run.get_children())

prediction_output = prediction_run.get_output_data('inferences')

prediction_output.download(local_path='results')

for root, dirs, files in os.walk('results'):
    for file in files:
        if file.endswith('parallel_run_step.txt'):
            result_file = os.path.join(root, file)

df = pd.read_csv(result_file, delimiter=":", header=None)

df.columns = ["File", "Prediction"]

print(df)

Correct
This code will find the parallel_run_step.txt file.

21.
Question 21
What code should you write using SDK if your goal is to extract the best run and its model?

0 / 1 point

best_run, fitted_model = automl_run.get_output()

best_run_metrics = best_run_get_metrics(1)

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

best_run, fitted_model = automl_run.get_input()


best_run_metrics = best_run.get_metrics()

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

best_run, fitted_model = automl_run.get_output()

best_run_metrics = best_run.get_metrics()

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

best_run, fitted_model = automl.run.get_output()

best_run_metrics = best_run.get_metrics()

for metric_name in best_run_metrics:

metric = best_run_metrics[metric_name]

print(metric_name, metric)

Incorrect
Try going back and reviewing Running automated machine learning experiments.

22.
Question 22
What code should you write for a PFIExplainer if you have a model named loan_model?

0 / 1 point

from interpret.ext.blackbox

pfi_explainer = PFIExplainer(model = loan_model,


initialization_examples=X_test,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

initialization_examples=X_test,

classes=['loan_amount','income','age','marital_status'],

features=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,

explainable_model= DecisionTreeExplainableModel,

features=['loan_amount','income','age','marital_status'],

classes=['reject', 'approve'])

Incorrect
Try going back and reviewing Using explainers.

23.
Question 23
If you want to minimize disparity in combined true positive rate and false_positive_rate across
sensitive feature groups, what is the most suitable parity constraint that you should choose to use
with any of the mitigation algorithms?

0 / 1 point

True positive rate parity

Error rate parity

False-positive rate parity

Equalized odds

Incorrect
Try going back and reviewing Mitigate unfairness with Fairlearn.

24.
Question 24
You decided to preprocess and filter down only the relevant columns for your AirBnB housing
dataframe.

The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.

In order to obtain the first initial from the host_name column, you have written the following function
that you named firstInitialFunction:

def firstInitialFunction(name):

return name[0]

firstInitialFunction("George")

Your goal is to use spark.udf.register to create a UDF from the function above, because
you want to ensure that the UDF will be created in the SQL namespace.

Considering this scenario, what code should you write?

0 / 1 point
airbnbDF.createTempView("airbnbDF")

spark.udf.register(sql_udf = firstInitialFunction)

airbnbDF.createAndReplaceTempView("airbnbDF")

spark.udf.register(sql_udf.firstInitialFunction)

airbnbDF.replaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

airbnbDF.createOrReplaceTempView("airbnbDF")

spark.udf.register("sql_udf", firstInitialFunction)

Incorrect
Try going back and reviewing Work with user-defined functions.

25.
Question 25
You decided to use the Boston Housing dataset and the Linear Regression algorithm, for which you
want to tune the hyperparameters.

At this point, you have executed a test split on the Boston dataset and built a pipeline for the
linear regression.

You now want to test the maximum number of iterations by using ParamGridBuilder(), regardless of
whether you want to fit an intercept with the y axis or standardize the features.

Considering this scenario, what code should you write?

0 / 1 point

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder(lr)

.addGrid(lr.maxIter, [1, 10, 100])


.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.run()

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder(lr)

.addGrid(lr.maxIter, [1, 10, 100])

.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.create()

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder()

.addGrid(lr.maxIter, [1, 10, 100])

.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.build()

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder()
.addGrid(lr.maxIter, [1, 10, 100])

.addGrid(lr.fitIntercept, [True, False])

.addGrid(lr.standardization, [True, False])

.search()

Incorrect
Try going back and reviewing Perform model selection with hyperparameter tuning.
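For reference, the correct option as a minimal sketch: ParamGridBuilder() takes no constructor arguments, and the grid is finalized with .build() (assuming lr is the pipeline's LinearRegression stage):

from pyspark.ml.tuning import ParamGridBuilder

paramGrid = (ParamGridBuilder()
             .addGrid(lr.maxIter, [1, 10, 100])
             .addGrid(lr.fitIntercept, [True, False])
             .addGrid(lr.standardization, [True, False])
             .build())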

26.
Question 26
You decided to use Python code interactively in your Conda environment. You have all the required
Azure Machine Learning SDK and MLflow packages in the environment.

In order to log metrics in your Azure Machine Learning experiment named mlflow-experiment, you
have to use MLflow.

To give the correct answer, you have to replace the code comments that are bolded with some
suitable code options that you find in the answer area.

Considering this, what snippet should you choose to complete the code?

import mlflow

from azureml.core import Workspace

ws = Workspace.from_config()

#1 Set the MLflow logging target

#2 Configure the experiment

with #3 Begin the experiment run

#4 Log my_metric with value 1.00 ('my_metric', 1.00)

print("Finished!")

0 / 1 point
#1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment), #3
mlflow.start_run(), #4 run.log()

#1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3


mlflow.active_run(), #4 mlflow.log_metric

#1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment('mlflow-
experiment), #3 mlflow.start_run(), #4 mlflow.log_metric

#1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment), #3
mlflow.start_run(), #4 mlflow.log_metric

Incorrect
Try going back and reviewing Use MLflow to track experiments, log metrics, and compare runs.
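For reference, the completed snippet under the correct option:

import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())  # 1: point MLflow at the workspace
mlflow.set_experiment('mlflow-experiment')             # 2: configure the experiment
with mlflow.start_run():                               # 3: begin the experiment run
    mlflow.log_metric('my_metric', 1.00)               # 4: log my_metric with value 1.00
print('Finished!')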

27.
Question 27
You want to deploy in your Azure Container Instance a deep learning model.

In order to call the model API, you have to use the Azure Machine Learning SDK.

To invoke the deployed model, you have to use native SDK classes and methods.

To give the correct answer, you have to replace the code comments that are bolded with some
suitable code options that you find in the answer area.

Considering this, what snippet should you choose to complete the code?

from azureml.core import Workspace

#1st code option

import json

ws = Workspace.from_config()

service_name = "mlmodel1-service"

service = Webservice(name=service_name, workspace=ws)


x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]

input_json = json.dumps({"data": x_new})

#2nd code option

1 / 1 point

from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)

from azureml.core.webservice import requests, predictions = service.run(input_json)

from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)

from azureml.core.webservice import Webservice, predictions = service.run(input_json)

Correct
These are the correct commands for this task.
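Assembled with the correct option, the completed snippet reads:

import json
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(name='mlmodel1-service', workspace=ws)

x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({'data': x_new})
predictions = service.run(input_json)  # invoke the deployed model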

28.
Question 28
One of the categorical variables of your AirBnB dataset is room type.

You have three room types, as follows: private room, entire home/apt, and shared room.

In order for the machine learning model to know how to handle the room types, you first have to
encode every unique string as a number.

What code should you write to achieve this goal?

0 / 1 point

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.transform(uniqueTypesDF)
indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import StringIndexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)

from pyspark.ml.feature import Indexer

uniqueTypesDF = airbnbDF.select("room_type").distinct()

indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")

indexerModel = indexer.fit(uniqueTypesDF)

indexedDF = indexerModel.transform(uniqueTypesDF)

display(indexedDF)
Incorrect
Try going back and reviewing Perform featurization of the dataset.

29.
Question 29
You are able to use the MlflowClient object as the pathway to query previous runs
programmatically.

What code should you write in Python to achieve this?

1 / 1 point

from mlflow.tracking import MlflowClient

client = MlflowClient()

client.list_experiments()

from mlflow.tracking import MlflowClient

client = MlflowClient()

list.client_experiments()

from mlflow.pipelines import MlflowClient

client = MlflowClient()

client.list_experiments()

from mlflow.pipelines import MlflowClient

client = MlflowClient()

list.experiments()

Correct
This is the correct code syntax for this job.

30.
Question 30
If you want to explore the hyperparameters on a model, knowing that every algorithm uses
different hyperparameters for tuning, what is the most appropriate method to choose?

0 / 1 point

exploreParams()

explainParams()

showParams()

getParams()

Incorrect
Try going back and reviewing Describe model selection and hyperparameter tuning.

31.
Question 31
Your task is to clean up the deployments and terminate the “dev” ACI webservice by making use of
the Azure ML SDK after your work with Azure Machine Learning has ended.

What is the most suitable method in order to achieve this goal?

0 / 1 point

dev_webservice.delete()

dev_webservice.remove()

dev_webservice.flush()

dev_webservice.terminate()

Incorrect
Try going back and reviewing Use Azure Machine Learning to deploy serving models.

32.
Question 32
The DataFrame you are currently working on contains data regarding the daily sales of ice cream. In
order to compare the avg_temp and units_sold columns you decided to use the corr method which
returned a result of 0.95.

What information can you read from this result?

0 / 1 point

Days with high avg_temp values tend to coincide with days that have high units_sold values

On the day with the maximum units_sold value, the avg_temp value was 0.95

The units_sold value is, on average, 95% of the avg_temp value

Incorrect
Try going back and reviewing Exercise - Explore data.
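A minimal sketch of the corr method on a hypothetical ice-cream DataFrame:

import pandas as pd

df = pd.DataFrame({'avg_temp': [20, 25, 30, 35],
                   'units_sold': [110, 150, 200, 260]})
print(df['avg_temp'].corr(df['units_sold']))  # close to 1 => strong positive correlation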

33.
Question 33
You can enable Application Insights when configuring the service deployment for a new real-time
service.

By using the SDK, what code should you write to achieve this goal?

1 / 1 point

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,

memory_gb = 1,

appinsights=True)

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,

memory_gb = 1,

enable_app_insights=True)

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
memory_gb = 1,

app_insights(True))

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,

memory_gb = 1,

app_insights=True)

Correct
This is the correct code.

34.
Question 34
You usually take the following steps when you use HorovodRunner to develop a distributed
training program:

1. Create a HorovodRunner instance initialized with the number of nodes.

2. Define a Horovod training method using the methods described in Horovod usage, making sure
that any import statements are added inside the method.

What code should you write in Python to achieve this?

0 / 1 point

hr = HorovodRunner(tf)

def train():

import tensorflow as np

hvd.init(2)

hr.run(train)

hr = HorovodRunner()

def train():
import tensorflow as tf

hvd.init(np)

hr.run(train)

hr = HorovodRunner(np)

def train():

import tensorflow as tf

hvd.init()

hr.run(train)

hr = HorovodRunner(np=2)

def train():

import tensorflow as tf

hvd.init()

hr.run(train)

Incorrect
Try going back and reviewing Use Horovod to train a deep learning model.
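For reference, the correct option as a hedged sketch: HorovodRunner(np=2) requests two worker processes, and the imports stay inside the training function (assuming a Databricks ML runtime where sparkdl is available):

from sparkdl import HorovodRunner

hr = HorovodRunner(np=2)

def train():
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... build and fit the model here ...

hr.run(train)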

35.
Question 35
You’re using the Azure Machine Learning Python SDK to define a pipeline to train a model.

The data used to train the model is read from a folder in a datastore.

You need to ensure the pipeline runs automatically whenever the data in the folder changes.

What should you do?

1 / 1 point
Create a PipelineParameter with a default value that references the location where the training data
is stored

Create a ScheduleRecurrence object with a Frequency of auto. Use the object to create a schedule
for the pipeline

Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder
containing the training data in the path_on_datastore property

Set the regenerate_outputs property of the pipeline to True

Correct
To schedule a pipeline to run whenever data changes, you must create a Schedule that monitors a
specified path on a datastore.
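A minimal sketch of such a schedule, assuming a published pipeline and a Datastore object (the names are hypothetical):

from azureml.pipeline.core import Schedule

reactive_schedule = Schedule.create(ws, name='training-schedule',
                                    description='Run on data changes',
                                    pipeline_id=published_pipeline.id,
                                    experiment_name='training-pipeline',
                                    datastore=training_datastore,
                                    path_on_datastore='data/training')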
