Quiz 1 Machine Learning II

Quiz 1 Machine Learning II Total points 15/30
Let us say that we have computed the gradient of our cost function and 1/1
stored it in a vector g. What is the cost of one gradient descent update
given the gradient?
O(D)
O(N)
O(ND)
O(ND^2)
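A minimal sketch of the update this question describes, assuming a parameter vector w of length D and a hypothetical learning rate lr (neither name appears in the quiz). Because g is already computed, the update touches each of the D entries once and never revisits the N data points.

```python
def gradient_descent_step(w, g, lr=0.1):
    """Return w - lr * g for a precomputed gradient g (one pass over D entries)."""
    if len(w) != len(g):
        raise ValueError("w and g must have the same length")
    # One multiply and one subtract per parameter.
    return [w_i - lr * g_i for w_i, g_i in zip(w, g)]


w = [1.0, -2.0, 0.5]   # D = 3 parameters (illustrative values)
g = [0.5, -1.0, 0.0]   # gradient assumed to be already computed
w_new = gradient_descent_step(w, g, lr=0.1)
print(w_new)
```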
Which statement is true about the K-Means algorithm? 1/1
All attribute values must be categorical.
The output attribute must be categorical.
Attribute values may be either categorical or numeric.
All attributes must be numeric
Which function does the following multi-layer perceptron realize? 2/2
AND
XOR
NOR
NAND
Principal component analysis (PCA) 0/2
Finds the directions with the most variation in the data
Is useful for visualizing data
Dimensions are increased when applying PCA
Eigenvalues and eigenvectors are computed from the covariance matrix
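The PCA steps named in these options can be sketched as follows, assuming NumPy is available and using synthetic data invented for illustration. The covariance matrix is eigendecomposed, the eigenvectors are ordered by eigenvalue, and the data is projected onto fewer dimensions than it started with.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data stretched along the first axis (illustrative only).
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])

Xc = X - X.mean(axis=0)                  # center the data
cov = np.cov(Xc, rowvar=False)           # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]        # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 1                                    # keep fewer dimensions than the input
Z = Xc @ eigvecs[:, :k]                  # project onto the top-k directions
print(eigvals, Z.shape)
```

The variance of the projected data equals the largest eigenvalue, which is why the first eigenvector is the direction of most variation.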
The average squared difference between classifier predicted output and 1/1
actual output.
mean squared error
root mean squared error
mean absolute error
mean relative error
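For reference, the error measures that appear as options in this quiz can be written out as below. This is a plain-Python sketch using their standard definitions. The sample values are invented for illustration.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute (positive) differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0, 7.0]   # illustrative targets
y_pred = [2.0, 5.0, 4.0, 8.0]   # illustrative predictions
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, rmse, mae)
```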
Name *
Atharva Gondkar
A feed-forward neural network is said to be fully connected when 1/1
all nodes are connected to each other.
all nodes at the same layer are connected to each other.
all nodes at one layer are connected to all nodes in the next higher layer.
all hidden layer nodes are connected to all output layer nodes.
The average positive difference between computed and desired 0/1
outcome values.
root mean squared error
mean squared error
mean absolute error
mean positive error
K means 0/2
Automatically finds the number of clusters
Each cluster center is moved to the mean of data points assigned to it for each
iteration
Too small a number of clusters may lead to overfitting
The algorithm has converged when the change in cluster assignment is less than a
threshold
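A minimal sketch of the K-means loop these options refer to, assuming 2-D points and caller-supplied initial centers (the function names are hypothetical). The number of clusters is fixed by how many initial centers are passed in. For simplicity the loop stops when no assignment changes, a stricter form of the threshold test mentioned above.

```python
def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    return [
        min(range(len(centers)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))
        for point in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of its assigned points (keep it if empty)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def kmeans(points, centers, max_iter=100):
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:      # no assignment changed: stop
            break
        labels = new_labels
        centers = update(points, labels, centers)
    return centers, labels


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```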
Roll No *
2176032
What strategies can help reduce overfitting in decision trees? 0/2
Pruning
Make sure each leaf node is one pure class
Enforce a minimum number of samples in leaf nodes
Enforce a maximum depth for the tree
Ensemble learning 0/2
A combination of classifiers is applied for classification
Classifiers should be trained to be slightly different
In bagging, each training sample (data point) is used only once for each iteration
Minority voting is used if there is disagreement
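Two of the mechanics mentioned in these options, bootstrap sampling in bagging and vote-based combination, can be sketched with the standard library alone. The helper names and toy data are assumptions made for illustration. Sampling with replacement means a data point can appear several times in one sample or not at all.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement; some repeat, some are left out."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the most common predicted label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
data = list(range(10))                     # toy training set of 10 points
samples = [bootstrap_sample(data, rng) for _ in range(3)]
print(samples)
print(majority_vote(["cat", "dog", "cat"]))
```

Training each classifier on a different bootstrap sample is one common way to make the ensemble members slightly different from one another.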
MLP
Gradient of a continuous and differentiable function 0/2
is zero at a minimum
is non-zero at a maximum
is zero at a saddle point
decreases as you get closer to the minimum
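The behaviour of the gradient at stationary points can be checked numerically. The sketch below assumes three simple quadratic test functions chosen for illustration and estimates the gradient by central differences.

```python
def grad(f, x, y, h=1e-6):
    """Central-difference estimate of the gradient of f at (x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy


def bowl(x, y):      # minimum at (0, 0)
    return x ** 2 + y ** 2


def hill(x, y):      # maximum at (0, 0)
    return -(x ** 2 + y ** 2)


def saddle(x, y):    # saddle point at (0, 0)
    return x ** 2 - y ** 2


for name, f in [("minimum", bowl), ("maximum", hill), ("saddle", saddle)]:
    print(name, grad(f, 0.0, 0.0))

far = grad(bowl, 1.0, 0.0)    # farther from the minimum
near = grad(bowl, 0.1, 0.0)   # closer to the minimum
print(far, near)
```

For these examples the gradient vanishes at all three stationary points, and for the bowl its magnitude shrinks as the evaluation point approaches the minimum.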
During backpropagation training, the purpose of the delta rule is to make 0/1
weight adjustments so as to
minimize the number of times the training data must pass through the network.
minimize the number of times the test data must pass through the network.
minimize the sum of absolute differences between computed and actual outputs.
minimize the sum of squared error differences between computed and actual
output.
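As a simplified illustration of the delta rule, the sketch below applies it to a single linear unit rather than a full backpropagation network. The learning rate, input, and target are assumed values. Each update moves the weights along the negative gradient of the squared error between target and output.

```python
def delta_rule_step(w, x, target, lr=0.1):
    """One delta-rule update for a single linear unit: w += lr * (t - y) * x."""
    y = sum(w_i * x_i for w_i, x_i in zip(w, x))
    error = target - y
    return [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]


def squared_error(w, x, target):
    """Squared difference between the target and the unit's output."""
    y = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return (target - y) ** 2


w = [0.0, 0.0]                  # initial weights (illustrative)
x, target = [1.0, 2.0], 1.0     # one training pair (illustrative)
before = squared_error(w, x, target)
for _ in range(20):
    w = delta_rule_step(w, x, target)
after = squared_error(w, x, target)
print(before, after)
```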
The test set accuracy of a backpropagation neural network can often be 2/2
improved by
increasing the number of epochs used to train the network.
decreasing the number of hidden layer nodes.
increasing the learning rate.
decreasing the number of hidden layers.
Email *
atharva.gondkar@gmail.com
Multilayer perceptron network 0/1
Usually, the weights are initially set to small random values
A hard limiting activation function is often used
The weights can only be updated after all the training vectors have been presented
Multiple layers of neurons allow for less complex decision boundaries than a single
layer
Unsupervised learning 2/2
Categorizes training vectors by identifying similarities between them
Can use the same error functions as supervised learning
Collaborative learning methods are often applied between classes
The data applied is unlabeled
Biological neural networks 0/2
Synapses can be inhibitory or excitatory
Learning takes place in the dendrites
The outputs from a neuron are pulses of fixed strength (height) and duration
The output from the neuron is called a synapse
Support Vector Machines (SVMs) * 2/2
Support vectors are used for computing hyperplanes
Is a method for minimizing the margin to hyperplanes
Nonlinear problems are handled by mapping inputs to a lower-dimensional space
Kernel functions are used for transforming data

Logistic regression is a ________ regression technique that is used to 2/2
model data having a _____ outcome.
linear, numeric
linear, binary
nonlinear, numeric
nonlinear, binary
The values input into a feed-forward neural network 1/1
may be categorical or numeric.
must be either all categorical or all numeric but not both.
must be numeric.
must be categorical.
This form was created inside of MIT University.