
LECTURE 01

1. What was the primary purpose of the Turing test?

• A) To test human intelligence
• B) To define machine learning
• C) To distinguish between humans and machines
• D) To evaluate computer processing speed

2. According to the Turing test, what capabilities should a computer
possess to pass the test?

• A) Natural language processing, knowledge representation, and
automated reasoning
• B) Robotics, planning, and optimization
• C) Computer vision and machine learning
• D) All of the above

3. What is the key idea behind machine learning?

• A) Executing predefined algorithms
• B) Learning from data to make predictions and inferences
• C) Human-like decision making
• D) Performing logical reasoning

4. Which of the following is an example of a machine learning
application?

• A) Cooking recipes
• B) Weather forecasting
• C) Recommendation systems
• D) All of the above

5. What is the main focus of supervised learning?

• A) Learning without labeled data
• B) Learning from expert opinions
• C) Learning with labeled data and known outcomes
• D) Learning without training stages
6. In supervised learning, what is the purpose of the testing stage?

• A) Evaluating model performance on new data
• B) Training the algorithm with historical data
• C) Making predictions on the training set
• D) None of the above

7. Which type of supervised learning is used for predicting a continuous
outcome, such as sales predictions?

• A) Classification
• B) Clustering
• C) Regression
• D) Dimension reduction

8. What is an example use case for unsupervised learning - clustering?

• A) Spam email filtering
• B) Predicting retail sales
• C) Grouping similar news articles
• D) Credit score prediction

9. What does reinforcement learning emphasize when mapping actions?

• A) Immediate rewards only
• B) Long-term rewards only
• C) Both immediate and subsequent rewards
• D) No consideration for rewards

10. Which of the following is an example of a reinforcement learning
technique?

• A) Regression
• B) Clustering
• C) Q-learning
• D) Dimension reduction

ANSWERS

1. C) To distinguish between humans and machines
2. A) Natural language processing, knowledge representation, and
automated reasoning
3. B) Learning from data to make predictions and inferences
4. C) Recommendation systems
5. C) Learning with labeled data and known outcomes
6. A) Evaluating model performance on new data
7. C) Regression
8. C) Grouping similar news articles
9. C) Both immediate and subsequent rewards
10. C) Q-learning
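
As an illustration of answers 9 and 10, the sketch below shows one tabular Q-learning update, in which the target combines the immediate reward with the discounted value of the best next action. The function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the two-state toy table are assumptions made for this example, not material from the lecture.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Best value reachable from the next state (the "subsequent" reward).
    best_next = max(q_table[next_state].values())
    # Target = immediate reward + discounted future reward.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical toy problem: two states, two actions each.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}
new_value = q_update(q, "s0", "right", reward=0.5, next_state="s1")
print(new_value)
```

With `gamma=0` the update would look only at the immediate reward, so the discount factor is what lets later rewards influence the current action value.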

LECTURE 02

1. What is the primary purpose of scales of measurement in data
preparation for machine learning?

• A) To summarize data
• B) To classify data into categories
• C) To understand the central tendency of data
• D) To measure data dispersion

2. In feature engineering, what does "rescaling data" aim to achieve?

• A) Eliminating outliers
• B) Introducing outliers
• C) Creating dummy variables
• D) Handling missing data

3. How are variables classified based on their type of measurement scale?

• A) By their business context
• B) By their feature construction
• C) By their range of values
• D) By their continuous or discrete nature

4. Which technique involves creating dummy variables for categorical
variables with k levels?

• A) Standardization
• B) Feature construction
• C) One-Hot Encoding
• D) Rescaling

5. What is the purpose of exploratory data analysis (EDA)?

• A) Building machine learning models
• B) Summarizing data
• C) Creating summary statistics
• D) Understanding data characteristics

6. In the context of scales of measurement, what does the nominal scale
represent?

• A) Continuous variables
• B) Ordinal variables
• C) Categorical variables
• D) Interval variables

7. What are the common practices of feature engineering?

• A) Creating summary statistics
• B) Handling missing data
• C) Performing univariate analysis
• D) Building machine learning models

8. How is normalization achieved in data preparation for machine
learning?

• A) By creating dummy variables
• B) Through min-max scaling
• C) Using predictive models
• D) Handling categorical data

9. What is the purpose of summary statistics in univariate analysis?

• A) Identifying outliers
• B) Building machine learning models
• C) Understanding data dispersion
• D) Creating exploratory data analysis
10. In dealing with missing data, what technique involves replacing
missing values with a predictive model?

• A) Delete
• B) Random replace
• C) Replace with the summary
• D) Using predictive model
ANSWERS

1. B) To classify data into categories
2. A) Eliminating outliers
3. D) By their continuous or discrete nature
4. C) One-Hot Encoding
5. D) Understanding data characteristics
6. C) Categorical variables
7. B) Handling missing data
8. B) Through min-max scaling
9. D) Creating exploratory data analysis
10. D) Using predictive model
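
Answers 4 and 8 mention one-hot encoding and min-max scaling. The following is a minimal sketch of both in plain Python; the helper names `min_max_scale` and `one_hot` and the sample values are hypothetical. Note that this sketch emits one column per level (k columns); dropping one of them gives the k-1 dummy-variable form.

```python
def min_max_scale(values):
    """Rescale numbers to the [0, 1] range using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return the sorted levels and one 0/1 indicator row per input value."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


scaled = min_max_scale([10, 20, 40])
levels, encoded = one_hot(["red", "green", "red"])
print(scaled)
print(levels, encoded)
```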

LECTURE 3

1. What is the main focus of supervised learning – regression?

• A) Categorizing data
• B) Predicting quantitative values
• C) Exploring data patterns
• D) Handling missing values
2. In univariate regression, what does "univariate" refer to?
• A) One variable
• B) Two variables
• C) Categorical variables
• D) Nonlinear variables
3. What is the purpose of a scatter plot in regression analysis?
• A) To show causation
• B) To visualize the relationship between variables
• C) To demonstrate statistical significance
• D) To display categorical data
4. Which metric is commonly used to evaluate how well a linear model
fits the data?
• A) Mean absolute error (MAE)
• B) R-squared (R2)
• C) Root mean squared error (RMSE)
• D) Pearson correlation coefficient
5. What does R-squared measure in the context of linear regression?
• A) How close predicted values are to actual values
• B) The proportion of variation in the dependent variable explained
by the independent variable
• C) The difference between predicted and actual values
• D) The absolute average of the errors
6. Why do we square the errors in the least squares method of linear
regression?
• A) To introduce non-linearity
• B) To eliminate negative values
• C) To make the errors absolute
• D) To penalize larger errors more than smaller errors
7. What does a true zero point in ratio scale mean?
• A) The absence of the attribute being measured
• B) The presence of a meaningful zero value
• C) The existence of negative values
• D) The inclusion of categorical data
8. In polynomial regression, what does a higher-degree polynomial
introduce?
• A) More features
• B) Nonlinearity
• C) Categorical variables
• D) Multicollinearity
9. What does the Root Mean Squared Error (RMSE) measure?
• A) How well the model fits the data
• B) The proportion of explained variance
• C) How close predicted values are to actual values
• D) The absolute average of the errors
10. What is the purpose of polynomial regression of higher degrees?
• A) To introduce multicollinearity
• B) To fit curves better
• C) To simplify the model
• D) To eliminate outliers

ANSWERS

1. B) Predicting quantitative values
2. A) One variable
3. B) To visualize the relationship between variables
4. B) R-squared (R2)
5. B) The proportion of variation in the dependent variable explained by
the independent variable
6. D) To penalize larger errors more than smaller errors
7. B) The presence of a meaningful zero value
8. B) Nonlinearity
9. C) How close predicted values are to actual values
10. B) To fit curves better
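
To make the metrics in questions 4, 5, 6, and 9 concrete, here is a small sketch that computes R-squared, RMSE, and MAE from their usual definitions. The function names and the four sample points are made up for illustration.

```python
import math


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: three perfect predictions and one large miss.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(r_squared(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

On this data the single large miss pushes RMSE (1.0) to twice the MAE (0.5), which is the effect of squaring errors described in question 6.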

LECTURE 4

1. What is the focus of multivariate regression?
A) Categorizing data
B) Predicting quantitative values
C) Handling missing values
D) Exploring data patterns
2. How is LabelBinarizer used in handling categorical variables?
A) Converts n levels to n-1 new variables
B) Replaces category level with number representation
C) Replaces the binary variable text with numeric values
D) Converts numeric values to binary variables
3. What is the purpose of using OneHotEncoder?
A) To eliminate multicollinearity
B) To visualize the relationship between variables
C) To convert levels to numbers
D) To convert n levels to n-1 new variables
4. What is multicollinearity in regression analysis?
A) A strong relationship between dependent and independent variables
B) A strong correlation between independent variables
C) The absence of correlation in the data
D) The presence of outliers in the dataset
5. What does the Variance Inflation Factor (VIF) measure?
A) The strength of relationship between dependent and independent
variables
B) The proportion of explained variance
C) The risk of multicollinearity among independent variables
D) The accuracy of the linear regression model
6. In hypothesis testing, what does a p-value less than or equal to 0.05
signify?
A) Strong evidence against the null hypothesis – reject the null
hypothesis
B) Weak evidence against the null hypothesis – reject the null hypothesis
C) Strong evidence in favor of the null hypothesis – accept the null
hypothesis
D) Weak evidence in favor of the null hypothesis – accept the null
hypothesis
7. What does the Durbin-Watson statistic measure?
A) Multicollinearity
B) Outliers
C) Homoscedasticity
D) Autocorrelation
8. What is the purpose of the Bonferroni outlier test?
A) To identify outliers based on leverage
B) To test the normality of residuals
C) To detect heteroscedasticity in the data
D) To assess the linearity of the regression model
9. What does Ridge regularization do in linear regression?
A) Sets coefficients of variables that add minor value to zero
B) Excludes variables that add minor value to the model
C) Improves model accuracy by including all variables
D) Converts categorical variables into numerical format
10. When is LASSO regularization useful?
A) When there are many variables that add minor value individually
B) When there is no multicollinearity in the dataset
C) When the model is prone to overfitting
D) When the model requires non-linear regression fitting

ANSWERS

1. What is the focus of multivariate regression?
Correct Answer: B) Predicting quantitative values
2. How is LabelBinarizer used in handling categorical variables?
Correct Answer: C) Replaces the binary variable text with numeric
values
3. What is the purpose of using OneHotEncoder?
Correct Answer: D) To convert n levels to n-1 new variables
4. What is multicollinearity in regression analysis?
Correct Answer: B) A strong correlation between independent
variables
5. What does the Variance Inflation Factor (VIF) measure?
Correct Answer: C) The risk of multicollinearity among independent
variables
6. In hypothesis testing, what does a p-value less than or equal to 0.05
signify?
Correct Answer: A) Strong evidence against the null hypothesis –
reject the null hypothesis
7. What does the Durbin-Watson statistic measure?
Correct Answer: D) Autocorrelation
8. What is the purpose of the Bonferroni outlier test?
Correct Answer: A) To identify outliers based on leverage
9. What does Ridge regularization do in linear regression?
Correct Answer: C) Improves model accuracy by including all
variables
10. When is LASSO regularization useful?
Correct Answer: A) When there are many
variables that add minor value individually
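
The Durbin-Watson statistic from question 7 checks regression residuals for first-order autocorrelation. A minimal sketch using the standard formula (sum of squared successive differences divided by the sum of squared residuals) is shown below. The function name and the residual sequences are hypothetical. Values near 2 suggest little autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that flip sign every step (negative autocorrelation).
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Residuals that keep their sign for a while (positive autocorrelation).
persistent = durbin_watson([1.0, 1.0, -1.0, -1.0])
print(alternating, persistent)
```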

LECTURE 5

1. What is the main purpose of logistic regression in classification ?

• A) To predict continuous values
• B) To evaluate regression model performance
• C) To predict discrete class outcomes
• D) To visualize data distribution

Answer: C
2. In logistic regression, what function is introduced to handle
classification boundaries effectively?

• A) Exponential function
• B) Sigmoid function
• C) Hyperbolic tangent
• D) Logarithmic function

Answer: B

3. How are odds explained in logistic regression?

• A) Odds are not relevant in logistic regression
• B) Odds are calculated using linear regression
• C) Odds are represented by the logit, which is the log of odds
• D) Odds are used only in multiclass logistic regression

Answer: C
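
Questions 2 and 3 relate the sigmoid function to the logit (log of odds). The sketch below, with illustrative function names, shows that the two are inverses of each other: the sigmoid maps a linear score to a probability, and the logit maps that probability back to the score.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log of the odds p / (1 - p); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


p = sigmoid(0.0)            # a score of 0 corresponds to probability 0.5
odds = 0.8 / (1.0 - 0.8)    # probability 0.8 corresponds to odds of about 4 to 1
print(p, odds, logit(0.8), logit(sigmoid(2.0)))
```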

4. What is commonly used to visualize the performance of a binary
classifier?

• A) Confusion matrix
• B) Regression plot
• C) ROC Curve
• D) AUC

Answer: C

5. In the context of logistic regression, what does AUC stand for?

• A) Area Under Curve
• B) Accuracy Under Classifier
• C) Average User Confidence
• D) Analysis of Underlying Classifiers

Answer: A

6. What is the purpose of regularization in logistic regression?

• A) To reduce model complexity
• B) To increase overfitting
• C) To maximize the number of features
• D) To avoid fitting to the training data

Answer: A

7. In multiclass logistic regression, what is the Iris dataset used for?

• A) Predicting student outcomes
• B) Evaluating binary classifiers
• C) Learning multiclass prediction with three classes
• D) Constructing decision trees

Answer: C

8. What is the purpose of normalizing data in the context of logistic
regression?

• A) To make data more complex
• B) To simplify the dataset
• C) To standardize feature scales
• D) To remove outliers

Answer: C

9. What type of nodes does a decision tree consist of?

• A) Input nodes, hidden nodes, output nodes
• B) Root node, branch node, leaf node
• C) Parent node, child node, grandchild node
• D) Decision node, probability node, outcome node

Answer: B

10. How is the tree model output used in decision trees?

• A) To visualize the dataset
• B) To provide rules for classification
• C) To calculate AUC
• D) To measure regularization effectiveness
Answer: B

LECTURE 6

1. What is the primary goal of a Support Vector Machine (SVM)?

• a) Minimize the margin
• b) Maximize the margin
• c) Minimize the penalty parameter
• d) Maximize the penalty parameter

Answer: b) Maximize the margin

2. What are the key parameters for a Support Vector Machine (SVM)?

• a) Kernels and C
• b) Learning rate and epochs
• c) Depth and leaves
• d) Features and labels

Answer: a) Kernels and C

3. In K-Nearest Neighbors (K-NN), why is it recommended to choose an
odd K value for a two-class problem?

• a) It doesn't matter
• b) Avoid ties in majority voting
• c) Even K values perform better
• d) Odd K values result in overfitting

Answer: b) Avoid ties in majority voting
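
The tie problem behind this answer can be seen in a small sketch. The function `knn_votes`, the one-dimensional training points, and the labels are hypothetical. With two classes, an even K can split the vote evenly, while an odd K cannot.

```python
from collections import Counter


def knn_votes(train_points, train_labels, query, k):
    """Return the vote count per class among the k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    )
    return dict(Counter(label for _, label in by_distance[:k]))


points = [(1.0,), (2.0,), (4.0,), (5.0,)]
labels = ["a", "a", "b", "b"]

votes_k3 = knn_votes(points, labels, query=(3.0,), k=3)  # one class must win
votes_k4 = knn_votes(points, labels, query=(3.0,), k=4)  # vote can be split
print(votes_k3, votes_k4)
```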

4. What is the purpose of differencing in the context of Autoregressive
Integrated Moving Average (ARIMA)?

• a) To introduce randomness
• b) To make the time series stationary
• c) To reduce model complexity
• d) To increase autocorrelation
Answer: b) To make the time series stationary
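
As a quick illustration of differencing, the sketch below subtracts each observation from the one that follows it. The helper name `difference` and the sample series are assumptions for this example. A series with a steady upward trend becomes a constant series after one round of differencing.

```python
def difference(series, lag=1):
    """Return series[i] - series[i - lag] for every position where it is defined."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


trend = [10, 12, 14, 16, 18]   # hypothetical series with a linear trend
diffed = difference(trend)     # the trend is removed
print(diffed)
```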

5. What component of time-series represents a long-term increase or
decrease?

• a) Cycle
• b) Seasonality
• c) Trend
• d) Autocorrelation

Answer: c) Trend

6. What is the primary application area of time-series forecasting
mentioned in the lecture?

• a) Image recognition
• b) Sales forecasting
• c) Sentiment analysis
• d) Natural language processing

Answer: b) Sales forecasting

7. What does the Autoregressive (AR) component of ARIMA model
involve?

• a) Forecast errors
• b) Lagged values of the variable
• c) Random error
• d) Moving averages

Answer: b) Lagged values of the variable

8. In ARIMA, what does the parameter 'p' represent?

• a) Order of differencing
• b) Number of unknown terms in autoregressive part
• c) Order of the moving average part
• d) Number of classes

Answer: b) Number of unknown terms in autoregressive part


9. What metric can be used to evaluate the deviation between actual and
predicted values in a time series?

• a) F1-score
• b) Precision
• c) Akaike Information Criterion (AIC)
• d) Mean Absolute Error (MAE)

Answer: d) Mean Absolute Error (MAE)

10. What is the purpose of the Dickey Fuller test in building an ARIMA
model?

• a) Assess the stationarity of the series
• b) Select optimal parameters
• c) Evaluate model performance
• d) Test for multicollinearity

Answer: a) Assess the stationarity of the series

LECTURE 7

Question 1: What is the main objective of the K-means algorithm?

A. Maximize intra-cluster similarity and minimize inter-cluster similarity.

B. Maximize inter-cluster similarity and minimize intra-cluster similarity.

C. Maximize both intra-cluster and inter-cluster similarity.

D. Minimize both intra-cluster and inter-cluster similarity.

Answer 1: A. Maximize intra-cluster similarity and minimize inter-cluster
similarity.

Question 2: Which algorithm consists of Expectation and Maximization steps?

A. K-means

B. Hierarchical clustering
C. Principal Component Analysis (PCA)

D. Expectation Maximization (EM)

Answer 2: D. Expectation Maximization (EM)

Question 3: What is the key parameter to determine the number of clusters in
K-means?

A. Centroid

B. Silhouette method

C. Elbow method

D. Linkage method

Answer 3: C. Elbow method

Question 4: What similarity measure is commonly used in clustering
algorithms such as k-means?

A. Pearson correlation

B. Euclidean Distance

C. Manhattan Distance

D. Cosine Similarity

Answer 4: B. Euclidean Distance
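
To show how Euclidean distance drives K-means, the sketch below performs a single iteration: each point is assigned to its nearest centroid, then each centroid is moved to the mean of its assigned points. The function names, the sample points, and the starting centroids are hypothetical. A full implementation would repeat this step until the centroids stop moving.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans_step(points, centroids):
    """One K-means iteration: assign points, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
        clusters[nearest].append(p)
    new_centroids = []
    for cluster, old in zip(clusters, centroids):
        if cluster:
            # Mean of each coordinate across the cluster's points.
            new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
        else:
            # Keep the old centroid if no point was assigned to it.
            new_centroids.append(old)
    return clusters, new_centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
clusters, centroids = kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)])
print(clusters)
print(centroids)
```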

Question 5: What does PCA stand for?

A. Principal Cluster Analysis

B. Principal Component Algorithm

C. Principal Component Analysis

D. Predictive Clustering Approach


Answer 5: C. Principal Component Analysis

Question 6: What is the first step in the PCA approach?

A. Perform eigen decomposition

B. Generate covariance matrix

C. Standardize data

D. Sort eigen pairs

Answer 6: C. Standardize data

Question 7: Which method is commonly used for hierarchical clustering
linkage?

A. Euclidean Linkage

B. Ward Linkage

C. K-means Linkage

D. Silhouette Linkage

Answer 7: B. Ward Linkage

Question 8: What is the limitation of the K-means method?

A. It does not require the number of clusters to be specified.

B. It works well with clusters of differing sizes and shapes.

C. It is not affected by the presence of outliers.

D. It requires the number of clusters to be specified and is sensitive to differing
cluster sizes and shapes.

Answer 8: D. It requires the number of clusters to be specified and is sensitive
to differing cluster sizes and shapes.
Question 9: Which step in the Expectation Maximization algorithm is
responsible for finding the expected point associated with a cluster?

A. Step 1 – Expectation

B. Step 2 – Maximization

C. Centroid Calculation

D. Silhouette Calculation

Answer 9: A. Step 1 – Expectation

Question 10: What is the purpose of hierarchical clustering?

A. Maximize intra-cluster similarity

B. Minimize inter-cluster similarity

C. Visualize clusters with a dendrogram

D. Both B and C

Answer 10: D. Both B and C

LECTURE 8
1. What is the purpose of finding the optimal probability cut-off point
in binary classification?
• A) Minimize overall accuracy
• B) Maximize false positive rate
• C) Balance true positive rate and false positive rate
• D) Increase the total number of predictions
Answer: C
2. In an imbalanced dataset, what common problem arises when a
classification algorithm is not given equal samples of positive and
negative instances?
• A) Overfitting
• B) Underfitting
• C) Bias
• D) Optimal performance
Answer: C
3. Which resampling technique may lead to overfitting issues due to
multiple related instances?
• A) Random under-sampling
• B) Random over-sampling
• C) Synthetic Minority Over-Sampling Technique (SMOTE)
• D) No resampling technique leads to overfitting
Answer: B
4. What is the primary reason for bias in a model?
• A) Including the right features
• B) Including too many features
• C) Not including the right features
• D) Choosing high regularization parameters
Answer: C
5. What is the key reason for overfitting in a model?
• A) Using higher-order polynomial degrees
• B) Including too few features
• C) Reducing the model complexity
• D) Providing more data points
Answer: A
6. In K-fold cross-validation, how are the data points distributed
between training and testing sets in each iteration?
• A) Randomly assigned
• B) All data points are used for training
• C) All data points are used for testing
• D) K-1 folds for training, 1 fold for testing
Answer: D
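
The K-1 versus 1 split can be sketched with index lists. The generator below is a simplified, unshuffled illustration with a hypothetical name (`k_fold_indices`). In each iteration one fold is held out for testing and the remaining K-1 folds are used for training.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(6, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Every sample appears in the test set exactly once across the K iterations, which is why the averaged score uses all of the data for evaluation.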
7. What is the primary purpose of bagging in ensemble methods?
• A) Increase bias
• B) Reduce variance
• C) Enhance model complexity
• D) Eliminate underfitting
Answer: B
8. Which parameter defines the number of trees in a bagging
ensemble?
• A) max_features
• B) n_estimators
• C) n_jobs
• D) random_state
Answer: B
9. What is the key advantage of Extremely Randomized Trees
(ExtraTree) in comparison to regular decision trees?
• A) Lower variance
• B) Higher bias
• C) Higher interpretability
• D) Lower computational cost
Answer: A
10. Which parameter is responsible for controlling the subset of
features used for splitting nodes in bagging?
• A) n_estimators
• B) max_features
• C) n_jobs
• D) random_state
Answer: B

LECTURE 9
1. What are the three steps involved in the AdaBoosting process?
a. Data preprocessing, feature extraction, classification
b. Assigning weights, updating weights, majority voting
c. Gradient boosting, tree pruning, model fitting
d. Cross-validation, grid search, hyperparameter tuning
Answer: b
2. How is the weighted error rate of the weak classifier calculated in
AdaBoost?
a. Summing misclassification weights
b. Absolute difference of predicted and true labels
c. Mean squared error
d. Entropy calculation
Answer: a
3. What is the purpose of the majority voting step in AdaBoost?
a. Assigning weights to data points
b. Updating classifier weights
c. Selecting the best weak classifier
d. Determining the final predicted class
Answer: d
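
The weighting steps in questions 1 and 2 can be sketched for a single boosting round. The formulas below follow the common discrete AdaBoost formulation with labels in {-1, +1}, which is an assumption here since the lecture's exact equations are not reproduced in this quiz. The function name and the data are hypothetical, and the sketch assumes the weighted error is strictly between 0 and 1.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """Return (weighted error, classifier weight, updated sample weights)."""
    # Weighted error: sum of the weights of the misclassified points.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Classifier weight: larger when the weak learner makes fewer weighted errors.
    alpha = 0.5 * math.log((1 - error) / error)
    # Raise the weights of misclassified points, lower the others.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return error, alpha, [w / total for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]   # step 1: equal initial weights
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]               # the last point is misclassified
error, alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(error, alpha, new_weights)
```

After the update the misclassified point carries half of the total weight, so the next weak classifier is pushed to get it right.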
4. What is a key advantage of the XGBoost algorithm?
a. Limited support for parallel processing
b. Inability to handle missing values
c. Early stopping to prevent overfitting
d. Restricted maximum depth of trees
Answer: c
5. Which parameter controls the learning rate in the XGBoost algorithm?
a. max_depth
b. min_child_weight
c. subsample
d. eta
Answer: d
6. What are essential tuning parameters for boosting algorithms related to
tree structure?
a. Number of estimators, learning rate, max depth
b. Subsample, regularization term, colsample_bytree
c. Objective, eval_metric, learning rate
d. nthread, booster, max_depth
Answer: a
7. What is stacking in the context of ensemble methods?
a. Majority voting on predictions
b. Combining different models at the final level
c. Aggregating weak learners iteratively
d. Parallel processing of models
Answer: b
8. What is the difference between hard voting and soft voting in ensemble
voting?
a. Hard voting involves weighted probabilities, while soft voting doesn't.
b. Soft voting considers only class labels, while hard voting considers
probabilities.
c. Hard voting uses majority voting on predicted class labels, while soft
voting uses the argmax of the averaged predicted probabilities.
d. There is no difference; the terms are used interchangeably.
Answer: c
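
A small sketch can show how the two voting schemes disagree. As commonly defined, hard voting takes the majority of the predicted class labels, while soft voting averages the predicted probabilities and picks the class with the highest average. The function names and the probabilities from three imaginary classifiers are made up for illustration.

```python
from collections import Counter


def hard_vote(probabilities):
    """Majority vote over each classifier's most likely class label."""
    labels = [max(range(len(p)), key=lambda c: p[c]) for p in probabilities]
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Argmax of the class probabilities averaged across classifiers."""
    n_classes = len(probabilities[0])
    averaged = [
        sum(p[c] for p in probabilities) / len(probabilities)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: averaged[c])


# Two classifiers lean weakly toward class 0, one is confident in class 1.
probs = [[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]]
print(hard_vote(probs), soft_vote(probs))
```

Here hard voting returns class 0 (two labels against one), while soft voting returns class 1 because the averaged probabilities are 0.4 and 0.6.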
9. What is a disadvantage of the GridSearchCV method for hyperparameter
tuning?
a. It requires less computational resources.
b. It always finds the global optimal parameters.
c. It is not computationally expensive.
d. It may not explore the entire hyperparameter space efficiently.
Answer: d
10. How does the RandomizedSearchCV method differ from the GridSearchCV
method?
a. RandomizedSearchCV is more computationally expensive.
b. RandomizedSearchCV requires fixed parameter values.
c. RandomizedSearchCV samples numerical parameters from a range instead of a fixed grid.
d. RandomizedSearchCV explores the entire hyperparameter space exhaustively.
Answer: c

LECTURE 10

1. What is the primary challenge in image classification that a simple
classification model might struggle with?
• A) Lack of computational power
• B) Feature engineering complexity
• C) High-resolution images
• D) Overfitting issues
Answer: B
2. In the context of artificial neural networks, what is the primary
difference between deep learning models and traditional models?
• A) Number of layers
• B) Learning rate
• C) Activation functions
• D) Batch size
Answer: A
3. What is the role of the learning process in neural network training?
• A) Initializing weights
• B) Reducing loss
• C) Selecting features
• D) Expanding the network
Answer: B
4. What does one epoch in neural network training refer to?
• A) One forward pass
• B) Processing the entire training set
• C) One backward pass
• D) Adjusting learning rates
Answer: B
5. Which function is commonly used as a step function in perceptrons?
• A) Sigmoid
• B) Hyperbolic Tangent
• C) ReLU
• D) Heaviside or signum
Answer: D
6. What is the purpose of a bias neuron in a perceptron?
• A) Enhance computational efficiency
• B) Introduce non-linearity
• C) Shift the transfer function curve
• D) Reduce the learning rate
Answer: C
7. What is the primary limitation of perceptrons?
• A) Lack of non-linearity
• B) Difficulty in parallel processing
• C) Inability to handle binary classification
• D) Limited to linearly separable problems
Answer: D
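
Questions 5 to 7 can be tied together with a small perceptron sketch: a Heaviside step activation, a bias term, and the classic perceptron learning rule. All names and hyperparameters below are assumptions for illustration. The perceptron learns the linearly separable AND function, but no choice of weights can reproduce XOR, which is not linearly separable.

```python
def heaviside(z):
    """Step function: 1 when the input is non-negative, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum plus bias, passed through the step function."""
    return heaviside(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights by lr * error * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error   # the bias shifts the decision threshold
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

and_w, and_b = train_perceptron(inputs, [0, 0, 0, 1])
and_preds = [predict(and_w, and_b, x) for x in inputs]

xor_w, xor_b = train_perceptron(inputs, [0, 1, 1, 0])
xor_preds = [predict(xor_w, xor_b, x) for x in inputs]

print("AND:", and_preds)
print("XOR:", xor_preds)
```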
8. When does a multilayer neural network become a deep neural
network?
• A) Two hidden layers
• B) Three hidden layers
• C) At least four hidden layers
• D) At least two hidden layers
Answer: D
9. What is the primary purpose of activation functions used in
multilayer perceptrons?
• A) Control learning rate
• B) Introduce non-linearity
• C) Adjust batch size
• D) Reduce model complexity
Answer: B
10. In the context of binary classification, what does the sigmoid
function estimate?
• A) The gradient of the loss function
• B) The probability of belonging to the positive class
• C) The learning rate
• D) The feature weights
Answer: B

LECTURE 11

1. What is the primary purpose of using convolutional networks in
deep learning?
a. Enhancing audio signals
b. Solving regression problems
c. Handling image data effectively
d. Optimizing computational efficiency
Answer: c
2. Which layer is responsible for randomly dropping neurons to
counteract overfitting in DCNNs?
a. Convolutional layer
b. Pooling layer
c. Dropout layer
d. Flatten layer
Answer: c
3. What is the purpose of the pooling layer in DCNNs?
a. Increase spatial volume
b. Decrease computational cost
c. Flatten the input
d. Enhance feature extraction
Answer: b
4. Which layer transforms a three-dimensional tensor into a vector in
DCNNs?
a. Convolutional layer
b. Pooling layer
c. Dropout layer
d. Flatten layer
Answer: d
5. In DCNNs, what does a fully connected layer compute?
a. Class scores
b. Feature maps
c. Receptive fields
d. Convolution operations
Answer: a
6. What is the purpose of the convolution operation in DCNNs?
a. Increase filter size
b. Compute dot products
c. Perform max-pooling
d. Flatten the input
Answer: b
7. Which term refers to a local region of the input image that has the
same size as the filter in DCNNs?
a. Feature map
b. Receptive field
c. Stride
d. Padding
Answer: b
8. What is the role of filters in DCNNs?
a. Increase the number of classes
b. Reduce spatial volume
c. Emphasize vertical lines
d. Compute class scores
Answer: c
9. How can model performance be improved in DCNNs?
a. Decreasing filter size
b. Avoiding data augmentation
c. Changing the activation function
d. Using methods like data augmentation, changing optimizer, changing
learning rate, and changing architecture
Answer: d
10. Which layer type is used to transform a three-dimensional tensor
into a vector in DCNNs?
a. Convolutional layer
b. Pooling layer
c. Dropout layer
d. Flatten layer
Answer: d
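
The convolution, receptive field, and filter questions above can be illustrated with a small sketch. The function below slides a filter over an image with stride 1 and no padding, computing at each position the dot product between the filter and the receptive field under it (the cross-correlation form typically used in CNN layers). The function name, the toy image, and the vertical-edge filter are hypothetical examples.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and return the resulting feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the receptive field at (i, j).
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            ))
        feature_map.append(row)
    return feature_map


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Filter that responds to a dark-to-bright transition from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is zero over the flat dark region and positive where the receptive field covers the vertical edge, which is what "emphasize vertical lines" refers to.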
