• A) Cooking recipes
• B) Weather forecasting
• C) Recommendation systems
• D) All of the above
• A) Classification
• B) Clustering
• C) Regression
• D) Dimension reduction
• A) Regression
• B) Clustering
• C) Q-learning
• D) Dimension reduction
ANSWERS
LECTURE 2
• A) To summarize data
• B) To classify data into categories
• C) To understand the central tendency of data
• D) To measure data dispersion
• A) Eliminating outliers
• B) Introducing outliers
• C) Creating dummy variables
• D) Handling missing data
• A) Continuous variables
• B) Ordinal variables
• C) Categorical variables
• D) Interval variables
• A) Identifying outliers
• B) Building machine learning models
• C) Understanding data dispersion
• D) Creating exploratory data analysis
10. In dealing with missing data, what technique involves replacing
missing values with a predictive model?
• A) Delete
• B) Random replace
• C) Replace with a summary statistic
• D) Using predictive model
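The "predictive model" option can be illustrated with a tiny regression imputation: fit a line on the complete rows, then predict the missing entry from it. A minimal pure-Python sketch on made-up data:

```python
# Regression imputation: predict a missing value from a correlated feature.
# Fit y = a + b*x by least squares on the complete rows, then fill the gap.

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def impute(rows):
    """rows: list of (x, y) pairs where y may be None (missing)."""
    complete = [(x, y) for x, y in rows if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [(x, y if y is not None else a + b * x) for x, y in rows]

data = [(1, 2.0), (2, 4.0), (3, None), (4, 8.0)]
filled = impute(data)
print(filled[2])  # the gap at x=3 is filled from the fitted line
```

This contrasts with options A-C: instead of deleting the row or plugging in a constant, the missing value is estimated from the relationship learned on the observed data.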
ANSWERS
LECTURE 3
• A) Categorizing data
• B) Predicting quantitative values
• C) Exploring data patterns
• D) Handling missing values
2. In univariate regression, what does "univariate" refer to?
• A) One variable
• B) Two variables
• C) Categorical variables
• D) Nonlinear variables
3. What is the purpose of a scatter plot in regression analysis?
• A) To show causation
• B) To visualize the relationship between variables
• C) To demonstrate statistical significance
• D) To display categorical data
4. Which metric is commonly used to evaluate how well a linear model
fits the data?
• A) Mean absolute error (MAE)
• B) R-squared (R2)
• C) Root mean squared error (RMSE)
• D) Pearson correlation coefficient
5. What does R-squared measure in the context of linear regression?
• A) How close predicted values are to actual values
• B) The proportion of variation in the dependent variable explained
by the independent variable
• C) The difference between predicted and actual values
• D) The absolute average of the errors
6. Why do we square the errors in the least squares method of linear
regression?
• A) To introduce non-linearity
• B) To eliminate negative values
• C) To make the errors absolute
• D) To penalize larger errors more than smaller errors
7. What does a true zero point in ratio scale mean?
• A) The absence of the attribute being measured
• B) The presence of a meaningful zero value
• C) The existence of negative values
• D) The inclusion of categorical data
8. In polynomial regression, what does a higher-degree polynomial
introduce?
• A) More features
• B) Nonlinearity
• C) Categorical variables
• D) Multicollinearity
9. What does the Root Mean Squared Error (RMSE) measure?
• A) How well the model fits the data
• B) The proportion of explained variance
• C) How close predicted values are to actual values
• D) The absolute average of the errors
10. What is the purpose of polynomial regression of higher degrees?
• A) To introduce multicollinearity
• B) To fit curves better
• C) To simplify the model
• D) To eliminate outliers
1. B) Predicting quantitative values
2. A) One variable
3. B) To visualize the relationship between variables
4. B) R-squared (R2)
5. B) The proportion of variation in the dependent variable explained by
the independent variable
6. D) To penalize larger errors more than smaller errors
7. A) The absence of the attribute being measured
8. B) Nonlinearity
9. C) How close predicted values are to actual values
10. B) To fit curves better
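The metrics in questions 4, 5, and 9 can be checked on a toy dataset: fit a least-squares line, then compute R² and RMSE from its residuals. A stdlib-only sketch (the data points are invented):

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # roughly y = 2x, with small noise

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least squares: squaring the residuals penalizes large errors
# more than small ones (question 6).
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
a = my - b * mx
preds = [a + b * x for x in xs]

ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot      # proportion of variance explained (question 5)
rmse = math.sqrt(ss_res / n)  # how close predictions are to actuals (question 9)
print(round(r2, 4), round(rmse, 4))
```

On this nearly linear data R² comes out close to 1 and RMSE stays small, matching the interpretations in the answer key.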
LECTURE 4
A) When there are many variables that add minor value individually
LECTURE 5
Answer: C
2. In logistic regression, what function is introduced to handle
classification boundaries effectively?
• A) Exponential function
• B) Sigmoid function
• C) Hyperbolic tangent
• D) Logarithmic function
Answer: B
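The sigmoid maps any real-valued score into the interval (0, 1), which is what lets logistic regression read its output as a class probability. A quick sketch:

```python
import math

def sigmoid(z):
    """Squash a real number into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative scores map near 0, zero maps to 0.5,
# large positive scores map near 1.
print(sigmoid(-6), sigmoid(0), sigmoid(6))
```

Thresholding this output (commonly at 0.5) is what turns the continuous score into a classification boundary.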
Answer: C
• A) Confusion matrix
• B) Regression plot
• C) ROC Curve
• D) AUC
Answer: C
Answer: A
Answer: A
Answer: C
Answer: C
Answer: B
LECTURE 6
2. What are the key parameters for a Support Vector Machine (SVM)?
• a) Kernels and C
• b) Learning rate and epochs
• c) Depth and leaves
• d) Features and labels
• a) It doesn't matter
• b) Avoid ties in majority voting
• c) Even K values perform better
• d) Odd K values result in overfitting
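The usual reason for preferring an odd K in binary KNN is the tie issue named in option b: with an even K, the neighbour vote can split evenly and leave no majority. A toy illustration:

```python
from collections import Counter

def majority(labels):
    """Return the majority label among neighbours, or None on a tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # vote is split evenly
    return counts[0][0]

print(majority(["pos", "neg", "pos", "neg"]))  # K=4: tie, no majority
print(majority(["pos", "neg", "pos"]))         # K=3: clear majority
```

With K=3 the vote always resolves; with K=4 two-vs-two ties are possible, which is exactly what an odd K avoids in binary classification.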
• a) To introduce randomness
• b) To make the time series stationary
• c) To reduce model complexity
• d) To increase autocorrelation
Answer: b) To make the time series stationary
• a) Cycle
• b) Seasonality
• c) Trend
• d) Autocorrelation
Answer: c) Trend
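Differencing's effect on a trend can be seen directly: the first difference of a series with a steady upward trend is flat, which is the "make the time series stationary" idea behind the earlier answer.

```python
# A series with a clear linear upward trend.
series = [10, 13, 16, 19, 22, 25]

# First difference: d[t] = y[t] - y[t-1].
diff = [b - a for a, b in zip(series, series[1:])]
print(diff)  # the trend is gone; the differenced series is constant
```

In ARIMA terms, the number of times this operation must be applied before the series looks stationary is the order of differencing, d.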
• a) Image recognition
• b) Sales forecasting
• c) Sentiment analysis
• d) Natural language processing
• a) Forecast errors
• b) Lagged values of the variable
• c) Random error
• d) Moving averages
• a) Order of differencing
• b) Number of unknown terms in autoregressive part
• c) Order of the moving average part
• d) Number of classes
• a) F1-score
• b) Precision
• c) Akaike Information Criterion (AIC)
• d) Mean Absolute Error (MAE)
10. What is the purpose of the Dickey-Fuller test in building an ARIMA
model?
LECTURE 7
A. K-means
B. Hierarchical clustering
C. Principal Component Analysis (PCA)
A. Centroid
B. Silhouette method
C. Elbow method
D. Linkage method
A. Pearson correlation
B. Euclidean Distance
C. Manhattan Distance
D. Cosine Similarity
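The distance options above aggregate coordinate gaps differently, so they can rank neighbours differently. A minimal sketch of the two most common choices for clustering:

```python
import math

def euclidean(p, q):
    """Straight-line distance: square, sum, square-root."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate gaps."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q))  # 5.0 and 7
```

For the same pair of points the two metrics disagree, which is why the choice of distance can change cluster assignments.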
C. Standardize data
A. Euclidean Linkage
B. Ward Linkage
C. K-means Linkage
D. Silhouette Linkage
A. Step 1 – Expectation
B. Step 2 – Maximization
C. Centroid Calculation
D. Silhouette Calculation
D. Both B and C
LECTURE 8
1. What is the purpose of finding the optimal probability cut-off point
in binary classification?
• A) Minimize overall accuracy
• B) Maximize false positive rate
• C) Balance true positive rate and false positive rate
• D) Increase the total number of predictions
Answer: C
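The cut-off trade-off in question 1 can be made concrete by sweeping a threshold over scored examples and computing both rates. The scores and labels below are invented:

```python
def rates(y_true, scores, thresh):
    """True-positive and false-positive rates at a probability cut-off."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thresh)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= thresh)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, fp / neg

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# A low cut-off catches every positive but also raises the FPR.
print(rates(y_true, scores, 0.2))
# A higher cut-off trades a little TPR for a much lower FPR.
print(rates(y_true, scores, 0.5))
```

Choosing the optimal cut-off means picking the threshold where this balance between TPR and FPR best suits the problem, which is what the ROC curve visualizes.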
2. In an imbalanced dataset, what is the common problem when
providing equal samples of positive and negative instances to a
classification algorithm?
• A) Overfitting
• B) Underfitting
• C) Bias
• D) Optimal performance
Answer: C
3. Which resampling technique may lead to overfitting issues due to
multiple related instances?
• A) Random under-sampling
• B) Random over-sampling
• C) Synthetic Minority Over-Sampling Technique (SMOTE)
• D) No resampling technique leads to overfitting
Answer: B
4. What is the primary reason for bias in a model?
• A) Including the right features
• B) Including too many features
• C) Not including the right features
• D) Choosing high regularization parameters
Answer: C
5. What is the key reason for overfitting in a model?
• A) Using higher-order polynomial degrees
• B) Including too few features
• C) Reducing the model complexity
• D) Providing more data points
Answer: A
6. In K-fold cross-validation, how are the data points distributed
between training and testing sets in each iteration?
• A) Randomly assigned
• B) All data points are used for training
• C) All data points are used for testing
• D) K-1 folds for training, 1 fold for testing
Answer: D
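The K-fold scheme in question 6 can be sketched in a few lines: each fold serves as the test set exactly once while the remaining K-1 folds train the model. A simplified sketch that assumes n divides evenly by K:

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists: K-1 folds train, 1 fold tests."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, test

for train, test in kfold_indices(6, 3):
    print("test fold:", test, "| training points:", len(train))
```

Every data point appears in exactly one test fold across the K iterations, so the whole dataset contributes to both training and evaluation.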
7. What is the primary purpose of bagging in ensemble methods?
• A) Increase bias
• B) Reduce variance
• C) Enhance model complexity
• D) Eliminate underfitting
Answer: B
8. Which parameter defines the number of trees in a bagging
ensemble?
• A) max_features
• B) n_estimators
• C) n_jobs
• D) random_state
Answer: B
9. What is the key advantage of Extremely Randomized Trees
(ExtraTree) in comparison to regular decision trees?
• A) Lower variance
• B) Higher bias
• C) Higher interpretability
• D) Lower computational cost
Answer: A
10. Which parameter is responsible for controlling the subset of
features used for splitting nodes in bagging?
• A) n_estimators
• B) max_features
• C) n_jobs
• D) random_state
Answer: B
LECTURE 9
1. What are the three steps involved in the AdaBoosting process?
a. Data preprocessing, feature extraction, classification
b. Assigning weights, updating weights, majority voting
c. Gradient boosting, tree pruning, model fitting
d. Cross-validation, grid search, hyperparameter tuning
Answer: b
2. How is the weighted error rate of the weak classifier calculated in
AdaBoost?
a. Summing misclassification weights
b. Absolute difference of predicted and true labels
c. Mean squared error
d. Entropy calculation
Answer: a
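The weighted error rate in question 2 is literally the sum of the (normalised) weights of the misclassified points:

```python
def weighted_error(weights, y_true, y_pred):
    """Sum of the normalised weights of the misclassified points."""
    total = sum(weights)
    return sum(w for w, yt, yp in zip(weights, y_true, y_pred)
               if yt != yp) / total

# Hypothetical weights and labels for four training points.
w = [0.125, 0.125, 0.25, 0.5]
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, 1]   # the second and fourth points are wrong
print(weighted_error(w, y_true, y_pred))  # 0.125 + 0.5 = 0.625
```

Because AdaBoost raises the weights of previously misclassified points, the same mistakes cost a weak classifier more in later rounds.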
3. What is the purpose of the majority voting step in AdaBoost?
a. Assigning weights to data points
b. Updating classifier weights
c. Selecting the best weak classifier
d. Determining the final predicted class
Answer: d
4. What is a key advantage of the XGBoost algorithm?
a. Limited support for parallel processing
b. Inability to handle missing values
c. Early stopping to prevent overfitting
d. Restricted maximum depth of trees
Answer: c
5. Which parameter controls the learning rate in the XGBoost algorithm?
a. max_depth
b. min_child_weight
c. subsample
d. eta
Answer: d
6. What are essential tuning parameters for boosting algorithms related to
tree structure?
a. Number of estimators, learning rate, max depth
b. Subsample, regularization term, colsample_bytree
c. Objective, eval_metric, learning rate
d. nthread, booster, max_depth
Answer: a
7. What is stacking in the context of ensemble methods?
a. Majority voting on predictions
b. Combining different models at the final level
c. Aggregating weak learners iteratively
d. Parallel processing of models
Answer: b
8. What is the difference between hard voting and soft voting in ensemble
voting?
a. Hard voting involves weighted probabilities, while soft voting doesn't.
b. Soft voting considers only class labels, while hard voting considers
probabilities.
c. Hard voting takes a majority vote over predicted class labels, while soft
voting takes the argmax of the averaged predicted probabilities.
d. There is no difference; the terms are used interchangeably.
Answer: c
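The two voting schemes in question 8 can be contrasted on a single sample: hard voting takes a majority of the predicted labels, while soft voting averages the predicted probabilities and takes the argmax, and the two can disagree. The classifier outputs below are invented:

```python
from collections import Counter

# Three hypothetical classifiers' outputs for one sample, classes 0 and 1.
labels = [1, 0, 0]                                  # predicted class labels
probas = [[0.1, 0.9], [0.45, 0.55], [0.48, 0.52]]   # predicted probabilities

# Hard voting: majority of the class labels.
hard = Counter(labels).most_common(1)[0][0]

# Soft voting: argmax of the averaged class probabilities.
avg = [sum(p[c] for p in probas) / len(probas) for c in range(2)]
soft = max(range(2), key=lambda c: avg[c])

print(hard, soft)  # 0 by label majority, 1 by averaged confidence
```

Soft voting lets a single very confident classifier outweigh two lukewarm ones, which hard voting cannot express.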
9. What is a disadvantage of the GridSearchCV method for hyperparameter
tuning?
a. It requires less computational resources.
b. It always finds the global optimal parameters.
c. It is not computationally expensive.
d. It may not explore the entire hyperparameter space efficiently.
Answer: d
10. How does the RandomSearchCV method differ from the GridSearchCV
method?
a. RandomSearchCV is more computationally expensive.
b. RandomSearchCV requires fixed parameter values.
c. RandomSearchCV samples parameter values from a range or distribution, while GridSearchCV tries a fixed grid of values.
d. RandomSearchCV explores the entire hyperparameter space exhaustively.
Answer: c
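The trade-off in questions 9 and 10 can be sketched on a toy objective: grid search exhaustively tries every combination of fixed values, while random search spends a fixed budget sampling points from ranges. The score function and parameter names below are hypothetical:

```python
import itertools
import random

# A toy "validation score" over two hyperparameters; peaks at (0.1, 4).
def score(lr, depth):
    return -((lr - 0.1) ** 2 + (depth - 4) ** 2)

# Grid search: exhaustively evaluate every combination of fixed values.
grid_lr = [0.01, 0.1, 1.0]
grid_depth = [2, 4, 6]
best_grid = max(itertools.product(grid_lr, grid_depth),
                key=lambda p: score(*p))

# Random search: sample the same budget of points from ranges.
random.seed(0)
trials = [(random.uniform(0.01, 1.0), random.randint(2, 6))
          for _ in range(9)]
best_rand = max(trials, key=lambda p: score(*p))

print(best_grid, best_rand)
```

Both evaluate nine candidates here, but random search is not tied to the grid points: with numerical parameters it can land between grid values, which is why it often finds good settings with the same budget.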
LECTURE 10
LECTURE 11