Welcome to our comprehensive guide for on-campus placement questions! In today’s competitive job
market, on-campus placements are the stepping stones to promising careers. To help you succeed in these
recruitment processes, we’ve compiled a guide that covers a wide range of placement topics and questions.
Each chapter focuses on a specific area, such as Statistics, Programming, Machine Learning, and Deep
Learning. You’ll find carefully selected questions, including frequent ones encountered during interviews,
to hone your skills and boost your confidence.
[Figure: topic weightage chart. Recoverable labels: Machine Learning, Probability & Statistics. Recoverable values: 40%, 25%, 20%.]
Chapter 1 Machine Learning
1.1 Linear Regression
• What is Linear Regression?
• What are the Assumptions of Linear Regression? (Most Frequent Question)
– Validating the assumptions, understanding the impact of violations on the model, and
addressing these violations.
– How to deal with Non-Linearity
– Define multicollinearity and explain its consequences in linear regression (VIF score).
– Define heteroscedasticity and discuss its implications in linear regression analysis.
– Relevance of the Q-Q plot
• What Evaluation Metrics Would You Use for a Linear Regression Model?
– Discuss common evaluation metrics such as Mean Squared Error (MSE), R-squared, and Mean
Absolute Error (MAE).
– Difference between R-squared and Adjusted-R-squared
• Explain OLS
• What are the limitations in Linear Regression?
– Describe the logistic (sigmoid) function and its significance in logistic regression.
– Describe the process of selecting the root node and the criteria for node splitting (Gini
impurity, entropy).
• What is pruning in decision trees, and why is it important?
– Difference between post and pre pruning
• What is the difference between classification and regression trees (CART)?
– Describe the process of selecting the root node and the criteria for node splitting in CART.
• What is feature importance in decision trees, and how is it calculated?
• Do we need feature scaling in Decision Trees?
• What are some common techniques to prevent overfitting in decision trees? -
Describe methods like pruning, setting a maximum depth, and minimum samples per leaf to prevent
overfitting.
• How does a decision tree handle missing values and categorical variables?
• How do you prevent overfitting in a Random Forest model? - Describe techniques like
tuning hyperparameters, limiting tree depth, and adjusting the number of features considered for
splitting.
• In what scenarios would you choose Random Forest over other machine learning
algorithms?
• What are some limitations or potential drawbacks of using Random Forest? - Mention
challenges or scenarios where Random Forest may not be the best choice.
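The node-splitting criteria named in the decision tree questions above (Gini impurity and entropy) can be illustrated with a minimal sketch. The function names `gini_impurity`, `entropy`, and `weighted_impurity` are hypothetical helpers chosen for this example, not part of any library, and the sketch assumes class labels arrive as a plain Python list.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_impurity(left, right, criterion=gini_impurity):
    """Impurity of a candidate split: size-weighted average over both children."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)


# A perfectly mixed two-class node versus a split that separates the classes.
parent = ["yes", "yes", "no", "no"]
print(gini_impurity(parent), entropy(parent))
print(weighted_impurity(["yes", "yes"], ["no", "no"]))
```

A tree builder would typically evaluate `weighted_impurity` for every candidate split and keep the one with the lowest value; the root node is simply the best split over the full training set.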
1.6 KNN
• What is K-Nearest Neighbors (KNN)?
• What are the advantages and disadvantages of KNN compared to other machine
learning algorithms?
– What is the Curse of Dimensionality?
• How does KNN handle imbalanced datasets, and what techniques can be applied to
address this issue?
• What is the difference between parametric and non-parametric models, and where
does KNN fall in this classification?
• How does KNN handle missing data, and what are common imputation techniques
used in this context?
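As a companion to the KNN questions above, here is a minimal sketch of a KNN classifier using Euclidean distance and majority voting. The function name `knn_predict` and the toy data are assumptions made for illustration; a real project would more likely use a library implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote over the k closest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2D.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

Note that nothing is learned ahead of time: the whole training set is kept and scanned at prediction time, which is why KNN is described as non-parametric and why prediction cost grows with the dataset.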
1.7 SVM
• How does a Support Vector Machine work for classification?
1.8 Unsupervised Learning
• Explain Unsupervised Learning
• What is the purpose of Dimensionality Reduction in Unsupervised Learning?
• What is the Elbow Method in clustering? How is it used to determine the optimal
number of clusters?
• What metrics are commonly used to evaluate the performance of Unsupervised
Learning?
• How do you determine the optimal number of components in PCA?
• K Means
– Explain K Means algorithm.
– Can K-Means handle non-numeric data? If yes, how?
– Explain the process of selecting the optimal number of clusters (k) in K-Means.
– What is the objective function of K-Means?
– How does K-Means handle outliers in the dataset?
– Can you explain the difference between random initialization and k-means++ initialization?
– How does K-Means++ initialization improve convergence?
– What is the role of centroids in K-Means clustering? How are they updated during the iterative
process?
– How can you handle categorical features in the K-Means algorithm?
– What is the Silhouette Score? How is it calculated, and what does it indicate about the
quality of the clustering?
• Hierarchical
– Which is better, Hierarchical or K-Means clustering, and why?
– Describe the algorithm (agglomerative and divisive approaches).
– How do dendrograms work here?
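The K-Means questions above (centroid updates, the objective function) can be made concrete with a small sketch of Lloyd's iteration in plain Python. The helper names `assign_clusters`, `update_centroids`, `inertia`, and `kmeans` are hypothetical, and the sketch assumes points and initial centroids are given as tuples of floats; it is an illustration rather than a production implementation.

```python
from math import dist


def assign_clusters(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in points]


def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # assumption: an empty cluster keeps its old centroid
        else:
            new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new_centroids


def inertia(points, labels, centroids):
    """K-Means objective: within-cluster sum of squared distances."""
    return sum(dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign_clusters(points, centroids)
    return centroids, labels


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
print(centroids, labels, inertia(points, labels, centroids))
```

Running `kmeans` for several values of k and plotting `inertia` against k is the basis of the Elbow Method mentioned earlier in this section.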
1.9 Miscellaneous
• Missing Value:
– How do you identify and handle missing data in a dataset depending on the feature?
• Outlier:
– Why are outliers important, and how do you detect and manage them?
• Feature Engineering
– Transformation
– One-Hot-encode, Ordinal Encodings
– Curse of Dimensionality
– Missing Value Imputation
• Feature Selection:
– Explain the significance of feature selection.
– Explain PCA (what are eigenvalues and eigenvectors?) and Backward/Forward Elimination.
• Overfitting and Underfitting:
– Define overfitting and underfitting and ways to address them.
– Explain Bias-Variance Trade off
– What do you mean by ’out-of-time validation’?
• Cross Validations:
• Regularizations:
– Why do we use regularization, and what are L1 and L2 regularization?
– Difference between L1 and L2. How does L1 help in feature selection?
– Explain ElasticNet
• Imbalanced Dataset:
– Discuss strategies for addressing imbalanced datasets.
– Oversampling techniques
– SMOTE and its limitations
• Performance Metrics for Classification Models
– Confusion Matrix, Precision, Recall, F1-Score.
– Is accuracy always a good measure for classification?
– Importance of precision and recall in different case studies (especially for imbalanced datasets)
• Explain AUC-ROC and limitations of it.
– Is 0.5 always a good threshold? How do you find the best threshold?
– What do you mean by 90% AUC?
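The classification metrics listed above (confusion matrix, precision, recall, F1-score) and the question of whether accuracy is always a good measure can be illustrated with a short sketch. The function names and the toy labels are assumptions for this example; undefined ratios (zero denominators) are reported as 0.0 here, which is one common convention rather than the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Imbalanced toy data: one positive in ten, and a model that always predicts negative.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(classification_metrics(y_true, y_pred))
```

In this toy case accuracy looks high while recall is zero, which is the usual argument for reporting precision and recall on imbalanced datasets.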
Chapter 2 Probability & Statistics
• Five Number summary
• What do you mean by Skewness? How do you deal with highly skewed data?
• Difference between Variance and Standard Deviation
• Difference between Correlation and Covariance with their formulas. How do you
interpret correlation coefficients?
• If we change the origin of two variables, how will their correlation change?
• Difference between Collinearity and Multicollinearity
• Quantile and Percentile
• Make yourself comfortable with quick probability questions (Ex. Bayes Theorem
Problems, Dice and Cards problems)
• Random Variables
• Bias, Unbiased Estimator
– Binomial, Poisson, Geometric Distributions
– Normal, Chi-Squared, t, Gamma, F Distributions, ANOVA
∗ Properties of Normal Distribution
∗ Is the sum/product of independent Normal random variables Normal or not?
∗ 68-95-99.7 rule of Normal Distribution
∗ Be aware of the applications of these distributions
• Hypothesis test
– Explain Type I and Type II errors, Power of the test in hypothesis testing.
– What is p-value in hypothesis testing, and how do you interpret it? (Frequent)
– What are Level of confidence and confidence intervals, and how are they related to hypothesis
testing?
• What is the Central Limit Theorem (CLT), and what is the difference between the Strong
Law of Large Numbers (SLLN) and the CLT?
• What is sampling bias (with the 4 types of it), and how can you mitigate it when
collecting data?
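The earlier question about how correlation behaves under a change of origin can be checked numerically. The sketch below computes the Pearson correlation coefficient from its definition (covariance divided by the product of standard deviations); the function name `pearson_r` and the sample values are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y over the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_original = pearson_r(x, y)
# Change of origin: add a constant to each variable.
r_shifted = pearson_r([a + 10 for a in x], [b - 3 for b in y])
# Change of scale by positive factors.
r_scaled = pearson_r([2 * a for a in x], [3 * b for b in y])
# Flipping the sign of one variable flips the sign of the correlation.
r_flipped = pearson_r([-a for a in x], y)

print(r_original, r_shifted, r_scaled, r_flipped)
```

Correlation is unchanged by shifting the origin or rescaling by positive constants, whereas covariance is unchanged by shifts but is multiplied by the product of the scale factors.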
Chapter 3 Deep Learning & NLP
3.1 Deep Learning
• What is Deep Learning, and how does it differ from traditional machine learning? -
What are the advantages and disadvantages of deep learning compared to traditional machine
learning algorithms?
• What is backpropagation, and why is it essential in training neural networks?
• Define epoch and the dropout method. Explain GAN and CT-GAN. How does
CT-GAN overcome the shortcomings of GAN?
• What are some common challenges in training deep learning models, and how can
they be addressed? - Mention issues like vanishing gradients, exploding gradients, and noisy
data, along with strategies to tackle them.
• How do you evaluate the performance of deep learning models, and what metrics are
commonly used?
3.2 NLP
• What is tokenization in NLP, and why is it necessary?
• What are stemming and lemmatization, why are they used in NLP, and what is the
difference between them?
• Define stop words and discuss strategies for dealing with them in text processing.
– Discuss word embeddings, like Word2Vec and GloVe, and their ability to represent words as
vectors in high-dimensional spaces.
– CBOW, Skip-gram
• What are the methods used in NLP for dimensionality reduction?
• What is sentiment analysis, and how can it be performed using NLP techniques?
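For the tokenization and stop-word questions above, a minimal sketch using only the standard library might look like the following. The `STOP_WORDS` set is a deliberately tiny, hypothetical list for illustration, and the regex tokenizer is a crude assumption: it splits contractions such as "don't" into two tokens, which dedicated NLP libraries handle more carefully.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "on", "in", "of", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set, preserving order."""
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("The cat sat on the mat.")
print(tokens)
print(remove_stop_words(tokens))
```

Whether to remove stop words depends on the task: it often helps bag-of-words models, but it can discard signal for tasks such as sentiment analysis, where words like "not" matter.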
Chapter 4 Programming
4.1 Python
• NumPy
• Pandas
• Matplotlib
• Seaborn
• Scikit-Learn
• Fundamentals of Python (Ex: String, List, Tuple, Dictionary, Set, Boolean)
4.2 SQL
• SQL Queries
• Filtering Data
• Aggregation Functions
• JOIN Operations
• Subqueries
• Indexing
• Creating and updating tables
• Group by
• Window Functions