
ISI Kolkata Campus Placement Questions

Ayushi Das & Sourav Ghosh

Welcome to our comprehensive guide for on-campus placement questions! In today’s competitive job
market, on-campus placements are the stepping stones to promising careers. To help you succeed in these
recruitment processes, we’ve compiled a guide that covers a wide range of placement topics and questions.

Each chapter focuses on a specific area, such as Statistics, Programming, Machine Learning, and Deep
Learning. You'll find carefully selected questions, including frequent ones encountered during interviews,
to hone your skills and boost your confidence.

Whether you're a data science enthusiast, an engineering student, a mathematician, or a statistician,
this guide is tailored to prepare you for placement success. Dive in, practice, and equip yourself
for a rewarding career.

Best of luck in your placement journey!

[Pie chart: weightage of topics based on interviews, with Machine Learning at 40%, Probability & Statistics at 25%, and Programming & Algorithms and Deep Learning & NLP making up the remaining 20% and 15%.]

Chapter 1 Machine Learning
1.1 Linear Regression
• What is Linear Regression?
• What are the Assumptions of Linear Regression? (Most frequent question)
– Validating the assumptions, understanding the impact of violations on the model, and addressing these violations.
– How do you deal with non-linearity?
– Define multicollinearity and explain its consequences in linear regression (VIF score; see the sketch at the end of this section).
– Define heteroscedasticity and discuss its implications in linear regression analysis.
– Relevance of the Q-Q plot

• What is the hypothesis testing for linear regression?


– Try to include the significance of the p-value and t-value in linear regression.
• What is the Cost Function (Loss Function) in Linear Regression?
– Describe the Gradient Descent method for optimizing the cost function.

• What Evaluation Metrics Would You Use for a Linear Regression Model?
– Discuss common evaluation metrics such as Mean Squared Error (MSE), R-squared, and Mean
Absolute Error (MAE).
– Difference between R-squared and Adjusted-R-squared

• Explain OLS (Ordinary Least Squares).
• What are the limitations of Linear Regression?
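
For the multicollinearity question above, here is a minimal sketch of computing VIF scores. It assumes statsmodels is installed; the synthetic data and the usual VIF > 5 (or > 10) rule of thumb are only illustrative.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # nearly a copy of x1 (high collinearity)
x3 = rng.normal(size=200)                     # independent of the others
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each column (the 'const' row can be ignored); x1 and x2 should be large, x3 near 1
for i, col in enumerate(X.columns):
    print(col, round(variance_inflation_factor(X.values, i), 2))
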

1.2 Logistic Regression


• What is logistic regression, and how does it differ from linear regression?
– Explain the fundamental concepts and differences between logistic and linear regression.
• Explain the logistic function (sigmoid function) and its role in logistic regression (see the sketch at the end of this section).

• Metrics of performance of classification models


• What are the assumptions of logistic regression, and how can you check them?
• What is the likelihood function in logistic regression, and how is it used to estimate
parameters?
• What is the purpose of the odds ratio in logistic regression, and how is it calculated?
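
A minimal sketch tying together the sigmoid function and the odds-ratio question above. It assumes scikit-learn; the data are synthetic and purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # maps any real z to a probability in (0, 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (sigmoid(1.5 * X[:, 0] - 1.0 * X[:, 1]) > rng.uniform(size=500)).astype(int)

model = LogisticRegression().fit(X, y)
print("coefficients:", model.coef_[0])
print("odds ratios :", np.exp(model.coef_[0]))   # exp(beta) = multiplicative change in the odds
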

1.3 Decision Tree


• What is a decision tree, and how does it work?
• What are the disadvantages of using decision trees for modeling?
• How is the root node selected in a decision tree, and what criteria are used for splitting
nodes?

– Describe the process of selecting the root node and the criteria for node splitting (Gini impurity, entropy).

• What is pruning in decision trees, and why is it important?
– Difference between post-pruning and pre-pruning
• What is the difference between classification and regression trees (CART)?

– Describe the process of selecting the root node and the criteria for node splitting in CART.
• What is feature importance in decision trees, and how is it calculated?
• Do we need feature scaling in decision trees?

• What are some common techniques to prevent overfitting in decision trees? (see the sketch at the end of this section)
– Describe methods like pruning, setting a maximum depth, and minimum samples per leaf to prevent overfitting.
• How does a decision tree handle missing values and categorical variables?
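
A minimal sketch for the splitting-criterion and pre-pruning questions above. It assumes scikit-learn and its bundled iris dataset; the depth and leaf-size values are arbitrary.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Gini impurity as the split criterion; max_depth and min_samples_leaf act as pre-pruning
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5,
                              random_state=0).fit(X, y)
print("depth:", tree.get_depth())
print("feature importances:", tree.feature_importances_)
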

1.4 Random Forest


• What is Random Forest, and Why is it called ”Random Forest”?
– Why is it an ensemble method?
– Discuss the randomness involved in creating multiple decision trees.
– What are the advantages of using Random Forest over a single decision tree?
• How is feature selection done in a Random Forest model?
• What is out-of-bag (OOB) error in Random Forest, and how is it useful? (see the sketch at the end of this section)

• Bias comparison between random forest and decision tree


• If the same sample of a dataset is given to two Random Forest models, will they build the same trees? If not, why?
• Can Random Forest handle regression tasks, and if so, how does it differ from classification?

• How do you prevent overfitting in a Random Forest model?
– Describe techniques like tuning hyperparameters, limiting tree depth, and adjusting the number of features considered for splitting.
• In what scenarios would you choose Random Forest over other machine learning algorithms?
• What are some limitations or potential drawbacks of using Random Forest?
– Mention challenges or scenarios where Random Forest may not be the best choice.
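
A minimal sketch for the OOB-error question above. It assumes scikit-learn; the dataset is synthetic, and oob_score=True simply reports accuracy on the samples each tree never saw during its bootstrap.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# max_features="sqrt" is the usual per-split feature subsampling that makes the forest "random"
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:", rf.oob_score_)
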

1.5 Bagging and Boosting


• What is the ”bagging” technique in ensemble learning, and how does Random Forest
implement it?
• What is Boosting and Stacking?

– What is the difference between Bagging and Boosting? (Frequent)


• Explain the working principle of Gradient Boosting (see the sketch at the end of this section).
– How do boosting methods handle missing values?
– Explain AdaBoost and XGBoost (for both classification and regression).
– Explain the properties of LightGBM and the difference between XGBoost and LightGBM (optional).
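
A minimal gradient-boosting sketch, shown with scikit-learn's GradientBoostingClassifier for portability; XGBoost and LightGBM expose analogous knobs such as learning_rate, n_estimators, and max_depth. The values here are illustrative, not tuned.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Each new shallow tree fits the gradient of the loss on the current ensemble's predictions
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", gb.score(X_te, y_te))
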

1.6 KNN
• What is K-Nearest Neighbors (KNN)?

– How does KNN work for classification and regression tasks?


• What is the significance of the "K" value in KNN, and how do you choose an appropriate K value? (elbow method; see the sketch at the end of this section)
• What distance metrics are commonly used in KNN, and when would you choose one
over another?

• What are the advantages and disadvantages of KNN compared to other machine
learning algorithms?
– What is the Curse of Dimensionality?
• How does KNN handle imbalanced datasets, and what techniques can be applied to
address this issue?
• What is the difference between parametric and non-parametric models, and where
does KNN fall in this classification?
• How does KNN handle missing data, and what are common imputation techniques
used in this context?
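
A minimal sketch for choosing K, assuming scikit-learn. Scaling first matters because KNN is distance-based, and the cross-validated accuracy versus K curve is what people typically eyeball for an "elbow".

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9, 11]:
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    score = cross_val_score(knn, X, y, cv=5).mean()   # 5-fold CV accuracy for this K
    print(f"K={k:2d}  CV accuracy={score:.3f}")
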

1.7 SVM
• How does a Support Vector Machine work for classification?

– How is it different for regression problems?


• SVM Kernel
– What is the kernel trick in SVM? What are the commonly used kernel functions and their
applications?
– RBF kernel
– How do you choose an appropriate kernel function for your SVM model?
• Difference between hard margin and soft margin
• Hyperparameters of SVM

– What do the parameters alpha and gamma signify? (see the sketch at the end of this section)


– Explain the C parameter in SVM. How does it affect the SVM model, and what values can it
take?
– How do you handle overfitting in SVM?

• What is the importance of support vectors in SVM?


• What is the impact of outliers on an SVM model? How can outliers be handled in
SVM?
• How can SVM be extended to handle multiple classes?

• Difference between linear and non-linear SVM


• Advantages of using SVM over other ML models, and disadvantages of SVM
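
A minimal sketch for the RBF-kernel and C/gamma questions above, assuming scikit-learn; the parameter grid is illustrative, not a recommendation.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
# C controls the soft-margin penalty; gamma controls the RBF kernel width
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}, cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("support vectors per class:", grid.best_estimator_.n_support_)
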

1.8 Unsupervised Learning
• Explain Unsupervised Learning
• What is the purpose of Dimensionality Reduction in Unsupervised Learning?
• What is the Elbow Method in clustering? How is it used to determine the optimal
number of clusters?
• What metrics are commonly used to evaluate the performance of Unsupervised Learning?
• How do you determine the optimal number of components in PCA?
• K Means
– Explain K Means algorithm.
– Can K-Means handle non-numeric data? If yes, how?
– Explain the process of selecting the optimal number of clusters (k) in K-Means.
– What is the objective function of K-Means?
– How does K-Means handle outliers in the dataset?
– Can you explain the difference between random initialization and k-means++ initialization?
– How does K-Means++ initialization improve convergence? (see the sketch at the end of this section)
– What is the role of centroids in K-Means clustering? How are they updated during the iterative
process?
– How can you handle categorical features in the K-Means algorithm?
– What is the Silhouette Score? How is it calculated, and what does it indicate about the
quality of the clustering?
• Hierarchical
– Which one is better, Hierarchical clustering or K-Means, and why?
– Describe the algorithms (agglomerative and divisive approaches).
– How do dendrograms work here?
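
A minimal sketch covering k-means++ initialization, the elbow method (inertia versus k), and the silhouette score. It assumes scikit-learn; the blobs are synthetic.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)
for k in range(2, 7):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    # inertia drops sharply until the true k (the "elbow"); silhouette should peak near it
    print(f"k={k}  inertia={km.inertia_:.1f}  silhouette={silhouette_score(X, km.labels_):.3f}")
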

1.9 Miscellaneous
• Missing Value:
– How do you identify and handle missing data in a dataset depending on the feature?
• Outlier:
– Why are outliers important, and how do you detect and manage them?
• Feature Engineering
– Transformation
– One-hot encoding and ordinal encoding
– Curse of Dimensionality
– Missing Value Imputation
• Feature Selection:
– Explain the significance of feature selection.
– Explain PCA (what are eigenvalues and eigenvectors?), and backward/forward elimination.
• Overfitting and Underfitting:
– Define overfitting and underfitting and ways to address them.

– Explain the bias-variance trade-off.
– What do you mean by 'out-of-time validation'?
• Cross Validations:

– What is cross-validation, and how does it benefit model evaluation?


– Explain K-fold cross-validation and GridSearchCV.
– What is the difference between parameters and hyperparameters, and what is hyperparameter tuning?
– Remember some hyperparameters of each ML model.

• Regularizations:
– Why do we use regularization, and what are L1 and L2 regularization?
– Difference between L1 and L2. How does L1 help in feature selection? (see the sketch at the end of this section)
– Explain ElasticNet

• Imbalanced Dataset:
– Discuss strategies for addressing imbalanced datasets.
– Oversampling techniques
– SMOTE and its limitations
• Metrics of performance of classification models
– Confusion Matrix, Precision, Recall, F1-Score.
– Is accuracy always a good measure for classification?
– Importance of precision and recall in different case studies (especially for imbalanced datasets)
• Explain AUC-ROC and its limitations.
– Is 0.5 always a good threshold? How do you find the best threshold?
– What do you mean by 90% AUC?

• Time complexity of each Machine Learning model
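
A minimal sketch for the L1-versus-L2 question above, assuming scikit-learn. The alpha = 1.0 value is arbitrary; the point is only that Lasso zeroes out uninformative coefficients (hence feature selection) while Ridge merely shrinks them.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty
print("Lasso coefficients:", np.round(lasso.coef_, 2))   # several exact zeros
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # small but non-zero
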

Chapter 2 Probability & Statistics
• Five Number summary

• What do you mean by Skewness? How do you deal with highly skewed data?
• Difference between Variance and Standard Deviation
• Difference between Correlation and Covariance, with their formulas. How do you interpret correlation coefficients?

• If we change the origin of two variables, how will their correlation change?
• Difference between Collinearity and Multicollinearity
• Quantile and Percentile

• Make yourself comfortable with quick probability questions (e.g., Bayes' theorem problems, dice and card problems).
• Random Variables
• Bias and unbiased estimators
• Distributions

– Binomial, Poisson, and Geometric distributions
– Normal, Chi-squared, t, Gamma, and F distributions; ANOVA
∗ Properties of the Normal distribution
∗ Is the sum/product of independent Normal random variables Normal or not?
∗ The 68-95-99.7 rule of the Normal distribution
∗ Be aware of the applications of these distributions
• Hypothesis testing

– Explain Type I and Type II errors, Power of the test in hypothesis testing.
– What is p-value in hypothesis testing, and how do you interpret it? (Frequent)
– What are the confidence level and confidence intervals, and how are they related to hypothesis testing?

• Gauss-Markov theorem, Markov Chains, Chebyshev's Inequality


• What is a stochastic process?
• Weak and Strong Laws of Large Numbers

• What is the Central Limit Theorem (CLT), and what is the difference between the Law of Large Numbers and the CLT? (see the simulation sketch at the end of this chapter)
• What is sampling bias (with its four types), and how can you mitigate it when collecting data?
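
A minimal simulation sketch for the CLT question above, using NumPy only; the exponential distribution is chosen simply because it is visibly skewed.

import numpy as np

rng = np.random.default_rng(0)
for n in [2, 10, 50, 200]:
    # 10,000 samples of size n from Exp(1), which has mean 1 and standard deviation 1
    sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}  mean={sample_means.mean():.3f}  std={sample_means.std():.3f}  "
          f"(theory: std = 1/sqrt(n) = {1/np.sqrt(n):.3f})")

The sample means stay centred near the true mean 1 and their spread shrinks like 1/sqrt(n), even though the underlying data are far from normal.
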

Chapter 3 Deep Learning & NLP
3.1 Deep Learning
• What is Deep Learning, and how does it differ from traditional machine learning?
– What are the advantages and disadvantages of deep learning compared to traditional machine learning algorithms?
• What is backpropagation, and why is it essential in training neural networks?

• Activation functions and their role in introducing non-linearity to neural networks


– What are activation functions and why are they necessary?
– What will happen if we use a linear activation function?
– Explain activation functions like ReLU, Leaky ReLU, sigmoid, and tanh (see the sketch at the end of this section).
– Why is ReLU more preferable than tanh and sigmoid?
• What are the vanishing gradient and exploding gradient problems in deep learning, and how can they be mitigated?
• What are optimizers and cost functions in the context of deep learning, and why are they crucial?
– Discuss popular optimizers like SGD, Adam, and RMSprop and their characteristics.
• Differences between Gradient Descent and Stochastic Gradient Descent?
• What is a convolutional neural network (CNN), and where is it commonly used?
• What is a recurrent neural network (RNN), and how does it handle sequential data?

• Why is LSTM preferable over a vanilla RNN?


• Explain overfitting in deep learning and methods to prevent it.
• What is the role of batch normalization in deep neural networks?

• Define epoch and the Dropout method. Explain GAN and CT-GAN. How does CT-GAN overcome the shortcomings of GAN?
• What are some common challenges in training deep learning models, and how can they be addressed?
– Mention issues like vanishing gradients, exploding gradients, and noisy data, along with strategies to tackle them.

• How do you evaluate the performance of deep learning models, and what metrics are
commonly used?
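
A minimal sketch for the activation-function questions above, using NumPy only; it also shows why stacking layers with a purely linear activation collapses into a single linear map.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-3, 3, 7)
print("sigmoid   :", np.round(sigmoid(z), 3))
print("tanh      :", np.round(np.tanh(z), 3))
print("ReLU      :", relu(z))
print("LeakyReLU :", leaky_relu(z))

# Two "layers" with linear activations reduce to one linear layer: W2(W1 x) = (W2 W1) x
W1, W2 = np.array([[2.0, 0.0], [0.0, 3.0]]), np.array([[1.0, -1.0], [0.5, 0.5]])
x = np.array([1.0, 2.0])
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True
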

3.2 NLP
• What is tokenization in NLP, and why is it necessary?
• What are stemming and lemmatization, why are they used in NLP, and what is the difference between them?
• Define stop words and discuss strategies for dealing with them in text processing.

• What are the Bag-of-Words and TF-IDF (Term Frequency-Inverse Document Frequency) techniques, and how do they work? (see the sketch at the end of this section)
• What is named entity recognition (NER), and why is it useful in NLP?
• Explain the concept of word embeddings and their importance in NLP.

– Discuss word embeddings, like Word2Vec and GloVe, and their ability to represent words as
vectors in high-dimensional spaces.

– CBOW, Skip-gram
• What are the methods used in NLP for dimensionality reduction?
• What is sentiment analysis, and how can it be performed using NLP techniques?

• How does Naive Bayes help in Sentiment Analysis?


• How do you evaluate the performance of NLP models, and what metrics are commonly
used?
• How can you handle imbalanced datasets in sentiment analysis or classification tasks?
– Example: email spam classification
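
A minimal sketch for the Bag-of-Words / TF-IDF question above, assuming scikit-learn; the three-document corpus is made up.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the movie was great", "the movie was terrible", "great acting, great story"]

bow = CountVectorizer()                       # Bag-of-Words: raw token counts
print("vocabulary :", bow.fit(corpus).get_feature_names_out())
print("BoW matrix :\n", bow.transform(corpus).toarray())

tfidf = TfidfVectorizer()                     # counts reweighted by inverse document frequency
print("TF-IDF matrix:\n", tfidf.fit_transform(corpus).toarray().round(2))
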

Chapter 4 Programming
4.1 Python
• NumPy
• Pandas

• Matplotlib
• Seaborn
• Scikit-Learn
• Fundamentals of Python (e.g., strings, lists, tuples, dictionaries, sets, booleans)
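
A minimal pandas sketch of a commonly asked manipulation, group-by aggregation plus a simple missing-value fill (toy data, illustrative only).

import pandas as pd

df = pd.DataFrame({"dept": ["A", "A", "B", "B", "B"],
                   "salary": [50, 60, 55, None, 70]})
df["salary"] = df["salary"].fillna(df["salary"].median())   # simple median imputation
print(df.groupby("dept")["salary"].agg(["mean", "count"]))  # per-department summary
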

4.2 SQL
• SQL Queries

• Filtering Data
• Aggregation Functions
• JOIN Operations
• Subqueries

• Indexing
• UPDATE and CREATE TABLE statements
• GROUP BY
• Window Functions

Practice LeetCode and HackerRank problems (a warm-up query sketch follows below).
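
A warm-up sketch for the JOIN / GROUP BY topics above, run through Python's built-in sqlite3 module so it needs no database server; the table names and rows are made up.

import sqlite3

con = sqlite3.connect(":memory:")             # throwaway in-memory database
con.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, dept_id INTEGER, salary REAL);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (1, 1, 70), (2, 1, 80), (3, 2, 55);
""")
query = """
    SELECT d.name, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
    FROM emp e JOIN dept d ON e.dept_id = d.id
    GROUP BY d.name
    ORDER BY avg_salary DESC;
"""
for row in con.execute(query):                # each row: (dept name, headcount, average salary)
    print(row)
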
