Professional Documents
Culture Documents
Company: TCS
Role: Data Analyst
Company: JPMorgan
Role: Data Scientist(Risk Analytics)
Company: Philips
Role: Data Scientist
Company: Fractal
Role: Data Scientist
Company: ZS
Role: Data Science Consultant
1. What is Logistic Regression? What is the loss function in LR?
2. conditions for Overfitting and Underfitting?
3. Scaling? Normalisation and Standardization?
4. L1 and L2 Regularisation?
5. Is DBSCAN is better than K- Means Clustering?
6. RNN and CNN difference?
7. If model is suffering from low bias and high variance, which algorithm should you use to tackle
it? Why?
8. Can sometimes R2 metrics be negative. Why?
9. When is Ridge regression favourable over Lasso regression?
10. one hot encoding vs label encoding. Cases in which one is better than other?
Company : Deloitte
Role: Data Scientist
1. difference between Correlation and Regression.
2. Why do we square the residuals instead of using modulus?
3. Which evaluation metric should you prefer to use for a dataset having a lot of outliers in it?
4. Heteroscedasticity? How to detect it?
5. p-value?
6. deal with missing random values from a data set?
7. Z test, F test, and T-test?
8. Root Cause Analysis?
9. lists, sets and tuples ? difference?
10. lambda functions? Write small example python code using it.
11. Regularization?
12. DBSCAN Clustering?
Company: Deloitte
Role: Data Scientist
1. Kernels in SVM?
2. Overfitting and Underfitting? How do you handle them?
3. silhouette and elbow methods in clustering?
4. P value? Why threshold is 0.05 or less?
5. Regularisation and it's use?
6. Type 1 and type 2 error in confusion matrix?
7. How root node is selected in decision tree.
8. Imbalanced dataset? How do you handle it?
9. Evaluation Metrics for regression problems? With formula
10. Std deviation vs variance?
Company:Quantiphi Inc.
Role: Machine Learning Engineer
1. How Can You Choose a Classifier Based on a Training Set Data Size?
2. Semi-supervised Machine Learning?
3. Reinforcement Learning.? Make your system to Play a Game of Chess Using it?
4. Recommendation Engine? How does it work?
5. Why you do Pruning in Decision Trees?
6. Dimensionality reduction methods?
7. How descision tree make descisions?
8. K-means and KNN Algorithms? Difference?
9. Your Model evaluation strategy?
10. How do you Optimize ML models?
Company: Cognizant
Role: Data Analyst
1. ROC curve?
2. Trade-off between bias and variance?
3. Handle an imbalanced dataset?
4. Would you remove correlated variables first? Why?
5. Autoencoder?
6. Idea behind boosting?
7. Tackle Overfitting and Underfitting?
8. Density-based clustering algo and its working?
9. Sigmoid vs Softmax functions?
10. How Array different than List?
11. data types supported by JSON?
12. Experience with AWS SageMaker or Azure?
13. Test ML model for production scale?
14. Does XGBoost perform better than SVM?
15. how to design a social network and message board service like Reddit, Quora, etc.?
Company: JPMorgan
Role: Data Analyst
1. how do you develop data mining algorithms and databases from scratch?
2. What is a sensitivity analysis in the decision making process?
3. Most difficult database problem you faced?
4. How do you interpret the data using statistical techniques
5. Segmentation techniques?
6. KNN imputation method?
7. Map Reduce?
8. criteria for a good data model?
9. How can you highlight cells with negative values in Excel?
10. What is a Pivot Table?
11. difference between 1-Sample T-test, and 2-Sample T-test?
12. variance and covariance difference?
13. types of Joins in SQL? Difference?
14. get the third highest salary of an employee from Employee_Details in SQL
15. Mellon must be equally divided into 8 equal pieces. But can have only 3 cuts. How will you
make this possible?
Company: ZS
Role: Data Science Consultant
Company: Paytm
Role: Machine learning engineer
Company: Nokia
Role: Data Scientist
Company: Mu Sigma
Role: Data Scientist
Company: McAfee
Role: Data Scientist
1. Explain list comprehensions and how they're used in Python with an example.
2. Add a new column named Active to the employees DataFrame with all 1s to indicate each
employee's active employment status.
3. How can you convert timestamps to date time in MySQL?
4. What is a trigger in MySQL? How can we use it?
5. Explain dimensionality reduction, where it’s used, and its benefits?
6. How would you go about doing an exploratory data analysis (eda)?
7. What are the pros and cons of linear regression vs. tree-based models?
8. What is the difference between the ROC curve and the precision-recall curve?
9. What is the business impact (eg. accuracy, revenue etc) of one of the project mentioned in
your resume and how did other people or teams benefit from the project?
10. Have you tried a simpler model (eg. linear regression)? Why is it necessary to use a more
complicated model?
Company: Larsen and Toubro
Role: Data Scientist
Company: Verizon
Role: Data Scientist
Company: Facebook
Role: Data Scientist
1. What do you think the distribution of time spent per day on Facebook looks like? What metrics
would you use to describe that distribution?
2. You've been asked to generate a machine learning model that can map the nicknames of
people using Facebook. How do you go about designing this model?
3. How would you build the recommendation algorithm for type-ahead search for Netflix?
4. If 70% of Facebook users on iOS use Instagram, but only 35% of Facebook users on Android
use Instagram, how would you investigate the discrepancy?
5. Likes/user and minutes spent on a platform are increasing but total number of users are
decreasing. What could be the root cause of it?
Company: TheMathCompany
Role: Analyst (Data Science)
Company: MakeMyTrip
Role: Data Science Intern
Given a array, you need to print the square of values in sorted order.
2) you have an array in which all unique values are even times present except one. Find that
value.
P.S- if you want to be interview ready for Data Science roles then join "AI for ALL" course for end
to end Capstone Projects, guided Assignments and 1-1 Course Support Mentorship.
Topics covered are Python, SQL, ML, Statistics, Model Deployments, Interview Question Answers
and AWS Sagemaker from Beginner level to Interview level.
Company:Landmark group
Role: Associate Data Analyst
1. How would you check if the model is suffering from multi Collinearity?
2. Explain bias - variance trade off. How does this affect the model?
3. What does a statistical test do?
4. What type of performance metrics would you choose to evaluate the different classification
models and why?
5. Explain how you would find and tackle an outlier in the dataset.
6. What are the approaches for solving class imbalance problem?
7. Are ensemble models better than individual models? Why/why - not?
8. Differentiate between Tuple and list
9. When sampling what types of biases can be inflected? How to control the biases?
10. What is difference between K-NN and K-Means clustering?
Company: Deloitte
Role: Data Analyst
1. What is R2? What are some other metrics that could be better than R2 and why?
2. What is the curse of dimensionality?
3. What are advantages of plotting your data before performing analysis ?
4. How can you make sure that you don’t analyze something that ends up meaningless?
5. What is the role of trial and error in data analysis? What is the the role of making a hypothesis
before diving in?
6. How do you deal with some of your predictors being missing?
7. How would you quantify the influence of a Twitter user?
8. How would you come up with an algorithm to detect plagiarism in online content?
9. How would you explain a confidence interval to an engineer with no statistics background?
What does 95% confidence mean?
10. How would you explain to a group of senior executives why data is important?
Company: Bewakoof[dot]com
Role: Data Scientist
1. Have you ever used Lambda function and List comprehension in python, if yes, then give an
example?
2. What is the way that you would handle an imbalanced data set that’s being used for prediction
when mostly there are more negative classes than positive classes?
3. If a Person A and Person B are playing archery together and their abilities to fire the arrow at
the target are the same, and the probability of getting the target is 0.5 for both of them. Now
there is a situation that A has fired 201 arrows and B has fired 200 arrows, then, what is the
probability that A gets more targets than B?
4. Explain what do we mean by making decisions based on comparing p-value with significance
level?
5. Is it possible that the t-value same for 90% two tail and 95% one tail test. If yes, then why is it
so?
Company: Hotstar
Role : Data Analyst
1. What is the statistical power of sensitivity?
2. If you are given an unsorted data set, how will you read the last observation to a new dataset
in python?
3. What is the difference between covariance and correlation?
4. Explain how did you handle missing data in one of the projects mentioned in your resume.
5. What is negative indexing? Why is it needed? Can you give an example for.the same in python
6. How will you estimate the number of weddings that take place in a year in India?
7. What is the condition for using a t-test or a z-test?
8. What is the main difference between overfitting and underfitting?
9. What is the formula for R-squared and adjusted R-squared metrics and when do you decide to
use them?
10. What is the KNN imputation method?
Company: CRED
Role: Data Scientist
Company: Accenture
Role: Data Scientist
1. What happens when neural nets are too small? What happens when they are large enough?
2. Why do we need pooling layer in CNN? Common pooling methods?
3. Are ensemble models better than individual models? Why/why - not?
4. Use Case - Consider you are working for pen manufacturing company. How would you help
sales team with leads using Data analysis?
5. Assume you were given access to a website google analytics data.
6. In order to increase conversions, how do you perform A/B testing to identify best page design.
7. How is random forest different from Gradient boosting algorithm, given both are tree-based
algorithm?
8. Describe steps involved in creating a neural network?
9. In brief, how would you perform the task of sentiment analysis?
Company: Wipro
Role: Data Scientist
Company: Deloitte
Role: Data Scientist
1. Conditional Probability
2. Can Linear Regression be used for Classification? If Yes, why if No why?
3. Hypothesis Testing. Null and Alternate hypothesis
4. Derivation of Formula for Linear and logistic Regression
5. Why use Decision Trees?
6. PCA Advantages and Disadvantages?
7. What is Naive Bayes Theorem? Multinomial, Bernoulli, Gaussian Naive Bayes.
8. Central Limit Theorem?
9. Scenario based question on when to use which ML model?
10. Over Sampling and Under Sampling
11. Over Fitting and Under Fitting
12. Core Concepts behind Each ML model mentioned in my Resume.
13. Genie Index Vs Entropy
14. how to deal with imbalance data in classification modelling?
Company: Mindtree
Role: Data Scientist
Company: IBM
Role: Data Scientist
Company: Intuit
Role: Data Scientist
Company: Uber
Role: Data Scientist
1. Pick any product or app that you really like and describe how you would improve it.
2. How would you find an anomaly in a distribution ?
3. How would you go about investigating if a certain trend in a distribution is due to an anomaly?
4. How would you estimate the impact Uber has on traffic and driving conditions?
5. What metrics would you consider using to track if Uber’s paid advertising strategy to acquire
new customers actually works? How would you then approach figuring out an ideal customer
acquisition cost?
Company: TheMathCompany
Role: Analyst (Data Science)
Company: Verizon
Role: Data Scientist
Company: NovoLink
Role: Data Scientist
Company: Sony
Role: Data Scientist
1. What methods do you use to identify outliers within a data set? What call do you take when
outliers are identified?
2. Guess-estimate the number of people in India that would use a cricket kit of an elite brand.
3. You are given a data set on cancer detection. You’ve build a classification model and achieved
an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you
do about it?
5. Rexntly global average temperature rose and there was decrease in number of pirates around
the world. Does that mean that decrease in number of pirates caused the climate change?
6. Both being tree based algorithm, how is random forest different from Gradient boosting
algorithm (GBM)?
7. You’ve got a data set to work having p (no. of variable) > n (no. of observation). Why is OLS as
bad option to work with? Which techniques would be best to use? Why?
Company: Siemens
Role: Data Scientist
1. What graphs do you use for basic EDA?
2. How to join tables in python?
3. What is the benefit of shuffling a training dataset when using a batch gradient descent
algorithm for optimizing a neural network?
4. Why it is not advisable to use a softmax output activation function in a multi-label classification
problem for a one-hot encoded target?
5. How can you iterate over a list and also retrieve element indices at the same time?
6. For a given dataset, you decide to use SVM as the main classifier. You select RBF as your kernel.
What would be the optimum gamma value that would allow you to capture the features of the
dataset really well?
7. What does the cost parameter in SVM stand for?
8. What is Softmax Function? What is the formula of Softmax Normalization?
Company: CRED
Role: Data Scientist
Company: Walmart
Role: Principal Data Scientist
1. You are given a data set with missing values that spread along 1 standard deviation from the
median. What percentage of data would remain unaffected?
2. Explain the difference between an array and a linked list.
3. 4How do you ensure you are not overfitting a model?
4. How do you fix high variance in a model?
5. What are hyperparameters? How do they differ from model parameters?
6. What is the default method for splitting in decision trees? What other methods are available
for its splitting
7. You are told that your regression model is suffering from multicollinearity. How do verify this is
true and build a better model?
8. You build a random forest model with 10,000 trees. Training error as at 0.00, but validation
error is 34.23. Explain what went wrong.
9. What is the recall, specificity and precision of the confusion matrix?
Company : HP
Role: Data Scientist
1. If through training all the features in the dataset, an accuracy of 100% is obtained but with the
validation set, the accuracy score is 75%. What should be looked out for?
2. For a given dataset, you decide to use SVM as the main classifier. You select RBF as your kernel.
What would be the optimum gamma value that would allow you to capture the features of the
dataset really well??
3. How is skewness different from kurtosis??
4. How to calculate the accuracy of a binary classification algorithm using its confusion matrix?
5. How will you measure the Euclidean distance between the two arrays in numpy?
6. Given two lists [1,2,3,4,5] and [6,7,8], you have to merge the list into a single dimension. How
will you achieve this?
7. In a survey conducted, the average height was 164cm with a standard deviation of 15cm. If
Alex had a z-score of 1.30, what will be his height?
Company: McAfee
Role: Data Scientist
1. Explain list comprehensions and how they're used in Python with an example.
2. Add a new column named Active to the employees DataFrame with all 1s to indicate each
employee's active employment status.
3. How can you convert timestamps to date time in MySQL?
4. What is a trigger in MySQL? How can we use it?
5. Explain dimensionality reduction, where it’s used, and its benefits?
6. How would you go about doing an exploratory data analysis (eda)?
7. What are the pros and cons of linear regression vs. tree-based models?
8. What is the difference between the ROC curve and the precision-recall curve?
9. What is the business impact (eg. accuracy, revenue etc) of one of the project mentioned in
your resume and how did other people or teams benefit from the project?
10. Have you tried a simpler model (eg. linear regression)? Why is it necessary to use a more
complicated model?
Company: Wipro
Role: Data Scientist
1. Let’s say you have a time series dataset grouped monthly for the past five years. How would
you find out if the difference between this month and the previous month was significant or not?
2. How do you investigate that a certain trend in the distribution is due to anomaly?
3. Write a production code to find all combinations of numbers in a list that sum up to 8.
4. How will you ignore special characters in a string and reverse it in python.
5. Given a polynomial function with n terms and k degrees, how many partial derivatives can you
form?
6. What metrics would you use to measure a model’s effectiveness in classification and regression
problems?
7. Suppose your model suffers from low bias and high variance. Which algorithm can be used to
solve the problem?
Company : Wipro
Role: Lead Data Scientist
Company: Hotstar
Role : Data Analyst
Company: TCS
Role: Data Scientist
1. How would you do an A/B test on a new metric to see if it truly captures meaningful social
interactions?
2. What's the difference between a left join, a union, and a right join?
3. Given a list, search for consecutive numbers (n) whose sum is equal to a specific number (x).
4. How would you create a model to find bad sellers on marketplace?
5. How can you predict Samsung phone sales?
Company : Ericsson
Role : Data Scientist
1. What kind of metrics you’d want to consider when solving for questions around health, growth,
or the engagement of a product?
2. How would you set up an experiment to evaluate any new products or improvements?
3. List assumptions about data in the context of linear regression.
4. How would you explain the application of probability to your product manager?
5. What was the most difficult project of your career and how did you solve it?
Company: Philips
Role: Principal Data Scientist
1. A model your team has built performs 90% accuracy. What do you need to know in order to
interpret whether this is good or not?
2. What is the difference between a box plot and a histogram?
3. Tell me about (a job on your resume). Why did you choose to do it and what do you like most
about it?
4. What is negative indexing? Can you give an example for the same? Why is it needed? .
5. What is the difference between descriptive, predictive, and prescriptive analytics.
6. What do you mean by imputation? Explain different types of imputation techniques you have
used in your project?
7. What is the condition for using a t-test or a z-test?
8. Given an IP address as an input string, validate it and return True/False
Company: Bewakoof[dot]com
Role: Data Scientist
1. Have you ever used Lambda function and List comprehension in python, if yes, then give an
example?
2. What is the way that you would handle an imbalanced data set that’s being used for prediction
when mostly there are more negative classes than positive classes?
3. If a Person A and Person B are playing archery together and their abilities to fire the arrow at
the target are the same, and the probability of getting the target is 0.5 for both of them. Now
there is a situation that A has fired 201 arrows and B has fired 200 arrows, then, what is the
probability that A gets more targets than B?
4. Explain what do we mean by making decisions based on comparing p-value with significance
level?
5. Is it possible that the t-value same for 90% two tail and 95% one tail test. If yes, then why is it
so?
Company: Genpact
Role: Data Scientist
1. What makes you feel that you would be suitable for this role, since you come from a different
background?
2. What is an imbalanced data set??
3. What are the factors you will consider in order to predict the population of a city in the future?
4. Basic statistics questions?
5. What are the approaches for treating the missing values?
6. Evaluation metrics for Classification?
7. Bagging vs Boosting with examples
8. Handling of imbalanced datasets
9. What are your career aspirations?
10.What's the graph of y = |x|-2
11. esstimate on no. Of petrol cars in Delhi
12.Case study on opening a retail store
13.Order of execution of SQL
Company: Infosys
Role: Data Scientist
1. What is a Python Package, and Have you created your own Python Package?
2. Explain about Time series models you have used?
3. SQL Questions - Group by Top 2 Salaries for Employees - use Row num and
Partition
4. Pandas find Numeric and Categorical Columns. For Numeric columns in Data
frame, find the mean of the entire column and add that mean value to each row of
those numeric columns.
5. What is Gradient Descent? What is Learning Rate and Why we need to reduce or
increase? Why Global minimum is reached and Why it doesn’t improve when
increasing the LR after that point?
6. Two Logistic Regression Models - Which one will you choose - One is trained on
70% and other on 80% data. Accuracy is almost same.
8. What is Log-Loss and ROC-AUC?
9. Do you know to use Amazon SageMaker for MLOPS?
10. Explain your Projects end to end (15-20mins)
Company: Deloitte
Role: Data Scientist
1. Conditional Probability
2. Can Linear Regression be used for Classification? If Yes, why if No why?
3. Hypothesis Testing. Null and Alternate hypothesis
4. Derivation of Formula for Linear and logistic Regression
5. Why use Decision Trees?
6. PCA Advantages and Disadvantages?
7. What is Naive Bayes Theorem? Multinomial, Bernoulli, Gaussian Naive Bayes.
8. Central Limit Theorem?
9. Scenario based question on when to use which ML model?
10. Over Sampling and Under Sampling
11. Over Fitting and Under Fitting
12. Core Concepts behind Each ML model mentioned in my Resume.
13. Genie Index Vs Entropy
14. how to deal with imbalance data in classification modelling?
Company: ArrayKart
Role: Data Scientist
Round 1: Asking about why did you chose Data Science as Career? Questions related to your
background if you are from non tech background.
What is a confusion matrix ? When it is used? What is True positive, False positive, True negative,
False negative
Company: Myntra
Role: Data Analyst
Introduce yourself.
ML questions:
Company: FISERV
Role: Data Scientist
1. How would you check if the model is suffering from multi Collinearity?
2. What is transfer learning? Steps you would take to perform transfer learning.
3. Why is CNN architecture suitable for image classification? Not an RNN?
4. What are the approaches for solving class imbalance problem?
5. When sampling what types of biases can be inflected? How to control the biases?
6. Explain concepts of epoch, batch, iteration in machine learning.
7. What type of performance metrics would you choose to evaluate the different classification
models and why?
8. What are some of the types of activation functions and specifically when to use them?
9. What is the difference between Batch and Stochastic Gradient Descent?
10. What is difference between K-NN and K-Means clustering?
11. How to handle missing data? What imputation techniques can be used?
Company: Ericsson
Role: Data Scientist
Give a logistic regression model in production, how would you find out the coefficients of
different input features.
Given a sand timer of 4 and 7 mins how would you calculate 10 mins duration.
What's the angle between hour and minute hand in clock as 3:15
Company: Axtria
Role: Data Scientist
Company: Deloitte
Role: Data Scientist
Company: Cognizant
Role: Data Scientist
Company: Google
Role: Data Scientist
1. Why do you use feature selection?
2. What is the effect on the coefficients of logistic regression if two 3. predictors are highly
correlated?
4. What are the confidence intervals of the coefficients?
5. What’s the difference between Gaussian Mixture Model and K-Means?
6. How do you pick k for K-Means?
7. How do you know when Gaussian Mixture Model is applicable?
8. Assuming a clustering model’s labels are known, how do you evaluate the performance of the
model?
Company: TCS
Role: Data Scientist
Company: Verizon
Role: Data Scientist
1. How many cars are there in Chennai? How do u structurally approach coming up with that
number?
2. Multiple Linear Regression?
3. OLS vs MLE?
4. R2 vs Adjusted R2? During Model Development which one do we consider?
5. Lift chart, drift chart
6. Sigmoid Function in Logistic regression
7. ROC what is it? AUC and Differentiation?
8. Linear Regression from Multiple Linear Regression
9. P-Value what is it and its significance? What does P in P-Value stand for? What is Hypothesis
Testing? Null hypothesis vs Alternate Hypothesis?
10. Bias Variance Trade off?
11. Over fitting vs Underfitting in Machine learning?
12. Estimation of Multiple Linear Regression
13. Forecasting vs Prediction difference? Regression vs Time Series?
14. p,d,q values in ARIMA models
Company: Fractal
Role: Data Scientist
Company: Wipro
Role: Data Scientist
Company: Accenture
Role: Data Scientist
1. What is deep learning, and how does it contrast with other machine learning algorithms?
2.When should you use classification over regression?
3.Using Python how do you find Rank, linear and tensor equations for an given array of
elements? Explain your approach.
4.What exactly do you know about Bias-Variance decomposition? 5.What is the best
recommendation technique you have learnt and what type of recommendation technique helps
to predict ratings? 6.How can you assess a good logistic model?
7.How to you read the text from an image? Explain?
8.What are all the options to convert speech to text? Explain and name few available tools to
implement the same?
Company: Genpact
Role: Data Scientist
Company: Ford
Role: Data Scientist
1. How would you check if the model is suffering from multi Collinearity?
2. What is transfer learning? Steps you would take to perform transfer learning.
3. Why is CNN architecture suitable for image classification? Not an RNN?
4. What are the approaches for solving class imbalance problem?
5. When sampling what types of biases can be inflected? How to control the biases?
6. Explain concepts of epoch, batch, iteration in machine learning.
7. What type of performance metrics would you choose to evaluate the different classification
models and why?
8. What are some of the types of activation functions and specifically when to use them?
9. What are the conditions that should be satisfied for a time series to be stationary?
10. What is the difference between Batch and Stochastic Gradient Descent?
11. What is difference between K-NN and K-Means clustering?
Company: Quantiphi
Role: Machine Learning Engineer
1. What happens when neural nets are too small? What happens when they are large enough?
2. Why do we need pooling layer in CNN? Common pooling methods?
3. Are ensemble models better than individual models? Why/why - not?
4. Use Case - Consider you are working for pen manufacturing company. How would you help
sales team with leads using Data analysis?
5. Assume you were given access to a website google analytics data.
6. In order to increase conversions, how do you perform A/B testing to identify best page design.
7. How is random forest different from Gradient boosting algorithm, given both are tree-based
algorithm?
8. Describe steps involved in creating a neural network?
9. In brief, how would you perform the task of sentiment analysis?
Company: Cognizant
Role: Data Scientist
Company: Ericsson
Role: Data Scientist
Give a logistic regression model in production, how would you find out the coefficients of
different input features.
Given a sand timer of 4 and 7 mins how would you calculate 10 mins duration.
What's the angle between hour and minute hand in clock as 3:15
1. Use Case - Consider you are working for pen manufacturing company. How would you help
sales team with leads using Data analysis?
2. Interviewers ask about scenarios or use-case based questions to know interviewee thought
process and problem-solving skills.
3. Assume you were given access to a website google analytics data.
4. In order to increase conversions, how do you perform A/B testing to identify best page design.
5. How is random forest different from Gradient boosting algorithm, given both are tree-based
algorithm?
6. Describe steps involved in creating a neural network?
7. LSTM solves the vanishing gradient problem, that RNN primarily have. How?
8. In brief, how would you perform the task of sentiment analysis?
Data Science Interview Questions asked to Sajal Soni
Company: Axtria
------------
1.RNN, NN and CNN difference.
2. Supervised, unsupervised and reinforcement learning with there algo example.
3. Difference between ai, ml and dl
4. How u do dimentionality reduction.
5. What is Multicollinearity
6. Parameters of random forest
7 . Parameters of deep learning algos
8. Different feature selection methods
9. Confusion matrix
Company: Bridgei2i
Role: Senior Analytics Consultant
Company: Mindtree
Role: Data Scientist
1. How many cars are there in Chennai? How do u structurally approach coming up with that
number?
2. Multiple Linear Regression?
3. OLS vs MLE?
4. R2 vs Adjusted R2? During Model Development which one do we consider?
5. Lift chart, drift chart
6. Sigmoid Function in Logistic regression
7. ROC what is it? AUC and Differentiation?
8. Linear Regression from Multiple Linear Regression
9. P-Value what is it and its significance? What does P in P-Value stand for? What is Hypothesis
Testing? Null hypothesis vs Alternate Hypothesis?
10. Bias Variance Trade off?
11. Over fitting vs Underfitting in Machine learning?
12. Estimation of Multiple Linear Regression
13. Forecasting vs Prediction difference? Regression vs Time Series?
14. p,d,q values in ARIMA models
1. What will happen if d=0
2. What is the meaning of p,d,q values?
15. Is your data for Forecasting Uni or multi-dimensional?
16. How to find the nose to start with in a Decision tree.
17. TYPES of Decision trees - CART vs C4.5 vs ID3
18. Genie index vs entropy
19. Linear vs Logistic Regression
20. Decision Trees vs Random Forests
1. Decorators in Python
- Live Example
2. Generators in Python
- Live Example
3. SQL Questions
6. Two Logistic Regression Models - Which one will you choose - One is trained on 70% and
other on 80% data. Accuracy is almost same.
7. What is LogLoss ?
How would you measure the performance of models built on imbalanced dataset.
Whats the meaning of precision and recall.
Tell me about a task you did and are very proud of.
Do you have any questions for me
Post interview I was given a ML assignment to solve and submit within next 2 weeks.
Round 2:
Complete ML technical stack used in project?
Different activation function?
How do you handle imbalance data ?
Difference between sigmoid and softmax ?
Explain about optimisers ?
Precision-Recall Trade off ?
How do you handle False Positives ?
Explain LSTM architecture by taking example of 2 sentences and how it will be processed?
Decision Tree Parameters?
Bagging and boosting ?
Explain bagging internals
Write a program by taking an url and give a rough code approach how you will pass payload and
make a post request?
Different modules used in python ?
Another coding problem of checking balanced parentheses?
1. Classification Accuracy
2. log loss
3. Confusion Matrix
4. Precision
5. Recall or Sensitivity or True Positive Rate
6. Specificity or True Negative Rate
7. False Positive Rate
8. F1 Score( for imbalanced datesets)
9. ROC - AUC
10. Mean Absolute Error(MAE)
11. Mean Squared Error(MSE)
12. Root Mean Squared Error(RMSE)
13. Root Mean Squared Logarithmic Error(RMSLE)
14. R-squared
15. Adjusted R-squared
16. Gini Coefficient or Gini Index(G.I)
17. Entropy gain
Company: Cerence
Role: NLU Developer
Question1 :
Write a function that take two strings as inputs and return true if they are anagrams of each other
and false otherwise
e.g.
(hello, hlleo) --> true
(hello, helo) --> false
Question 2 :
Write a function that take an array of strings "A" and an integer "n",
that return the list of all strings of length "n" from the array "A" that can be constructed
as the concatenation of two strings from the same array "A"
e.g.
A = [dog, tail, sky, or, hotdog, tailor, hot] and n=6
output should be "hotdog" and "tailor"
Question 3 :
Given an array "arr" of numbers and a starting number "x",
Find "x" such that the running sums of "x" and the elements of the array "arr" are never lower
than 1.
e.g.
arr = [-2, 3, 1, -5].
The running sums will be x-2, x-2+3, x-2+3+1 and x-2+3+1-5.
So, the output should be 4.
Company: GEOTAB
Python :
1. Is python a language that follows pass by value, or pass by reference or pass by object
reference
2. What are lambda functions and how to use them
3. Difference between mutable and immutable objects with example.
4. What are Python decorators? Why do we use them
SQL :
1. What is the difference between Inner join and left inner join ?
2. What are window functions ?
3. What is the use of groupby ?
SQL Round
3 tables given as below:
TRIPS
trip_id
vehicle_id
start_time
stop_time
VEHICLE_MAKE
vehicle_id
make_id
MAKES
make_id
make_name
There is a table which contains vehicle trips. Trips are not necessarily in order.
There is a table which contains vehicle makes. Makes are not necessarily known.
PROBLEM: Write SQL code that provides the number of trips that started on September 1st, 2020
for each vehicle with a KNOWN make.
Order the results by the trip count.
op
vehicle_id | trip_count
4 | 2
1 | 1
2 | 1
1st round:-
Introduction
Current NLP architecture used in my project
How will you identify Data Drift? Once identified how would you automate the handling of Data
Drift
Data Pipeline used
Fasttext word embedding vs word2vec
When should we use Tf-IDF and when predictive based word embedding will be advantageous
over Tf-IDF
Metrics used to validate our model
In MongoDB write a query to find employee names from a collection
In Python write a program to separate 0s and 1s from an array- (0,1,0,1,1,0,1,0)
Company: Latentview.
Initial they had asked for the explaining the project which I had done. I explained the Customer
prediction case . Then I was asked with python questions by sharing my screen.
Company: Myntra
Role: Data Analyst
Introduce yourself.
ML questions:
A continuous variable is having missing values, so how will you decide that the missing values
should be imputed by mean or median?
What is PCA and what each component means? Also, what is the maximum value for number of
components?
What is test of independence? How do you calculate Chi-square value?
When precision is preferred over recall or vice-versa?
Advantages and disadvantages of Random forest over Decision Tree?
What is the c hyperparameter in SVM algorithm and how it affects bias variance tradeoff?
What are the assumptions of linear regression?
Difference between Stemming and Lemmatization?
Difference between Correlation and Regression?
What is p-value and confidence interval?
What is multicollinearity and how do you deal with multicollinearity? What is VIF?
What is the difference between apply, applymap and map function in python?
Process - First a written test was taken which contained detailed questions on statistics and some
case studies. Two rounds of technical interviews took place and one HR round before the final
result.
Round 1:
They asked me to write an R code using the dplyr function in order to perform some operations
on a data frame.
Tell me which one out of base and dplyr uses copy mechanism and which one uses a pointer.
Which is faster?
Round 2:
Suppose you have a fixed number of trucks that you intend to send on different routes for
delivery purposes. How would you optimize the number of trucks to be sent to which routes in
order to maximize profit? (question on dynamic programming reinforcement learning)
“A word is actually its neighbours in a corpus” - Explain this sentence in machine learning lingo.
If R squared is less than adj R squared will you accept the model?