Index
1 Preface
3 Interview Experiences
4 Guesstimates
4.1 Fundamentals
Preface
Dear Reader,
Analytics is quickly gaining popularity across corporate firms worldwide. From a
subject that was prominently deployed in research projects in the 19th and 20th centuries to
becoming a major driving factor in Donald Trump winning the most prestigious election in the
world, analytics has surely come a long way. Its importance has grown to such an extent that
firms which do not use its capabilities in any of their operations are bound to lose out in the
long run. Hence corporates are either building their own in-house analytics capabilities or
leveraging the services of prominent analytics firms for their flagship projects. Intuition is
slowly taking a backseat, and every decision is scrutinized through the lens of analytics.
As a result, managers who combine technical know-how in analytics with the grit to
lead capability development in this field are in great demand today. This demand
will grow rapidly as more and more firms push themselves to develop such
capabilities. The increase in the number of analytics profiles visiting top B-Schools is hence no
mere coincidence. With this thought, the Analytics Society aims to keep students of IIMB ahead
of the curve by inculcating the motivation to learn analytics and providing the requisite tools at
all stages.
As an effort in this direction, we present to you a compendium that can help you prepare for
your placement interviews. It is a three-part booklet. The first part focuses on prominent
analytics definitions and terminology that are ‘must-knows’ for any analytics interview.
The second part discusses the Summer interview experiences of IIM Bangalore 2018-2020 PGP
students. The final part focuses on techniques for solving guesstimates and on sample
scenarios from the same interviews. We would like to thank all the PGP1 students who shared
their interview experiences with us.
We hope you find this useful. All the best for your interviews!
Thanks,
The Analytics Society of IIM Bangalore
‘Must-Knows’ for Analytics Interviews
Important Terms and Definitions
• Bias – Statistical bias is a systematic error that you cannot correct by repeating the
experiment many times and averaging the results together.
• Big Data - Extremely large data sets that may be analysed computationally to reveal
patterns, trends, and associations. Such data sets are commonly characterised in terms of
volume, velocity, variety and veracity.
• Correlation - Correlation analysis is a method of statistical evaluation used to study the
strength of the linear relationship between two numerically measured, continuous
variables.
• Covariance - Covariance is the expected value of the product of the deviations of two random
variates from their expected values. It is a measure of how changes in one variable are associated
with changes in a second variable. A positive covariance means that higher values of
one variable are associated with higher values of the other variable.
• Clustering - An algorithm for dividing up data instances into groups identified by the
execution of the algorithm because of similarities that it found among the instances.
• Chi Square Test - Chi-square is a statistical method used to test whether the
classification of data can be ascribed to chance or to some underlying law. The chi-square
test is an analysis technique used to estimate whether two variables in a cross
tabulation are associated. A chi-square distribution differs from the normal distribution,
and its shape depends on the “degrees of freedom” used to calculate it. The chi-square
distribution is the distribution of the sum of squared standard normal deviates. The number
of degrees of freedom of the distribution is equal to the number of standard normal deviates
being summed.
• Mean Absolute Error - The average of the absolute errors of all predicted values when
compared with observed values.
• Mean Squared Error - The average of the squares of all the errors found when
comparing predicted values with observed values. Squaring them makes the bigger
errors count for more, making Mean Squared Error more popular than Mean Absolute
Error when quantifying the success of a set of predictions.
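• P value – A probability that provides a measure of the evidence against the null
hypothesis given by the sample: the probability, assuming the null hypothesis is true, of
observing a result at least as extreme as the one in the sample. Smaller values indicate
more evidence against H0.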
• Poisson Distribution – A discrete probability distribution that is useful for estimating the
number of occurrences over a specified interval of time or space. For example, the number
of repairs needed in 10 miles of highway, or the number of leaks in 100 miles of pipeline.
• Principal Component Analysis - This algorithm simply looks at the direction with the
most variance and then determines that as the first principal component. Principal
component analysis is a dimension reduction tool that aims at reducing a large set of
variables to a small set that still contains most of the information.
• Linear Regression - A technique to look for a linear relationship (that is, one where the
relationship between two varying amounts, such as price and sales, can be expressed
with an equation that you can represent as a straight line on a graph) by starting with
a set of data points that don't necessarily line up nicely.
• Standard deviation - The square root of the variance, and a common way to indicate
just how different a measurement is from the mean. For a normal distribution
observations more than three standard deviations away from the mean can be
considered quite rare.
• T distribution – A family of probability distributions that can be used to develop an
interval estimate of a population mean whenever the population standard deviation is
unknown and is estimated by the sample standard deviation.
• Random forest - A random forest is an estimator that fits a number of classical decision
trees on various sub-samples of the dataset and uses averaging to improve the
predictive accuracy and control over-fitting.
• T value – T values are used in hypothesis tests where the sample size is small
(typically less than 30) and the population standard deviation is unknown. The procedure
that calculates the test statistic compares your data to what is expected under the null
hypothesis.
• Logistic regression - Logistic regression is used to describe data and to explain the
relationship between one binary dependent variable and one or more independent variables.
• R2 – The coefficient of determination is the proportion of the variance of the dependent
variable that can be predicted from the independent variable(s). In regression, the R2
coefficient of determination is a statistical measure of how well the regression
predictions approximate the real data points. An R2 of 1 indicates that the regression
predictions perfectly fit the data.
• Time series data - A time series is a sequence of measurements of some quantity taken
at different times, often at equally spaced intervals.
• Variance - It is frequently used in statistics to measure how large the differences are in
a set of numbers. It is calculated by averaging the squared difference of every number
from the mean.
• Normal Distribution – A probability distribution which, when graphed, is a symmetrical
bell curve with the mean value at the centre. The standard deviation value affects the
height and width of the graph.
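Several of the definitions above (variance, standard deviation, covariance, correlation, Mean Absolute Error, Mean Squared Error and R2) reduce to a few lines of arithmetic. The sketch below is only an illustration of those definitions in plain Python. The function names and the price/sales numbers are made up for this example, and the population (divide by n) form of variance is assumed.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Average squared difference of every number from the mean (population form).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    # Square root of the variance.
    return math.sqrt(variance(xs))

def covariance(xs, ys):
    # Average product of the deviations of x and y from their means.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    # Covariance rescaled to lie between -1 and 1.
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

def mae(actual, predicted):
    # Mean Absolute Error: average of |observed - predicted|.
    return mean([abs(a - p) for a, p in zip(actual, predicted)])

def mse(actual, predicted):
    # Mean Squared Error: average of (observed - predicted)^2.
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])

def r_squared(actual, predicted):
    # 1 - (unexplained variation / total variation).
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up illustrative data, not taken from any interview.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted_sales = [2.5, 3.5, 6.5, 7.5, 10.0]

print("variance of price:", variance(price))
print("covariance:", covariance(price, sales))
print("correlation:", correlation(price, sales))
print("MAE:", mae(sales, predicted_sales))
print("MSE:", mse(sales, predicted_sales))
print("R2:", r_squared(sales, predicted_sales))
```

Being able to walk through one such toy example by hand is usually enough to explain how to interpret each number in an interview.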
Interview Experiences
American Express Interviews
Information about the Company (May have come up during company research)
Role and profile offered: Analyst
Preference for work experience: - None
Industry: Finance
Position in the industry: -
Main competitors: - Visa, Mastercard
Major Products: Credit Card
Pre-Process Details
Resume (Y/N): Y
1 Topic –
It was a case-study-based GD. We were given a case on a resort company which has asked
Amex to analyse the reasons for its low revenue this season. The discussion was mostly
based on the types of data that Amex should ask for from its client and also the use of its
own data.
I believe the most important task was to find the factors that would affect the revenue and
try to put those factors in order of importance. Most of the discussion was based on
analysing the factors. So it may help to note down as many relevant factors as possible
so that you can contribute some unique points in the discussion.
Number of people per group: 12
Number of people qualifying to the interview: 2-3
Interviews
Round 1
Crucial Questions: 1. Since I had quite a few analytics points on my resume, most
of the discussion was based on those topics. I was primarily
asked about the Big Data Project that I have done. Apart from
asking me about what I have done, they also asked about the
theoretical aspects of the Big Data like what are the properties
of it.
What went right? Most of the questions were CV based and I had prepared
my CV points well, which I believe helped me a lot. I also
revised all the DSI and DSII concepts, and they asked questions directly
from there. (I went through all the analytics terminology
mentioned in the Analytics interview prep-book sent by the
Analytics Society, which I believe was really helpful.)
What could be avoided? I believe that I could have answered the HR questions slightly
better.
Please provide any other insights on the preparation process before interviews:
Show interest in the company’s products and operations in the placement PPT. I believe
that it helps to make an impression.
Name: Shouvik Das
Information about the Company (May have come up during company research)
Role and profile offered: Credit Risk Analyst and EDA
Preference for work experience: Freshers Preferred
Industry: Finance
Position in the industry: -
Main competitors: Discover, Bank of America, VISA, MasterCard
Major Products: Credit Cards
Pre-Process Details
1 Topic –
How Big Data can help in preventing credit fraud
Speak relevant points
Give disruptive ideas
Give as many worldwide examples as possible
Number of people per group: 10
Number of people qualifying to the interview: 2-4
Interviews
Round 1
Crucial Questions: 1. If you’ve to determine number of F1 enthusiasts or fans
in Bangalore what data points would you be looking at?
What could be avoided? One question was regarding future career options. I said I want a
career in Finance. That could have been avoided as AMEX was
not offering a role in finance.
Questions asked? -
Round 2
Crucial Questions: 1. Expectations from this job and what projects I’ve done
before that is similar to this role.
What could be avoided? -
Questions asked? -
Please provide any other insights on the preparation process before interviews:
Name: Abhishek Kumar Sachan
Company & Division: American Express – Enterprise Digital and Analytics Team
Information about the Company (May have come up during company research)
Role and profile offered: Analytics
Preference for work experience: Yes, the job description clearly stated that, but they took
freshers too, so don’t worry 😊
Industry: Financial Services
Position in the industry: 86th among Fortune 500
Main competitors: Visa, MasterCard, DFS
Major Products: Credit Cards
Pre-Process Details
Resume (Y/N): Y
Number of people qualifying
to the interview:
Interviews
Round 1
Crucial Questions: 1. Resume based question- picked up a project and asked
me to go to as much detail as possible. Must prepare the
project where you’ve mentioned a high revenue i.e. the
most glittery analytical project in your resume
What went right? They picked up a project I was really confident about
What could be avoided? Don’t mention a term you are not sure of, because you could be
questioned on it. E.g. I said Principal Component Analysis and got
stuck on it.
Round 2
Crucial Questions: 1. HR round but don’t get fooled. It is also analytical.
Basically idea generation and creativity is what they are
looking for.
What went right? I blabbered every idea I could come up with including unrealistic
and vague ones.
What could be avoided? Just don’t think too much. Say whatever comes to your mind.
Just keep it simple and quick.
Questions asked? Artificial Intelligence in houses, but something that no one has
thought of so far. Someone answered plasma walls.
Please provide any other insights on the preparation process before interviews:
Please be very clear on whatever you say and keep an example ready to explain the
concept. They just want to see if you know how to interpret the output. That’s it.
Name: Ayush Singh
Information about the Company (May have come up during company research)
Role and profile offered: Business Analyst
Preference for work experience: -
Industry: Credit Card and Financial Services
Position in the industry: 3rd
Main competitors: Visa & MasterCard
Major Products: Credit Cards, Platinum Cards etc.
Pre-Process Details
Resume (Y/N): Y
1 Topic –
How can American Express help a client whose business is flailing, using the credit card
transaction data?
Do not discuss generic topics; try to use data as far as possible. There would be no
real data per se. However, 4 kinds of data that could be used would be explicitly mentioned,
like merchant-side data and client-side data. Try to use these.
Number of people per group: 10
Number of people qualifying to the interview: 3-4
Interviews
Round 1
Crucial Questions: 1. Basic questions related to predictive analytics like what is
beta in SLR equation? Asked about what all software I
have used related to analytics like SPSS, GAMS and Excel.
What went right? For practical questions like that of IRCTC, I gave good
suggestions and throughout the first round I was confident in
answering various questions. For answering theory questions, I
used paper to write and give answers, I think that was a good
move.
Round 2
Crucial Questions: 1. Since I had taken up Machine Learning as one of the
electives in B.Tech, I was asked questions related to
ML, like feature extraction and algorithms like K-Nearest
Neighbours. Furthermore, practical applications of all
these algorithms were asked.
2. Questions related to dimensionality reduction were
asked: like how to reduce the number of features in a
model to a bare minimum to classify whether a customer
will default in the future or not.
What went right? Since this round was slightly technical in nature, my technical
expertise helped. Furthermore, my confidence while answering
HR questions was also good.
Questions asked? How Big data analytics and ML can transform the credit card
business in the future, and what new ventures can it create for
credit card companies.
Please provide any other insights on the preparation process before interviews:
Just revise the basics of descriptive, predictive and prescriptive analytics (DS1, DS2 & BAI).
During the interviews try to remain confident and avoid giving generic answers. Also try to
understand the practical applications of basic algorithms used in Analytics.
United Health Group (UHG) Interviews
Information about the Company (May have come up during company research)
Role and profile offered: Advanced Analytics – Data Science
Preference for work experience: - None
Industry: Health Care Industry
Position in the industry: -
Main competitors: Anthem, Capital Blue Cross, Highmark, Humana, Aetna
Major Products: Uniprise, Health Care Services, Specialized Care Services
Pre-Process Details
Resume (Y/N): Y
1 Topic – N/A
It is not necessary to speak multiple times. It is more important to make good points.
Number of people per group: 10
Number of people qualifying to the interview: 3-4
Interviews
Round 1
Crucial Questions: 1. Asked me about three subjects I like in IIM Bangalore.
Asked me about BCG Matrix, asked to explain the matrix
in the medical field with products.
What went right? I expressed my interest and enthusiasm to work with them well.
Being a chemical engineer, I am somewhat related to this
domain, and I had done an internship and taken a course in
analytics earlier, which helped me crack the interview.
What could be avoided? Nothing as such. Be confident and show interest in their
company.
Round 2
Crucial Questions: 1. They asked me about my extracurricular activities. In
Badminton, they asked me who is my favourite player
and why so? Some current affairs of that field.
What went right? About my hobbies(especially sports) whatever they asked I was
able to answer that correctly and logically. I showed my interest
in sports well, convinced them why I want to join their company.
What could be avoided? Nothing as such. Be confident and show interest in their
company.
Name: Dr Nisha Sharma
Information about the Company (May have come up during company research)
Role and profile offered: Data Analyst
Preference for work experience: Yes
Industry: Healthcare
Position in the industry: Fortune 500. Biggest healthcare company in the U.S., among the
Fortune 5
Main competitors: Express Scripts, Wellpoint, Aetna
Major Products: Healthcare insurance
Pre-Process Details
Resume (Y/N): Y
1 Topic –
Do women make better managers?
The group discussion was very general. They
were looking for a structured thought process,
especially about the launch of products. They are
very keen on people who are hell-bent on working
in the healthcare industry.
Number of people qualifying to the interview: 3
Interviews
Round 1
Crucial Questions: 1. What made you choose M.B.A? If you are not selected
here, what will you do?
2. What are the things you look for when you want to
launch a new healthcare policy?
What went right? I was very clearly able to signal them that being a medical
professional, I wanted to work in healthcare industry only &
couldn’t identify myself with any other company.
Name: Nishant Jaiswal
Information about the Company (May have come up during company research)
Role and profile offered: Data Analyst
Preference for work experience: No
Industry: Healthcare
Position in the industry: Fortune 500. Biggest healthcare company in the U.S., among the
Fortune 5
Main competitors: Express Scripts, Wellpoint, Aetna
Major Products: healthcare insurance
Pre-Process Details
Resume (Y/N): Y
1 Topic –
Do women make better managers?
The group discussion was quite general. They were
looking for a structured thought process,
especially about the launch of products. They are
very keen on people who are interested in
working in the healthcare industry.
Number of people per group: 7-8
Number of people qualifying to the interview: 3
Interviews
Round 1
Crucial Questions: Q - Tell me about yourself
A - told him my intro, intern and PORs
Q - what is CAPM?
A - told him that return of portfolio depends on risk free return +
beta * risk premium
Q - draw a graph of it
A - drawn
Round 2
Crucial Questions: HR interview
Q - why UHG?
A - told him about its core competencies, huge size, growth
opportunities and a great opportunity to explore healthcare
Q - why healthcare
A - told him the benefits, how it is growing, US health budget
expanding and how analytics can help it grow
Guesstimates
Fundamentals
Since most analytics interviews concentrate on guesstimates, let’s revise these
concepts.
To determine which approach to use, see which of the two sides, demand or supply, is
constrained. Use that side to estimate the number. For example, to calculate the number of
people travelling in the Delhi metro, the supply side (capacity of the metro) is constrained and
therefore the guesstimate has to be done from the supply side. However, if we need to plan
the capacity of a new metro, demand needs to be gauged.
From the demand side, a product can be replaced every few years. This can change the
demand of the product. For example, car tires can be replaced. So, a replacement factor
can be used to determine the demand of tires. A product can be reusable, which decreases
the demand. For example, taxis can be used throughout the day, which needs to be
discounted before calculating the total number of cabs needed in a day.
• Segmentation
As the demand varies across various segments, segmentation needs to be done while
solving the guesstimate. Examples of segments are income bracket, age, rural/ urban,
gender.
To determine the number of cars, income bracket could be used as the higher income
people use cars more. To determine the number of cigarettes, gender segmentation could
be useful. On the same lines, to determine number of burgers made in India, rural/urban
segmentation could prove to be helpful.
The utilization of a product need not be 100% and it can vary over time. For
example, the occupancy rate of a taxi in peak time would be 100%, whereas in
non-peak times it could be 80%. However, the occupancy rate also depends on other factors
like weekday & weekend.
b. Conversion rates
This is useful in running marketing campaigns, to determine the target number of
customers.
Some basic figures that would help if kept at your fingertips are:
Population of India: 120 crores (can take as 100 crores for ease of calculation)
Rural:Urban split (India): 70:30
Average family size: 5 people
Male:Female ratio: 50:50
Upper:Middle:Lower class split (India): 10:40:50
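As a rough illustration of how these figures combine with segmentation, the sketch below sizes one segment (urban, upper-class households). It assumes, purely for simplicity, that the class split is the same in rural and urban areas. The function names are made up for this example.

```python
CRORE = 10_000_000

# Figures from the list above.
POPULATION = 120 * CRORE
AREA_SPLIT_PCT = {"rural": 70, "urban": 30}
CLASS_SPLIT_PCT = {"upper": 10, "middle": 40, "lower": 50}
FAMILY_SIZE = 5

def segment_people(area, income_class):
    # Assumption: the class split applies equally within rural and urban areas.
    return POPULATION * AREA_SPLIT_PCT[area] * CLASS_SPLIT_PCT[income_class] // (100 * 100)

def segment_households(area, income_class):
    # Convert people to households using the average family size.
    return segment_people(area, income_class) // FAMILY_SIZE

urban_upper_people = segment_people("urban", "upper")
urban_upper_households = segment_households("urban", "upper")
print(urban_upper_people / CRORE, "crore people")
print(urban_upper_households / CRORE, "crore households")
```

Working in households rather than people is usually the right unit for products bought once per family, such as cars or refrigerators.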
Question
How many people wear red in New York on a typical Monday?
Solution:
• Step 1 – Clarification:
If a person wearing red goes out more than once, do we count them again? – No!
Does “New York” here refer to New York City or the state of New York? – New York City
• Step 2 – Breaking down the problem:
The following questions would help us determine the number of people wearing red in
NYC on a typical Monday:
How many people are there in NY? What are the chances that people would wear red?
This depends on two primary factors: How many components of clothing do people wear
and their colour preference.
• Step 3 – Solving each piece:
Work with the interviewer to estimate each of those elements and come up with the
answer.
• Step 4 – Consolidating:
Let’s analyse the number of people wearing red from each group.
• Staying at home: 1,000,000 * 2 * 1/10 = 200,000. 1,000,000 people have two pieces
of clothing. Chance of having red in each piece: 1/10 (7 colours + grey + black +
white)
• Going out once: 14,000,000 * 5 * 1/20 = 3,500,000. 14,000,000 people have five
pieces of clothing. Chance of having red in each piece: 1/20 (on Monday, most of
these people go to work. Thus black and white will be the main colours they wear)
• Going out twice: 5,000,000 * 10 * 7.5% = 3,750,000. 5,000,000 people have ten pieces
of clothing. Chance of having red in each piece: 7.5% (the first trip is probably to
work: 1/20; the second trip is the casual trip: 1/10)
So, in total, there are about 7.5 million people in NYC wearing red on a typical Monday.
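The consolidation in Step 4 can be checked with a few lines of arithmetic. The sketch below simply repeats the calculation above (people × pieces of clothing × chance of red per piece) for each group; the variable and function names are made up for this example.

```python
# (group, number of people, pieces of clothing, chance that a piece is red)
groups = [
    ("staying at home", 1_000_000, 2, 1 / 10),
    ("going out once", 14_000_000, 5, 1 / 20),
    ("going out twice", 5_000_000, 10, 0.075),
]

def red_wearers(people, pieces, p_red):
    # Same rough rule as in the solution: people * pieces * chance per piece.
    return round(people * pieces * p_red)

total = 0
for name, people, pieces, p_red in groups:
    count = red_wearers(people, pieces, p_red)
    total += count
    print(f"{name}: {count:,}")

print(f"total: {total:,}")  # about 7.5 million
```

Note that pieces × chance is a quick approximation. If each piece were treated as an independent draw, the chance of wearing at least one red piece would be 1 - (1 - p)^n, which gives a somewhat smaller number. In an interview it is worth stating which simplification you are using.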
• Type of ping-pong ball – measurements (don’t ask directly, assume something and
confirm with the interviewer)
• Type of Boeing 747 – measurements (don’t ask directly, assume something and
confirm with the interviewer)
Step 2 – Breaking down the problem:
Expert Notes
The numbers in the above question are not essential, but it becomes dangerous when
assumptions are way off the mark. Let's say that in the above question you take the size of a ping
pong ball to be 3 m instead of 3 cm; it clearly shows a lack of knowledge of measurement.
Numbers like the ‘Number of seats’ in a Boeing can be guessed with prior experience, and if you
don’t have the relevant experience, please do ask the interviewer for help. It does not show any
lack of capability; in fact, it shows you are willing to ask for help when stuck at work, which is a
necessary quality one must have.
Question 2
You are in a meeting with a client who mentions that she is considering building a new plant.
The new plant will require 100 million tons per year of recycled aluminum as an input. Your
client turns to you and asks you if there are 100 million tons of recycled aluminum available in
the US on a yearly basis. You do not have that information off the top of your head. How can
you answer the question on the spot?
Solution:
Step 1 – Clarification:
More Sample Guesstimates
This can be done from the demand side. Let’s start by estimating the car market in India.
After estimating the number of cars in India to be about 4.2 crores, the tire market needs to be
estimated. The demand for tires can come from both new cars and replacements.
Suppose the lifetime of a tire is 3 years and the lifetime of a car is 10 years.
Hence, the total demand for tires in India is 3.5 crores per year.
Estimate the number of flights taking off from Bangalore in a day
There are 2 types of flights flying from Bangalore- Domestic and International.
Let’s assume we need to estimate only domestic flights from Bangalore. There is higher
traffic for Tier-1 cities and lower traffic for Tier-2 cities. Hence, the segmentation is done
accordingly. It is assumed that there are about 10 tier-2 cities to which Bangalore has direct
connectivity. Since, Delhi and Mumbai have higher capacity, tier-1 cities are again divided
into 2 categories.
The occupancy rates depend on the time of day. Suppose there are 2 busy periods, one in the
morning and one in the evening, each lasting 3 hours. Further, in the first 4 hours (12 AM - 4 AM)
airports have less traffic, and it is safe to treat these as non-operational hours. Hence, the
remaining 14 hours are non-peak hours.
For example, there are generally 4 flights every hour in peak time (for 4 major airline players)
for Delhi.
In total, there are 90 flights for Delhi and Mumbai. Further, there are 57 (~60) flights for the
remaining 3 tier-1 cities. So, in a day there are about 150 flights flying from Bangalore to
tier-1 cities. For tier-2 cities, there are about 58 (~60) flights from Bangalore.
So, in a day there are about 210 domestic flights taking off from Bangalore airport.
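The hour split and the final roll-up can be sketched as below. The peak rate of 4 flights per hour for Delhi comes from the text. The non-peak rate of 1.5 flights per hour, and the idea that Mumbai matches Delhi, are assumptions made here only because together they reproduce the figure of 90 flights; the 57 and 58 figures are taken as given.

```python
HOURS_IN_DAY = 24
PEAK_HOURS = 2 * 3            # two busy periods of 3 hours each
NON_OPERATIONAL_HOURS = 4     # 12 AM to 4 AM
NON_PEAK_HOURS = HOURS_IN_DAY - PEAK_HOURS - NON_OPERATIONAL_HOURS

def flights_per_day(peak_rate, non_peak_rate):
    # Flights to one destination = peak flights + non-peak flights.
    return peak_rate * PEAK_HOURS + non_peak_rate * NON_PEAK_HOURS

# Peak rate from the text; non-peak rate is a hypothetical value (see lead-in).
delhi = flights_per_day(peak_rate=4, non_peak_rate=1.5)
delhi_and_mumbai = 2 * delhi  # assumes Mumbai has the same traffic as Delhi

other_tier1 = 57   # remaining 3 tier-1 cities, as given in the text
tier2 = 58         # about 10 tier-2 cities, as given in the text

total = delhi_and_mumbai + other_tier1 + tier2
print("non-peak hours:", NON_PEAK_HOURS)
print("Delhi + Mumbai:", delhi_and_mumbai)
print("total domestic flights:", total)
```

The unrounded sum comes out slightly lower than 210 because the text rounds 57 and 58 up to 60 before adding. Either figure is acceptable in a guesstimate as long as the rounding is stated.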
Question 4. Ola cab services are starting in Vizag. Estimate the number of cabs required in
first week
Assumptions:
We need to first look at demand side for potential customers for the cab service
There are multiple factors to be considered while solving this:
1) Age group
2) Income group
3) Gender - The usage rate of the female population is assumed to be half that of the male
population, so the male half contributes 1/2 and the female half 1/4. Hence, a conversion
factor of ¾ is used
There are 1.72 lakh potential customers for cabs. Considering, the public transportation and
auto rickshaws, it can be assumed that about 50% of the middle class and 30% upper class
(who owns more cars) can be converted to use cabs.
Suppose 5% of them can be converted to customers of Ola in the first week. This gives scope for
almost 10K Ola customers in the first week. Considering the size of Vizag, it can be assumed that
each customer takes on average half an hour of travel by cab per day. Add a 50% factor for
waiting time, and assume that on average each cab driver works 9 hours a day.
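The last step, turning customers into cabs, can be sketched as follows. It reads the "50% factor for waiting time" as multiplying travel time by 1.5, and it assumes demand is spread evenly over a driver's working day; both are simplifying assumptions of this sketch, and the function name is made up for the example.

```python
import math

POTENTIAL_CUSTOMERS = 172_000      # 1.72 lakh, from the text
FIRST_WEEK_CONVERSION = 0.05       # 5% converted in the first week
TRAVEL_HOURS_PER_CUSTOMER = 0.5    # half an hour of travel per day
WAITING_FACTOR = 0.5               # extra 50% for waiting time
DRIVER_HOURS_PER_DAY = 9

def cabs_required(customers):
    # Total cab-hours needed per day, including waiting time.
    cab_hours = customers * TRAVEL_HOURS_PER_CUSTOMER * (1 + WAITING_FACTOR)
    # Each cab supplies one driver's working day; round up to whole cabs.
    return math.ceil(cab_hours / DRIVER_HOURS_PER_DAY)

exact_customers = round(POTENTIAL_CUSTOMERS * FIRST_WEEK_CONVERSION)
print("customers (exact 5%):", exact_customers)
print("cabs for exact figure:", cabs_required(exact_customers))
print("cabs for ~10K customers:", cabs_required(10_000))
```

Under these assumptions the answer is in the range of several hundred to under a thousand cabs. A peak-hour adjustment would push the number up, since demand is not actually spread evenly through the day.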
😊-------------------------------------------Best of Luck--------------------------------------------😊