You are on page 1of 9

DSBA

Problem 1
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In its recent
board meeting, concerns were raised by the members on the efficiency of the marketing campaign currently being used.
The board decides to rope in an analytics professional to improve the existing campaign.
1. You as an analyst have been tasked with performing a thorough analysis of the data and coming up with insights to improve
the marketing campaign.

A. What is the important technical information about the dataset that a database administrator would be interested in? (Hint:
Information about the size of the dataset and the nature of the variables)
Solution : There are 1581 rows and 14 columns in the given dataset . While looking at the nature of variables , we can see that there
are 8 categorical variables and 6 continuous variables as mentioned below :
Categorical Variables :

(a)
(b)
(c)
(d)

(f)
(g)

(m)

B. Take a critical look at the data and do a preliminary analysis of the variables. Do a quality check of the data so that the variables are
consistent. Are there any discrepancies present in the data?
Solution : While looking at the data , we can observe that there are some discrepancies in it like missing values in Gender and Partner_salary
columns & some anomalies can also be seen in Gender column.
Treating Missing Values & Outliers :
So, there are 53 and 106 rows with null values in Gender & Partner_salary columns respectively.
Let’s start with Partner_salary column first which has the highest number of blank/missing values. Since this feature has no possible outliers as
seen in the below boxplot, we can use mean to impute null values.
However, Total_salary variable has some outliers as below but these outliers are just extremes and not bad data, so no need to treat these
outliers.

And for the Gender column, since this is a categorical (binary) variable where there are only 2 values ,so it’s best to use the mode value here
which is Male gender type .
Treating Anomalies :
Now let’s treat anomalies in the Gender column by replacing Femal & Femle values with the value Female.

Checking for Duplicates:


And finally lets do a quick check for finding out if there are any duplicates and as seen below there are no duplicate values in this data.

Hence, now we are good to proceed further with our analysis as we have treated all missing values and anomalies in the data.

C. Explore all the features of the data separately by using appropriate visualizations and draw insights that can be utilized by the business.

On analysing all continuous variables as per the below summary , we can infer that the age group ranges from a minimum of 22
years old to a maximum of 54 years.

75% of the car owners are from age group equal to 38 years or less where their total salary ranges equal to or less than 95900 $.

Mean salary of car owners is 60392.22 $ which is higher than the median value indicating that the distribution is right tailed.

Also as per the below graphs we can see that Age & Price variables are right-skewed whereas Salary & Total_Salary are almost evenly
distributed with Total_salary having some outliers in extreme values.
And after analysing the categorical variables, we can infer that maximum car owners are salaried personnel, i.e.. around 56% and the largest
selling car model is Sedan whereas SUV model has the least market base .

D. Understanding the relationships among the variables in the dataset is crucial for every analytical project. Perform analysis on the data fields to
gain deeper insights. Comment on your understanding of the data.
Solution : Lets interpret the variables on the below basis :
(a) Continuous vs Continuous Variable : From the below plot we see that there is no clear correlation in terms of salary with respect to price
and this is the scenario with all continuous variables.
(b) Categorical vs Categorical Variable : From the below plot, it can be inferred that owners with working partners own Sedan model and
owners with non-working partners prefer both Sedan & Hatchback model. And 54% owners have working partners.

( c ) Categorical vs Continuous Variables :


(a) There is no strong correlation between any of the variables as shown in the below heatmap except Age & Price.

As per the below plot, median price for owners having business is higher for SUV models compared to other car models. And there are outliners
in lower values in both professions for SUV models , which means both type of owners have low-priced SUVs.

E. Employees working on the existing marketing campaign have made the following remarks. Based on the data and your analysis
state whether you agree or disagree with their observations. Justify your answer Based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target for a SUV sale over a Sedan Sale.
Solution :E1 : According to the below graph, I completely disagree with Steve Roger’s observation because women prefer SUVs
more than men.

E2 : As per the below graph, I totally agree with Ned Stark’s observation as Sedan models are largely owned by Salaried personnel.

E3 : As per the below graph, I disagree with Sheldon’s observation because Salaried males own more Sedan models than SUVs, hence they
can be considered as an easier target for Sedans and not SUVs.

F. From the given data, comment on the amount spent on purchasing automobiles across the following categories. Comment on how
a Business can utilize the results from this exercise. Give justification along with presenting metrics/charts used for arriving at the
conclusions.
Give justification along with presenting metrics/charts used for arriving at the conclusions.
F1) Gender
F2) Personal_loan
Solution :F1 As per the below chart and plot , we can conclude that men have spent more amount compared to women however
women spent an average amount of 15289 $ (47705-32416) more than the men and also median price for female owners is quite
higher than the male. Hence, business can plan their marketing strategy for premium models or high-priced cars focusing more
towards Female customer category rather than men .
Min Max Avg
Gender Price Price Price Count
Male 18000 70000 32416 1252
Female 20000 69000 47705 329
F2 : As per the below chart and plot , we can conclude that having or not having a Personal loan has not highly affected the amount
spent on purchasing cars in this dataset as the price range in case of both the scenarios are same ,however we can see a slight high
average price for the owners who don’t have a personal loan .
So, it is quite irrelevant for the business to include personal loan aspect details while deciding their marketing strategy.
Personal Min Max Avg
Loan Price Price Price Count Range
Yes 18000 70000 34457 792 52000
No 18000 70000 36742 789 52000

G. From the current data set comment if having a working partner leads to the purchase of a higher-priced car.
Solution : As per the scatterplot, we can comment that the above statement is not absolutely correct because there is no clear
correlation between owners having a working partner and high- priced car and also as per the boxplot ,both type of owners have
more or less similar range, mean and median.
Hence, it would be incorrect to interpret that having a working partner leads to the purchase of a higher-priced car as no such
correlation is depicted between Partner_salary & Price variables and owners with non working partner also own high-priced car.
(Note : Taking Partner_salary column as salary is present only for Yes variable and 0 is indicated for all non -working partners.)

H. The main objective of this analysis is to devise an improved marketing strategy to send targeted information to different groups
of potential buyers present in the data. For the current analysis use the Gender and Marital_status - fields to arrive at groups with
similar purchase history.
Solution : Below table shows the various groups with similar purchase history .

Groups SUV Sedan Hatchback


Married Male 115 537 484
Married Female 166 127 14
Single Male 9 24 83
Single Female 7 14 1

And below graphs reflects the above groups , so we can infer that married male & female have more or less similar purchase history
and they tend to buy more cars rather than single male and female. And on an average, single male and female prefer Sedan and
SUV models.
Problem 2
A bank can generate revenue in a variety of ways, such as charging interest, transaction fees and financial advice. Interest charged
on the capital that the bank lends out to customers has historically been the most significant method of revenue generation. The
bank earns profits from the difference between the interest rates it pays on deposits and other sources of funds, and the interest
rates it charges on the loans it gives out.
GODIGT Bank is a mid-sized private bank that deals in all kinds of banking products, such as savings accounts, current accounts,
investment products, etc. among other offerings. The bank also cross-sells asset products to its existing customers through personal
loans, auto loans, business loans, etc., and to do so they use various communication methods including cold calling, e-mails,
recommendations on the net banking, mobile banking, etc.
GODIGT Bank also has a set of customers who were given credit cards based on risk policy and customer category class but due to
huge competition in the credit card market, the bank is observing high attrition in credit card spending. The bank makes money only
if customers spend more on credit cards. Given the attrition, the Bank wants to revisit its credit card policy and make sure that the
card given to the customer is the right credit card. The bank will make a profit only through the customers that show higher intent
towards a recommended credit card. (Higher intent means consumers would want to use the card and hence not be attrite.)
Analyze the dataset and list down the top 5 important variables, along with the business justifications.
Solution : Lets first do the preliminary analysis of the dataset as below :
Summary of Important Continuous Variables :

From the above, we can infer that maximum limit for any credit card is Rs.9,90,000.
Vintage with the bank ranges from 6 months to 60 months ,i.e. 5 years.
Mean average credit spends in the last 3 months is Rs.49,527 which is higher than the median value indicating that the distribution is
right-tailed.
Summary of Important Categorical Variables :
From the below table , we can infer that most of the credit card owners are salaried and most of the credit card holders have credit
cards from other banks as well .
And most of the credit card holders have high networth and they settle their balances in full every month.

Lets dig down further to analyze the business problem in detail :


As per the above heatmap , we can say that there is a strong correlation between few fields.
Annual_income_at_source is highly correlated to cc_limit .
And, Annual_income_at_source shows slightly high correlation with Avg_spends_l3m also.

On interpreting the below plot, we can infer that card holders who are self-employed, student and salaried have many outliers in
upper values which indicates that they have high average spending card holders in the last 3 months .

Also, on analysing the below plot we can infer that Visa card-holders have more dense outliers in the upper values than the other
two , however median and range of all the card issuers are almost same and there is not much difference.

As per below graph also , we can see that Visa card holders mostly pay off their balances in full every month and not postpone it to
the next month.
And if we analyse below plot, we can infer that edge, prosperity, chartered, prime, platinum,centurion and elite card holders have
more average spending in the last 3 months than the other card holders. And edge, prosperity and chartered card holders have more
densely populated outliers in the upper values.

Business Justification: As a result of the above analysis, we can conclude that the bank should focus more on improving or reviewing
their credit card policy towards Edge, Prosperity & Chartered card holders issued by Visa who are mostly Self-employed and salaried
as their average spending is high in the last three months and their spending is unaffected although they own other bank credit cards
. And in addition to this they settle their balances in full every month.
Hence, we can conclude that below are the top five important variables for framing a good analytical business problem for GODIGT
Bank to review its credit card policy. The columns look as follows:
1. card_type
2. annual_income_at_source
3. Occupation_at_source
4. avg_spends_l3m
5. Transactor_revolver
Note : If the cc_active30, cc_active60 & cc_active90 have 0 transactions , then the avg_spends_l3m should also be 0 but values are
still there in the data.

Thanks

You might also like