You are on page 1of 35

PROJECT REPORT

STATISTICAL METHODS FOR


DECISION MAKING
Date - 19th Feb, 2023
TABLE OF CONTENTS
Problem 1.........................................................3
1. Technical information about the dataset. . .4
2. Preliminary Analysis..................................5
3. Univariate Analysis....................................7
4. Multivariate Analysis................................17
5. Analyse the Observations........................24
6. Analysis - Gender and Personal Loans...26
7. Analysis – Working Partner.....................29
8. Marketing Strategy by Gender and Marital
Status.............................................................30
Problem 2.......................................................32
Solution...........................................................33

iii
PROBLEM 1

Analysts are required to explore data and reflect on the insights. Clear writing skill is an
integral part of a good report. Note that the explanations must be such that readers with
minimum knowledge of analytics is able to grasp the insight.
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and
Hatchback models. In its recent board meeting, concerns were raised by the members
on the efficiency of the marketing campaign currently being used. The board decides to
rope in an analytics professional to improve the existing campaign.

You as an analyst have been tasked with performing a thorough analysis of the data
and coming up with insights to improve the marketing campaign.

4
1. TECHNICAL INFORMATION ABOUT THE
DATASET

What is the important technical information about the dataset that a database
administrator would be interested in? (Hint: Information about the size of the dataset
and the nature of the variables)
Solution-
Below are the important technical information about the dataset that a database administrator
would be interested in
1. There are 1581 rows in the dataset.
2. There are 14 columns in the dataset
3. The amount of memory required to store the dataset is 173.05 MBs
4. The different datatypes in the dataset are as follows
a. There are 5 columns in the with int64 datatype
b. There are 8 columns in the with object datatype
c. There are 1 columns in the with float64 datatype

5
2. PRELIMINARY ANALYSIS

Take a critical look at the data and do a preliminary analysis of the variables. Do a
quality check of the data so that the variables are consistent. Are there any
discrepancies present in the data? If yes, perform preliminary treatment of data.
Solution-

Variables Overview
Categorical Data
1. Gender has 1528 values available out of 1581 rows. It also has 4 unique values.
2. Profession has 2 unique values.
3. Marital Status has 2 unique values.
4. Education has 2 unique values
5. Personal_loan has 2 unique values.
6. House_loan has 2 unique values.
7. Partner_working has 2 unique values
8. Make has 3 unique values

Nominal Data
1. Age is between 22 to 54

Numerical Data
1. No_of_Dependents has values from 0 to 4.
2. Salary is between 30000 to 99300, with the mean and 50%(median) close.
3. Partner salary is between 19573 and 80500. 868 of 1581 rows has partner_working as Yes
and 713 rows with other values. The Partner_salary has 1475 rows with available values.
Hence more rows has values than expected in Partner_salary column.
4. Total_salary has 1581 rows. This should the sum of Salary and Partner_salary. The
accuracy of this information needs to be tested. If there inconsistencies, then the column
needs to be recalculated
5. Price is between 18000 to 70000

6
Discrepancies in data
1. Gender has 4 unique values and 53 null values
2. Male, Female, Female, Femle
3. 868 of 1581 rows has partner_working as Yes and 713 rows with other values. The
Partner_salary has 1475 rows with available values. Hence, more rows has values than
expected in Partner_salary column.
4. Based on the difference in the Total_salary and Total_salary_computed (Sum of Salary
and Partner_salary fields) columns, we can identify that some rows have NaN in
Partner_salary column, with Partner_working=’Yes’. Hence we can impute these NaN
with the difference between Total_salary and Total_salary_computed

Preliminary Treatment of Data


1. Replacing Femal and Femle with Female as the two values are visually close. Also, the
NaN values would be replaced with the Mode. The imputation can be better performed
using KNN.
here are better ways to impute the missing values, like using KNN algorithm, which can
be explored in the future.
2. Missing Partner_salary is imputed using the difference between the Total_salary and
Salary fields
3. Partner_salary is is set to 0 where Partner_working = ‘No’
4. Partner_working is set to ‘No’ where Partner_salary = 0

7
3. UNIVARIATE ANALYSIS

Explore all the features of the data separately by using appropriate visualizations and
draw insights that can be utilized by the business
Solution-
Each of the features are analyzed individually to gather useful and meaningful insights
for the business. This task would also prepare us for further higher dimension analysis.
1. Age

 T
h
e

data is positive skewed with median age = 29 and a multimodal at 25 and 28.
 No. of users drop substantially from age 31, hence there is a relatively untapped market
from the age 31 and above.
 One can also make an inference that users prefer to buy cars at a younger age, which
can be identified from the users between age 22 to 31 where majority of buyers are
concentrated. Also, the campaigns have successfully worked for this age group (<31).
2. Gender
8
Close to 80% of the users are Male. This might also indicate bias in the data.

3. Profession

Individual in Business are relatively less likely to buy an automobile than Salaried. Though this
difference is not substantial. Multi-variate analysis might help in gathering more details.
4. Marital Status

9
From the above graph, we can infer that married individuals comprise of 91% of buyers.

5. Education

Individual with a Post Graduation are more likely to own a car than ones with only a
Graduation
6. No_of_dependents

10
Individuals with 2-3 dependents comprise of more than 70% of the buyers.

7. Personal_loan

The number of people buying cars are almost equally likely to have a Personal Loan. Hence
this column should not impact the decision the decision on the amount spent.

8. House_loan
11
66.7% of individual buying an automobile do not have a home loan. Hence, the likelihood of an
individual buying an automobile and having a home loan is 33.3 %.

9. Partner Working

An individual having a working partner increases the odds of buys an automobile to 54%
compared to one without a working partner with 45 %.

12
10. Salary
The distribution of Salary is a multimodal distribution with 3 modes at 38000, 58000 and
72000. The Mean and Median too are close and the distribution is relatively less skewed
which can be confirmed from a skew of -0.011.
Cars between the price 50,000 to 64,000 have a relatively high count.

11. Partner_salary

13
12. Total_salary

14
Total_salary is a bi-model distribution with modes at 58000 and 60000. It is also right skewed
with a significant decline from 100000.

15
13. Price
Price is right-skewed indicating Buyers of cars which cost less than 33,000 are significantly
more than greater than 33,000.

16
14. Make

Individuals are more likely to buy a Sedan or Hatchback.

17
4. MULTIVARIATE ANALYSIS

Understanding the relationships among the variables in the dataset is crucial for every
analytical project. Perform analysis on the data fields to gain deeper insights. Comment
on your understanding of the data. 
Solution-
1. Correlation

1. Age are Price have a high linear correlation


2. Age and Salary have a moderate linear correlation.
3. The linear correlation between Salary and Price is low

Inference-.
An individual with higher age is more likely to buy an expensive automobile that an
individual with more salary.
Also, as Age and Salary share linear correlation, it is likely that salary of an individual
increases with Age, but this increase in Salary does not always result to buying an
expensive automobile.

2. Gender and Profession vs Price

 Post graduates are more likely to buy cars.


 For Females, Salaried Graduates are more likely to buy an automobile than Business
Graduates.
 The numbers for Male Graduate (Business vs. Salaried) are not significantly different.
For both Education types.

18
3. Males are more like to own a car (inferred in previous question), but the average price of a
car is higher for women than men.

19
4. Married individual is more likely to own an automobile than single individuals
irrespective of the Make and Gender.

5. Married Males prefer to buy Sedan (45% of Married Male) and Hatchback (44% of
Married Male), while Single Males prefer to buy Hatchback (73% of Single Male).

Married Females prefer SUV (54% of Married Females) and Sedan (41% of Married
Females), while Single Females prefer Sedan (63% of Single Females) and SUV (32%
of Single Females).

20
6. The Average Spend by Single Men for Hatchbacks is marginally higher than Married
Males and Married Females. This is significantly lower for Single Females.

The Average Spend by Married Men for Sedan is marginally higher than Married
Females and Single Females. This is significantly lower for Single Males.

The Average Spend by Single Men for SUV is marginally higher than the other
categories.

21
7. Individuals with ages 30 and below and Total_salary less than 120,000 prefer a Hatchback
or Sedan
Individuals with ages between 31 to 45 and Total_salary less 130,000 prefer a Sedan more
than SUV.
Individuals with age greater than 45 and Total_salary greater than 70,000 prefer a SUV.

22
8. Below is the analysis on the spends by individuals based on the number of dependents

No_of_Dependents = 0
a. Total Salary <60000 and age <50 would prefer to spend <30000
b. Total Salary >60000, age is not relevant in choosing an automobile
No_of_Dependents = 1
a. Total Salary<70000, age is not relevant in choosing an automobile
b. Total Salary>70000, individual above the age of 40 would buy an automobile
No_of_Dependents = 2
a. Individuals with Age <=30 with a Total_salary< 110000, most likely buy cars with
Price<35000
b. Individuals with Age > 30 and Total_salary>11000 most like a Sedan or SUV.
No_of_Dependants = 3
a. Individuals with Age <= 30 with a Total_salary< 110000, have bought an automobile
with Price < 35000
b. Individuals with age >30 have bought an automobile with Price >50000 irrespective
of their Total_salary
23
No_of_Dependents = 4
a. Individuals with Age <= 30 with a Total_salary< 110000, have bought an automobile
with between 25000 to 35000
b. Individuals with age >30 have bought an automobile with Price >60000 irrespective
of their Total_salary

9. Individuals with Home loan (or Both loans) with >=3 dependents have bought cars with
Price <35000, irrespective of the Total_salary.

24
5. ANALYSE THE OBSERVATIONS

Employees working on the existing marketing campaign have made the following
remarks. Based on the data and your analysis state whether you agree or disagree with
their observations. Justify your answer Based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”

Solution-

I disagree with the statement by Steve Rogers. From the above countplot and value counts, it
is inferred that Women prefer SUV over Men.

25
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan
Solution-

I agree to the observation. Based on the above graph which shows the count of automobiles
owned by Salaried vs. Business individuals, salaried individuals are more likely to buy a Sedan
E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an
easier target for a SUV sale over a Sedan Sale.
Solution-
I disagree with this observation. Salaried men are not an easier target for a SUV sale over a
Sedan sale. This can however be true for Females, as the count of SUV bought by Females is
marginally more than the Sedans.

26
6. ANALYSIS - GENDER AND PERSONAL
LOANS

From the given data, comment on the amount spent on purchasing automobiles across
the following categories. Comment on how a Business can utilize the results from this
exercise. Give justification along with presenting metrics/charts used for arriving at the
conclusions.
Give justification along with presenting metrics/charts used for arriving at the
conclusions.
F1) Gender
F2) Personal_loan

Solution-
1. Gender
 Males are primary buyer of automobiles. Males are more likely to own an automobile
and have contributed to 70% of the total revenue (based on the price of the car).
 The Average price and Median price of an automobile is higher for Females than Males,
indicating that a female would spend more than a male on buying a car.

27
 Females are more likely to buy SUVs than males.

2. Personal Loan
 Individuals with a Personal loan are equally likely to own an automobile as an individual
without a Personal Loan.

28
 The Total spend and Mean spend for individuals with Personal Loan is marginally less
than individuals without a Personal Loan, but the difference is too small to be relevant.
Hence, it can be inferred that the Total and Mean spend is not impacted by Personal
Loan

29
7. ANALYSIS – WORKING PARTNER

From the current data set comment if having a working partner leads to the purchase of
a higher-priced car.
Solution-
 Based on the Average Price and Maximum Spend for the respective values in the
Partner_working field, we can infer that a working partner does not lead to a purchase
of a higher-priced car.
 The average price a car for an individual with a working partner is less than that without
a working partner. The Maximum amount spend for both these groups are also the
same.

30
8. MARKETING STRATEGY BY GENDER
AND MARITAL STATUS

The main objective of this analysis is to devise an improved marketing strategy to send
targeted information to different groups of potential buyers present in the data. For the
current analysis use the Gender and Marital_status - fields to arrive at groups with
similar purchase history.
Solution-
1. On analyzing the Total Spend and Count of Automobiles purchased, we would infer that
Married Males and Single Women prefer to buy a Sedan.

31
2. Males and Females with age > 45 are more likely to own a. SUV

32
PROBLEM 2

A bank can generate revenue in a variety of ways, such as charging interest, transaction
fees and financial advice. Interest charged on the capital that the bank lends out to
customers has historically been the most significant method of revenue generation. The
bank earns profits from the difference between the interest rates it pays on deposits
and other sources of funds, and the interest rates it charges on the loans it gives out.
GODIGT Bank is a mid-sized private bank that deals in all kinds of banking products,
such as savings accounts, current accounts, investment products, etc. among other
offerings. The bank also cross-sells asset products to its existing customers through
personal loans, auto loans, business loans, etc., and to do so they use various
communication methods including cold calling, e-mails, recommendations on the net
banking, mobile banking, etc.
GODIGT Bank also has a set of customers who were given credit cards based on risk
policy and customer category class but due to huge competition in the credit card
market, the bank is observing high attrition in credit card spending. The bank makes
money only if customers spend more on credit cards. Given the attrition, the Bank
wants to revisit its credit card policy and make sure that the card given to the customer
is the right credit card. The bank will make a profit only through the customers that
show higher intent towards a recommended credit card. (Higher intent means
consumers would want to use the card and hence not be attrite.)
Problem 2 Question: ( Analyze the dataset and list down the top 5 important variables,
along with the business justifications.

33
SOLUTION

Below are the Top 5 relevant features in the dataset based on my analysis

1. annual_income_at_source – Annual income share a strong correlation with cc_limit


and avg_spend_l3m. This indicates that the Annual income is an important indicator for
the spending capacity an individual.

2. avg_spends_l3m –The avg_spends_l3m is the driving parameter that shows the


spends of customer and help identify their spending behavior. Using this variable with
other categorical and numerical variable will help identify patterns and insights which
can help in the required analysis.

3. Occupation_at_source – This variable is important to identify the nature of occupation


which are the biggest spenders and income earners. This would also help identify the
kind of occupation which are the currently generating the major chunk of revenue and
which can be the focus of future campaigns to improve market share.

4. card_type – The type of card and the utilization of each type would be an indicator on
the usability of their features, and how relevant they are in the market. The features
offered by the cards need to provide value to its users to increase the spends. The data
does show a positive correlation between annual_income_at_source and
avg_spends_l3m, but the relation between the Mean and Total Spend indicate
variability in the usage. This helps in identifying area of improvement and success in the
usability and campaigns of these cards.

5. Transactor_revolver – Based on the preliminary analysis of the data, Number of


transactions and Total Amount spend by users who generally pays off their balances in
full every month is significantly high than the users who carries the balances to next
month. Hence, this field can help identify users who would be less likely to default.
These users can be the target audience for new promotions to increase the spend.

34
35

You might also like