
GRADED

PROJECT- SMDM
DSBA-SMDM Module

Faizan Ali Sayyed


Contents
Faizan Ali Sayyed
Problem 1
Problem 2

Great Learning- DSBA- SMDM Project- Faizan Ali Sayyed



Problem 1
Analysts are required to explore data and reflect on the insights.
Clear writing skill is an integral part of a good report. Note that the
explanations must be such that readers with minimum knowledge of
analytics are able to grasp the insight.

Austo Motor Company is a leading car manufacturer specializing
in SUV, Sedan, and Hatchback models. In its recent board meeting,
concerns were raised by the members on the efficiency of the
marketing campaign currently being used. The board decides to rope
in an analytics professional to improve the existing campaign.

Question: You as an analyst have been tasked with performing a
thorough analysis of the data and coming up with insights to improve
the marketing campaign.


Company Profile: Austo Motor Company

Specialization = SUV, Sedan and Hatchback Models

Vision = The efficiency of the marketing campaign used by Austo Motors is raising
concerns in the Board. The company needs to identify the gaps and improve the
campaign with the help of analytical insights backed by data.

Dataset = Customer data with the model, financial ability of the household,
dependencies, demographics, and price of the model.

Objective: Use Analytics to assess the current Marketing performance, and
suggest measures to improve its efficacy.


A. What is the important technical information about the dataset that
a database administrator would be interested in? (Hint:
Information about the size of the dataset and the nature of the
variables)
Solution:

The dataset has 1581 records.
There are 14 variables: 8 categorical (object type) and 6 numeric
(5 integer type and 1 float type).
Total memory used by the dataset is 173 KB.
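
The figures above are the kind of summary that a pandas DataFrame reports about itself. The sketch below shows one way to obtain them, assuming the data is loaded with pandas; the toy rows and the column names are made up for illustration and are not the Austo data.

```python
import pandas as pd

# Toy stand-in for the customer data. The real file and its exact column
# names are not shown in this report, so everything here is an assumption.
df = pd.DataFrame({
    "Age": [30, 45, 28],
    "Gender": ["Male", "Female", None],
    "Salary": [60000, 85000, 52000],
    "Price": [25000.0, 61000.0, 22000.0],
})

n_rows, n_cols = df.shape                            # records and variables
dtype_counts = df.dtypes.astype(str).value_counts()  # columns per data type
memory_kb = df.memory_usage().sum() / 1024           # shallow size in KB

print(f"{n_rows} rows, {n_cols} columns")
print(dtype_counts)
print(f"{memory_kb:.1f} KB")
```

Calling `df.info()` prints the same information in a single report.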


There are 53 rows with missing values in column Gender, and 106 missing
values for Partner salary. (Note that Partner salary can legitimately be 0 if
a partner is not working, so a missing value is not the same as 0 and needs
treatment.)
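
A minimal sketch of how such missing-value counts can be produced with pandas, and of why a recorded 0 must be kept apart from a missing entry. The column names and rows are illustrative assumptions.

```python
import pandas as pd

# Illustrative rows only; the column names are assumed.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_salary": [0.0, 25000.0, None, None],
})

missing = df.isna().sum()                        # missing cells per column
missing_pct = (df.isna().mean() * 100).round(2)  # same, as a percentage

# A recorded 0 (partner not working) is a value, not a missing entry.
zero_partner_salary = int((df["Partner_salary"] == 0).sum())

print(missing)
print(missing_pct)
print(zero_partner_salary)
```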


B. Take a critical look at the data and do a preliminary analysis of the
variables. Do a quality check of the data so that the variables are
consistent. Are there any discrepancies present in the data? If
yes, perform preliminary treatment of data.

Solution:
We already know that there are missing values in two variables, Gender
and Partner Salary.

Doing a preliminary statistical analysis of the numerical variables in the
dataset:

There are no duplicate rows in the dataset.


In the given dataset, Partner salary is a quantitative variable with
incomplete values. The age varies between 22 and 54, and customers have
0 to 4 dependents.
The salary of customers varies from 30000 units to 99300 units,
while the mean is higher than the median, which suggests the salaries are
positively skewed.
The total salary of a household varies from 30000 units to 171000
units, and the mean total salary is slightly higher than the median total
salary.

• Based on the value counts of each categorical column, there are typing
errors in the Gender column.


• Since there are two spelling errors, ‘Femal’ and ‘Femle’, we will club them
under the “Female” category.
Checking the final output:

Before

After
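
The correction can be sketched as follows, assuming the column is held as a pandas Series. The misspellings are the two named above; the sample values are illustrative.

```python
import pandas as pd

# Illustrative values, including the two misspellings found in the data.
gender = pd.Series(["Male", "Femal", "Female", "Femle", "Male", None])

before = gender.value_counts(dropna=False)

# Map each misspelling to the intended category; other values are untouched.
cleaned = gender.replace({"Femal": "Female", "Femle": "Female"})

after = cleaned.value_counts(dropna=False)

print(before)
print(after)
```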


• Treating missing values in Gender

Since only 53 of the 1581 values are missing (3.35%), it is better to impute
them with the mode of Gender.
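
A minimal sketch of mode imputation with pandas. The sample values are illustrative; only the counts 53 and 1581 come from the report.

```python
import pandas as pd

# Share of missing Gender values quoted in the report: 53 of 1581 rows.
share_missing_pct = round(53 / 1581 * 100, 2)

# Illustrative column with two missing entries.
gender = pd.Series(["Male", "Female", "Male", None, "Male", None])

# Series.mode() ignores missing values by default; take the first mode.
mode_value = gender.mode()[0]
filled = gender.fillna(mode_value)

print(share_missing_pct)
print(filled.value_counts())
```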


• Treating missing values in Partner salary:

Replacing all the empty values with
Partner salary = Total salary - Salary
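
The rule above fills only the missing cells and leaves recorded values untouched. A sketch under assumed column names, with made-up rows:

```python
import pandas as pd

# Illustrative rows; the column names are assumptions.
df = pd.DataFrame({
    "Salary": [60000, 50000, 70000],
    "Partner_salary": [20000.0, None, None],
    "Total_salary": [80000, 50000, 95000],
})

# Partner salary implied by the other two columns.
derived = df["Total_salary"] - df["Salary"]

# fillna with a Series aligns on the index, so only the gaps are filled.
df["Partner_salary"] = df["Partner_salary"].fillna(derived)

print(df)
```

In the second row the derived value is 0, which corresponds to a partner who is not working.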

• Identifying and treating outliers in the numerical data with the help of the IQR rule.


From the boxplot charts, we can see that there are outliers in the number
of dependents and in the Total Salary data.
A family can plausibly have 0 dependents, therefore it is okay not to
treat these values and to accept them as valid data.

Now we will treat the invalid outliers in the Total Salary column, as
they can skew the data. The mean of Total_salary is 79625.99, i.e. about
79626 units, and the median is 78000 units.

IQR = Q3 - Q1
    = 75th percentile - 25th percentile
    = 95900 - 60500
    = 35400 units

Similarly, we calculate the upper and lower limits, and then treat the
outliers on the higher side by dropping those rows.
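
The limits are not written out in the report. Assuming the conventional multiplier of 1.5 on the IQR, the quartiles quoted for Total salary give the arithmetic below. The short series used to show the row drop is illustrative, not the real column.

```python
import pandas as pd

def iqr_limits(q1, q3, k=1.5):
    """Return (lower, upper) limits; k=1.5 is the usual convention and an
    assumption here, since the report does not state the multiplier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles quoted in the report for Total salary.
lower, upper = iqr_limits(60500, 95900)

# Illustrative values only; rows above the upper limit are dropped.
total_salary = pd.Series([30000, 60500, 78000, 95900, 149000, 171000])
kept = total_salary[total_salary <= upper]

print(lower, upper)
print(kept.tolist())
```

Under that assumption the limits are 7400 and 149000 units, so only the high side can contain outliers, given that the minimum total salary is 30000 units.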

We have identified and removed inconsistencies and discrepancies from
the dataset.


C. Explore all the features of the data separately by using
appropriate visualizations and draw insights that can be utilized
by the business.
Solution:
Tabular visualization of Age, Dependents, Salary (Self, Partner and
Total), Price

Distribution of all the key variables on histograms:


Distribution of all the key categories based on the make:


% Distribution of Models sold

Median age to buy each Make

Average price of the Model

% Revenue generated by each Model


Key Takeaways:
• Sedan accounts for 45% of total models sold, and generates the
largest share of revenue, 44%.
• Although SUVs make up just 17% of sales, less than half of Hatchback,
they generate almost the same revenue as the latter.
• Average age for buying cars is 29.
• Average price to buy a car is 35597.72 units.
• Customer and Total Salaries are approximately normally distributed,
which suggests that the dataset covers users from all financial
backgrounds.
• Based on the Age distribution, there are more young customers than
old.
• More customers tend to buy less expensive cars, with a small peak
at the higher end indicating another set of customers buying
expensive cars.
• Female customers tend to buy SUVs more, while Male customers
are very likely to buy Sedan and Hatchback.
• More customers are married than single. Single customers
tend to buy Hatchback, but overall Married customers tend to buy
SUVs, followed by Sedan.
• Sedan is the best seller among both the Salaried and the
Business segments. Hatchback is a close second for both, followed by
SUVs.
• Post Graduate customers prefer Sedan and Hatchback over SUV
cars, and Graduates follow a similar trend.
• SUV is the premium car, while Sedan is economic and Hatchback
is budget.
• Customers with and without a Personal loan buy more Sedan and
Hatchback as compared to SUV. Customers with and without a
House loan also buy more Sedan and Hatchback, with lesser
preference given to SUV. This suggests the hypothesis that a
loan of any kind does not affect the choice of car a customer wants
to buy.
• Based on the number of units sold, Sedan is the most manufactured model.


D. Understanding the relationships among the variables in the
dataset is crucial for every analytical project. Perform analysis on
the data fields to gain deeper insights. Comment on your
understanding of the data.
Solution:


Based on the pair plot and correlation heatmap above, we can state that:

• There is a moderate positive correlation between Price and Salary.
• Most of the strong correlations seen in the pair plot and heatmap
involve the Price of cars, and they are positive.
• Age and Price have a strong positive correlation; hence
people tend to buy more expensive cars as they get older.
• There is a very strong correlation between Partner salary and Total
salary, which is expected because Partner salary is a component of
Total salary.
• Strong correlation between
  o Age and Price
  o Partner salary and Total salary
• Medium correlation between
  o Salary and Price
  o Total salary and Price
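
A correlation matrix like the one behind the heatmap can be sketched as below. The rows and column names are made up, so the coefficients printed here are not the report's values. `numeric_only=True` requires pandas 1.5 or later.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Age": [22, 28, 35, 41, 54],
    "Salary": [40000, 52000, 61000, 75000, 90000],
    "Partner_salary": [0, 20000, 15000, 30000, 40000],
    "Total_salary": [40000, 72000, 76000, 105000, 130000],
    "Price": [20000, 24000, 33000, 45000, 62000],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr(numeric_only=True)

# Correlation of each variable with Price, strongest first.
price_corr = corr["Price"].drop("Price").sort_values(ascending=False)

print(corr.round(2))
print(price_corr.round(2))
```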


E. Employees working on the existing marketing campaign have
made the following remarks. Based on the data and your analysis
state whether you agree or disagree with their observations.
Justify your answer based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin,
compared to the women”
E2) Ned Stark believes that a salaried person is more likely to buy
a Sedan.
E3) Sheldon Cooper does not believe any of them; he claims that
a salaried male is an easier target for an SUV sale over a Sedan
sale.
Solution:
E1) Based on the data:

Males do not prefer SUVs more than females do.

So Steve Roger’s remark is incorrect.


E2) Based on the data:

Salaried professionals do prefer Sedan over other makes.

So Ned Stark is correct with his statement.

E3) Based on the data:

A salaried male is more likely to buy a Sedan than an SUV, hence the
remark that a salaried male is an easier target for an SUV is not true.
Sheldon Cooper is surprisingly incorrect this time.
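
The three remarks can be checked with cross-tabulations. The sketch below uses a handful of made-up rows and assumed column names (`Gender`, `Profession`, `Make`), so its numbers do not reproduce the findings above; it only shows the shape of the check.

```python
import pandas as pd

# Made-up rows; the column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male"],
    "Profession": ["Salaried", "Salaried", "Business",
                   "Salaried", "Business", "Salaried"],
    "Make": ["Sedan", "Hatchback", "Sedan", "SUV", "SUV", "SUV"],
})

# E1: share of each make within each gender (every row sums to 1).
by_gender = pd.crosstab(df["Gender"], df["Make"], normalize="index")

# E2: number of cars of each make bought by each profession.
by_profession = pd.crosstab(df["Profession"], df["Make"])

# E3: makes bought by salaried males only.
salaried_males = df[(df["Gender"] == "Male") & (df["Profession"] == "Salaried")]
salaried_male_counts = salaried_males["Make"].value_counts()

print(by_gender)
print(by_profession)
print(salaried_male_counts)
```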


F. From the given data, comment on the amount spent on
purchasing automobiles across the following categories.
Comment on how a Business can utilize the results from this
exercise. Give justification along with presenting metrics/charts
used for arriving at the conclusions.
F1) Gender
F2) Personal_loan

Solution:
F1) For Gender, the amount spent on purchasing automobiles by Male and
Female customers is as follows:

Females have bought far fewer cars, but their net spend is much higher
than that of male customers.


The above chart supports the same hypothesis: women tend to buy
more expensive cars than men.

F2) For personal loan taken,


Customers with and without a personal loan have spent similar
aggregated amounts on cars. Customers without a loan have a slightly
higher total spend than those with a loan, even though more customers
have taken a personal loan.

This implies that the spend per customer is higher for people who do not
take a loan.

In order to boost sales, Austo Motors can target female customers, and
offer better loan offers with lower interest rates.
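
The comparisons in F1 and F2 rest on three numbers per group: cars bought, total spend, and average spend per car. A sketch with made-up rows and assumed column names:

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Personal_loan": ["Yes", "No", "Yes", "No", "Yes"],
    "Price": [20000, 30000, 25000, 60000, 50000],
})

def spend_summary(frame, by):
    """Cars bought, total spend and average spend per car for each group."""
    return frame.groupby(by)["Price"].agg(
        cars="count", total_spend="sum", avg_spend="mean"
    )

by_gender = spend_summary(df, "Gender")
by_loan = spend_summary(df, "Personal_loan")

print(by_gender)
print(by_loan)
```

Reporting the count next to the total keeps the two effects apart: a group can spend more in total simply because it is larger.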


G. From the current data set comment if having a working partner
leads to the purchase of a higher-priced car.
Solution:
Hypothesis: Having a working partner leads to the purchase of a
higher-priced car.

There is no major difference in the mean or the median price paid by
customers with and without a working partner.

The trends in purchase patterns are also similar for people with and without
working partners.
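
A sketch of the comparison, assuming that a partner counts as working when Partner salary is above 0. The dataset may carry its own flag for this; the derived flag, the column names and the rows here are all illustrative.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "Partner_salary": [0, 25000, 0, 40000],
    "Price": [30000, 32000, 28000, 30000],
})

# Assumption: a positive partner salary means the partner is working.
df["Partner_working"] = (df["Partner_salary"] > 0).map({True: "Yes", False: "No"})

# Mean and median car price for each group.
summary = df.groupby("Partner_working")["Price"].agg(["mean", "median"])

print(summary)
```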


H. The main objective of this analysis is to devise an improved
marketing strategy to send targeted information to different
groups of potential buyers present in the data. For the current
analysis use the Gender and Marital status fields to arrive at
groups with similar purchase history.
Solution:
If we make a cross tab of the distribution of customers by gender and
marital status, we can observe the buying patterns.

• A campaign can be designed targeting the male customers with the
Hatchback Model or the Sedan as the targeted buy.
• Similarly, Females should be targeted to buy SUVs and Sedans.
• Married customers should be targeted with offers towards buying
Sedan and Hatchback.
• Single customers do not generate big business, as they are fewer in
number and buy the cheaper makes. They can be monitored, but they
can be excluded from the pool of target segments.

Married vs Single Age Distribution for different Models


Male vs Female Age Distribution for different Models

• Married customers buy many more cars than single customers, and
they tend to opt for Sedan and SUV.
• Older females buy SUVs, so a campaign could be targeted at them.
• Hatchback can be seen as the go-to model for young males, as it is
bought mostly by males under 30.
• Sedan is everyone’s preferred vehicle and a cash cow. It can be
leveraged to generate more revenue and gain more traction.


Problem 2
A bank can generate revenue in a variety of ways, such as
charging interest, transaction fees and financial advice. Interest
charged on the capital that the bank lends out to customers has
historically been the most significant method of revenue generation.
The bank earns profits from the difference between the interest rates it
pays on deposits and other sources of funds, and the interest rates it
charges on the loans it gives out.

GODIGT Bank is a mid-sized private bank that deals in all kinds
of banking products, such as savings accounts, current accounts,
investment products, etc. among other offerings. The bank also cross-
sells asset products to its existing customers through personal loans,
auto loans, business loans, etc., and to do so they use various
communication methods including cold calling, e-mails,
recommendations on the net banking, mobile banking, etc.

GODIGT Bank also has a set of customers who were given credit
cards based on risk policy and customer category class but due to huge
competition in the credit card market, the bank is observing high attrition
in credit card spending. The bank makes money only if customers
spend more on credit cards. Given the attrition, the Bank wants to revisit
its credit card policy and make sure that the card given to the customer
is the right credit card. The bank will make a profit only through the
customers that show higher intent towards a recommended credit card.
(Higher intent means consumers would want to use the card and hence
not attrite.)

Question: Analyze the dataset and list down the top 5 important
variables, along with the business justifications. (10 Points) Data
Dictionary - Link


Data dictionary for second dataset


Company Profile: GODIGT Bank

Details = Mid-sized private bank, that deals with Banking products

Products/ Services offered = Bank accounts, and Loans

Vision = Since the bank offers credit cards, it wishes to generate more profit by
increasing the customers' spend on the credit cards. Currently, the bank is seeing
a high attrition rate, and would like to revise its credit card policies to
decrease the attrition rate.

Dataset = Data set of users who are using Credit cards along with their card
details, activity details, average spend, annual income, transactional activity
status and more.

Objective: Use Analytics to identify the 5 most important variables that can
help in identifying and increasing the Credit card spend and decreasing
the attrition rate.


Solution:
The given dataset has 8448 entries and 28 columns.

There are 19 numerical columns (integers), 1 datetime column and 8 categorical
variables.

Checking for duplicates, there are no duplicates in the dataset.

Only the Transactor_revolver column has missing values (38).


Before and After

We will treat the missing values in this column by replacing them with
the mode of the column, “T”.

261 values in column Occupation_at_source are recorded as “0”. The average
spend for these records is very high, and we cannot attribute them to any
profession.

Assuming the occupation is unknown, we are not discarding these rows from
the data for now.


Let’s see the preliminary statistical analysis of the data

Now reading the distribution of Categorical variables:


Identifying and treating Outliers when needed:


Based on the above data, we have outliers in cc_limit,
annual_income_at_source and average_spends_l3m. Let's look at
the distribution of data for these 3 columns.


There can be premium customers that belong to high income
groups, and similarly there can be customers who have high spends,
which would imply their average spend in the last 3 months is high.
Hence we should not treat all of them, and should only remove outliers
that belong to the low net worth, high spend group, i.e. the E category.
There are no rows in the given dataset that meet these conditions. For
ease, let's remove all outliers from the average_spend_last3months data.

Average spend Upper and Lower limit using IQR rule are:

Final resultant dataset after removing outliers is below:


Now treating the Credit card limit similarly to remove its outliers:

Final resultant dataset after removing outliers is below:

Thus we have removed the invalid outliers from the dataset.


Now we proceed to encode certain variables so that their values can be
compared on a common numeric scale.

To begin with, we have the net worth grade from A to E, with A being elite.

Creating a column that maps the high_networth grades A, B, C, D, E to the
numeric values 5, 4, 3, 2, 1, respectively.
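
The mapping can be sketched as an ordinal encoding with a plain dictionary. The sample grades are illustrative; the result corresponds to the Grade_value column used in the later steps.

```python
import pandas as pd

# Illustrative grades; in the dataset these come from high_networth.
high_networth = pd.Series(["A", "C", "E", "B", "D"])

# Ordinal encoding: A (elite) gets the highest score.
grade_map = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
grade_value = high_networth.map(grade_map)

print(grade_value.tolist())
```

Any grade outside A to E would map to a missing value, which makes unexpected categories easy to spot.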


To begin, we have to consider the following columns in contention to find
the top 5 important variables.

i. Grade_value (high_networth)
ii. active_30
iii. active_60
iv. active_90
v. Cc_active_30
vi. Cc_active_60
vii. Cc_active_90
viii. engagement_products
ix. annual_income_at_source
x. other_bank_cc_holding
xi. bank_vintage
xii. hotlist_flag
xiii. widget_products
xiv. cc_limit
xv. average_spend_l3m
xvi. T+1_month_activity
xvii. T+2_month_activity
xviii. T+3_month_activity
xix. T+6_month_activity
xx. T+12_month_activity


Plotting a correlation heatmap for the above variables, we find:

When looking for variables that help increase customer credit card
activity, we want to maximize the cc_active scores, along with T+X month
activity as well as average credit card spends in the last 3 months.

For these target variables, the explanatory variables to be considered are
primarily hotlist flag, high_networth, bank vintage, other bank cc holding,
and annual income at source.


We see a moderate to strong correlation between the last 3 month spends
through credit card and both the customer net worth grade value (0.56) and
annual income (0.68).


The above figure between credit card activity and other transaction activity
shows that there is no major impact of transactional activities on credit card
activities in any time frame.


Average spend also has a strong correlation with the credit card limit
offered to the customers.

Mean of average credit card expenditure with and without
holding a credit card in another bank:

Median of average credit card expenditure with and
without holding a credit card in another bank:

There is no significant difference, hence holding another bank's credit
card does not affect the average spend much.
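
The group-wise means and medians can be computed with one small helper, sketched below. The rows and the category codes ("Y"/"N", "T"/"R") are made up for illustration, and the column names follow the spellings used in this report, which may differ from the data dictionary.

```python
import pandas as pd

# Made-up rows; codes and column names are assumptions.
df = pd.DataFrame({
    "other_bank_cc_holding": ["Y", "N", "Y", "N"],
    "Transactor_revolver": ["T", "R", "T", "T"],
    "average_spend_l3m": [40000, 42000, 44000, 38000],
})

def spend_by_group(frame, group_col, spend_col="average_spend_l3m"):
    """Mean, median and row count of the spend column for each group."""
    return frame.groupby(group_col)[spend_col].agg(["mean", "median", "count"])

by_other_card = spend_by_group(df, "other_bank_cc_holding")
by_revolver = spend_by_group(df, "Transactor_revolver")

print(by_other_card)
print(by_revolver)
```

The same helper applies to any other categorical column, such as occupation or card type.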


Mean of average credit card expenditure based on Transactor revolver
category:

Median of average credit card expenditure based on Transactor revolver
category:

It is very clear that users belonging to the revolver category, who carry
balances over from one month to the next, have higher expenditure.

Also, we can establish that Salaried, Self employed and Student customers
use credit cards more than others, so Occupation becomes an
important variable for deciding credit card usage.


Cards that are hotlisted do not contribute to any credit card activities.

Last 12 month active users based on Hotlisting

It is critical to identify and exclude hotlisted cards while considering
credit card attrition.

There is no strong trend with the credit card sourcing date, implying that the
life of a credit card has no significant role in card spend.


The type of card being used shows significant variation in the
average expenditure.

This is an important variable to be considered for identifying important
credit card users.


Summary:

Thus we can conclude that the most important variables in the dataset to
increase credit card usage and decrease the CC attrition rate are:
1. annual_income_at_source
2. Transactor_revolver
3. high_networth
4. Occupation_at_source
5. card_type
6. hotlist_flag

GODIGT Bank should focus on these variables in order to improve
credit card usage, revise its credit card policies, and decrease the
attrition rate.
