PROJECT- SMDM
DSBA-SMDM Module
Contents
Faizan Ali Sayyed
Problem 1
Problem 2
Problem 1
Analysts are required to explore data and reflect on the insights.
Clear writing skill is an integral part of a good report. Note that the
explanations must be such that readers with minimum knowledge of
analytics are able to grasp the insight.
Dataset = Customer data with the model, financial ability of the household,
dependencies, demographics, and price of the model.
There are 53 rows with missing values in column Gender, and 106 missing
values for Partner salary (Note that Partner salary can be 0 if a partner is not
working, hence missing values are a concern)
Solution:
We already know that there are missing values in two variables: Gender
and Partner Salary.
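The missing-value check can be sketched in plain Python as below. The sample rows and the column name Partner_salary are hypothetical stand-ins for the real dataset, and the report's own tooling is not shown in the text. Note that a Partner salary of 0 is a valid value and is not counted as missing.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"Gender": "Male", "Partner_salary": 0},       # 0 is valid: partner not working
    {"Gender": None, "Partner_salary": 25000},
    {"Gender": "Female", "Partner_salary": None},
    {"Gender": "Male", "Partner_salary": None},
]


def count_missing(records):
    """Return {column: number of records where the value is None}."""
    counts = {}
    for record in records:
        for column, value in record.items():
            # bool adds as 0 or 1, so only None values increase the count
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


missing = count_missing(rows)
print(missing)
```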
Based on the value counts of the unique entries, there are typing errors in
the Gender column.
Since there are two misspellings, ‘Femal’ and ‘Femle’, we will club them under
the “Female” category.
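A minimal sketch of this clean-up, assuming the Gender values are held in a simple list (the sample values are hypothetical; only the two misspellings come from the report). Counting the categories before and after gives the same kind of check as the Before and After outputs.

```python
from collections import Counter

# The two misspellings reported above, mapped to the correct label.
GENDER_FIXES = {"Femal": "Female", "Femle": "Female"}


def clean_gender(value):
    """Map known misspellings to 'Female'; leave other values (and None) as-is."""
    if value is None:
        return None
    return GENDER_FIXES.get(value, value)


# Hypothetical sample of the Gender column.
genders = ["Male", "Femal", "Female", "Femle", "Male", None]
cleaned = [clean_gender(g) for g in genders]

before = Counter(genders)
after = Counter(cleaned)
print("Before:", dict(before))
print("After:", dict(after))
```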
Checking the final output
Before
After
Identifying and treating outliers in the numerical data with the help of the IQR rule.
From the boxplot charts, we can see that there are outliers in the number
of dependents and Total Salary data.
It is plausible for a family to have 0 dependents, so it is okay to leave
these values untreated and accept the outliers as valid data.
Now we will treat the invalid outliers in the Total Salary column, as
they can skew the data. The mean of Total_salary is 79625.99, i.e. 79626 units,
and the median is 78000 units.
IQR = Q3 - Q1
= 75th percentile - 25th percentile
= 95900 - 60500
= 35400 units
Great Learning- DSBA- SMDM Project- Faizan Ali Sayyed
Similarly, we calculate Upper and Lower limit, and then treat the outliers on
the higher side by dropping those rows
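The limit calculation can be worked through as below. Q1 and Q3 are the values stated above; the 1.5 multiplier is the conventional choice for the IQR rule and is an assumption here, since the report does not state the multiplier it used. The salary list is hypothetical.

```python
# Quartiles of Total_salary as stated in the report.
Q1 = 60500
Q3 = 95900

# Assumption: the conventional 1.5 x IQR fences.
K = 1.5

iqr = Q3 - Q1                 # 35400
lower_limit = Q1 - K * iqr    # 60500 - 53100
upper_limit = Q3 + K * iqr    # 95900 + 53100

# Hypothetical salaries; only rows above the upper limit are dropped,
# matching the treatment "on the higher side" described above.
salaries = [52000, 78000, 95900, 149000, 171000]
kept = [s for s in salaries if s <= upper_limit]

print(iqr, lower_limit, upper_limit)
print(kept)
```

Under the 1.5 assumption the limits come out at 7400 and 149000 units, so only the last hypothetical salary is dropped.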
Key Takeaways:
Sedan accounts for 45% of total models sold and generates the largest
share of revenue, 44%.
Although SUVs account for just 17% of sales, less than half of Hatchback's
share, they generate almost as much revenue as the latter.
Average age for buying cars is 29.
Average price paid for a car is 35597.72 units.
Customer and Total Salaries are approximately normally distributed, which
suggests the dataset covers users from all financial backgrounds.
Based on the age distribution, there are more young customers than
old.
More customers tend to buy less expensive cars, with a small peak
at the higher end indicating another set of customers buying expensive
cars.
Female customers tend to buy SUVs more, while Male customers
are very likely to buy Sedan and Hatchback
More customers are married than single. Single customers
tend to buy Hatchback, but overall Married customers tend to buy
SUVs, followed by Sedan.
Sedan is best seller among both segments of Salaried and
Business class. Hatchback is a close second for both, followed by
SUVs.
Post Graduate students prefer Sedan and Hatchback over SUV
cars, and Graduates follow a similar trend
SUV is the premium car, while Sedan is economic and Hatchback
is budget.
Customers with and without a Personal loan buy more Sedan and
Hatchback than SUV. Customers with and without a
House loan also buy more Sedan and Hatchback, with lesser
preference given to SUV. This suggests the hypothesis that a
loan of any kind does not affect the choice of car a customer wants
to buy.
Based on number of units sold, Sedan is more manufactured,
Based on the pair plot, and correlation heatmap above, we can state that:
A salaried Male is more likely to buy a Sedan than an SUV, hence the
remark on salaried male being an easier target for SUV is not true.
Sheldon Cooper is surprisingly incorrect this time.
Solution:
F1) Gender: the amount spent on purchasing automobiles by Male and
Female customers is as follows:
Females have bought far fewer cars, but their net spend is much higher
than that of male customers.
The above chart confirms the same hypothesis, as women tend to buy
more expensive cars than men.
Customers with and without a personal loan have spent similar
aggregate amounts on cars. Customers without a loan have a slightly higher
total spend, even though more customers have taken a personal loan.
This implies that spend per customer is higher among those who do not take a loan.
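The comparison between total spend and number of buyers can be sketched as below. The purchase records, the column names Personal_loan and Price, and the helper summarise are all hypothetical; the figures are chosen only to mirror the pattern described (more buyers with a loan, slightly higher total without one).

```python
from collections import defaultdict

# Hypothetical purchase records.
purchases = [
    {"Personal_loan": "Yes", "Price": 30000},
    {"Personal_loan": "Yes", "Price": 28000},
    {"Personal_loan": "Yes", "Price": 32000},
    {"Personal_loan": "No", "Price": 47000},
    {"Personal_loan": "No", "Price": 46000},
]


def summarise(records, key, value):
    """Group records by `key` and return count, total and mean of `value`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        group: {
            "count": len(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        }
        for group, values in groups.items()
    }


summary = summarise(purchases, "Personal_loan", "Price")
for group, stats in summary.items():
    print(group, stats)
```

Looking at count, total and mean side by side separates "who spends more in aggregate" from "who spends more per purchase", which is the distinction the paragraph above relies on.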
In order to boost sales, Austo Motors can target female customers and
offer better loan terms with lower interest rates.
The trends in purchase patterns are also similar for people with and without
working partners.
Married customers buy many more cars than single customers, and
they tend to opt for Sedan and SUV.
Older females buy SUVs; a campaign could be targeted at them.
Hatchback can be seen as a young male's adrenaline-rush, go-to
model, as it is bought mostly by males under 30.
Sedan is everyone’s preferred vehicle and a cash cow, it can be
utilized and leveraged to generate more revenue and gain more
traction.
Problem 2
A bank can generate revenue in a variety of ways, such as
charging interest, transaction fees and financial advice. Interest
charged on the capital that the bank lends out to customers has
historically been the most significant method of revenue generation.
The bank earns profits from the difference between the interest rates it
pays on deposits and other sources of funds, and the interest rates it
charges on the loans it gives out.
GODIGT Bank also has a set of customers who were given credit
cards based on risk policy and customer category class but due to huge
competition in the credit card market, the bank is observing high attrition
in credit card spending. The bank makes money only if customers
spend more on credit cards. Given the attrition, the Bank wants to revisit
its credit card policy and make sure that the card given to the customer
is the right credit card. The bank will make a profit only through the
customers that show higher intent towards a recommended credit card.
(Higher intent means consumers would want to use the card and hence
not be attrite.)
Question: Analyze the dataset and list down the top 5 important
variables, along with the business justifications. (10 Points) Data
Dictionary - Link
Vision = Since the bank offers credit cards, it wishes to generate more profit by
increasing customers' spend on the credit cards. Currently, the bank is seeing
a high attrition rate and would like to revise its credit card policies to
decrease it.
Dataset = Data set of users who are using Credit cards along with their card
details, activity details, average spend, annual income, transactional activity
status and more.
Solution:
The given dataset has 8448 entries and 28 columns.
We will treat the missing values in this column by replacing them with
the mode of given column “T”
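A minimal sketch of mode imputation, assuming missing entries are represented as None. The helper fill_with_mode and the sample values are hypothetical; the report does not show which column or values were involved.

```python
from collections import Counter


def fill_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to compute a mode from; return the column unchanged.
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical categorical column with two missing entries.
column = ["A", "B", None, "A", None]
filled = fill_with_mode(column)
print(filled)
```

Mode imputation suits categorical columns; for a numeric column with outliers, the median is a common alternative.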
261 values in column Occupation_at_source are assigned “0”. There is a case
for removing them, as the average spend for these values is very high and we
cannot attribute them to any profession.
However, assuming the values are unknown, we are not discarding them from the
data for now.
Average spend Upper and Lower limit using IQR rule are:
Now treating the Credit card limit similarly to remove its outliers:
i. Grade_value (high_networth)
ii. active_30
iii. active_60
iv. active_90
v. Cc_active_30
vi. Cc_active_60
vii. Cc_active_90
viii. engagement_products
ix. annual_income_at_source
x. other_bank_cc_holding
xi. bank_vintage
xii. hotlist_flag
xiii. widget_products
xiv. cc_limit
xv. average_spend_l3m
xvi. T+1_month_activity
xvii. T+2_month_activity
xviii. T+3_month_activity
xix. T+6_month_activity
xx. T+12_month_activity
When looking for variables that help increase customer credit card
activity, we want to maximize the cc_active scores, along with T+X month
activity and the average credit card spend in the last 3 months.
The above figure between credit card activity and other transaction activity
shows that there is no major impact of transactional activities on credit card
activities in any time frame.
Average spend also has a strong correlation with credit card limit
offered to the customers.
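The strength of such a relationship is usually measured with the Pearson correlation coefficient. The sketch below computes it by hand; the column names follow the variable list above, but the values are hypothetical and chosen only to illustrate a strong positive correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: spend rises almost in step with the limit.
cc_limit = [50000, 100000, 150000, 200000]
average_spend_l3m = [4000, 11000, 14000, 21000]

r = pearson(cc_limit, average_spend_l3m)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive linear relationship, but it does not by itself show that a higher limit causes higher spend.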
It is very clear that users belonging to the revolver category, who carry
balances over from one month to the next, spend more.
Also, we can establish that Salaried, Self employed and Students use
credit cards more than others, so Occupation becomes an
important variable for deciding credit card usage.
Cards that are hotlisted do not contribute to any credit card activity.
There is no strong trend with the credit card sourcing date, implying that the age
of the credit card has no significant role in card spend.
Summary: