
DSBA SMDM

PROJECT
By - Parijat Dev
Problem 1

Q-What is the important technical information about the dataset that a database
administrator would be interested in? (Hint: Information about the size of the
dataset and the nature of the variables).

The dataset describes customers of a car manufacturer, Austo Motor Company. It has 1581 rows and
14 columns. There are missing values in the 'Gender' and 'Partner_Salary' columns. The data types
are int64, float64, and object. The columns record each customer's age, gender, profession,
marital status, education, number of dependents, personal and house loans, partner working status,
salary, partner salary, total salary, price, and make.

It has dtypes: float64(1), int64(5), object(8)

Column               Non-Null Count   Dtype
Age                  1581             int64
Gender               1528             object
Profession           1581             object
Marital_Status       1581             object
Education            1581             object
No. of Dependents    1581             int64
Personal_Loan        1581             object
House_Loan           1581             object
Partner_Working      1581             object
Salary               1581             int64
Partner_Salary       1475             float64
Total_Salary         1581             int64
Price                1581             int64
Make                 1581             object
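The size, data types and missing-value counts listed above can be read straight off a pandas DataFrame. The snippet below is a minimal sketch on a small made-up frame: the values are illustrative and are not taken from the Austo file, and only four of the 14 columns are shown.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Austo data (illustrative values, not the real file).
df = pd.DataFrame({
    "Age": [53, 29, 31],
    "Gender": ["Male", None, "Female"],
    "Salary": [99300, 30000, 60000],
    "Partner_Salary": [70700.0, np.nan, 0.0],
})

n_rows, n_cols = df.shape   # size of the dataset
missing = df.isna().sum()   # number of missing values per column
dtypes = df.dtypes          # data type of each column
df.info()                   # prints column names, non-null counts and dtypes
```

On the full dataset the same calls would give the row and column counts, the non-null counts and the dtype summary reported in this section.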


Q-Take a critical look at the data and do a preliminary analysis of the variables.
Do a quality check of the data so that the variables are consistent. Are there any
discrepancies present in the data? If yes, perform preliminary treatment of data.

Customer ages range from 22 to 54 years, and more than 50% of the customers are younger than 29.
The average age is 31.9 years, with a standard deviation of 8.4.

The average number of dependents is 2.5, with a standard deviation of 0.9. The minimum number
of dependents is 0, while the maximum is 4.

The average salary of individuals in the dataset is 60,392, with a standard deviation of 14,674. The
minimum salary is 30,000, while the maximum salary is 99,300.

The average partner salary is 20,225, with a standard deviation of 19,573. However, it should be
noted that this column has some missing values (106 entries), as indicated by the difference in
the count between this column and the others.

The average total salary (which includes the individual's salary and their partner's salary, if
applicable) is 79,626, with a standard deviation of 25,545. The minimum total salary is 30,000,
while the maximum total salary is 171,000.

The average price of the car purchased by individuals in the dataset is 35,598, with a standard
deviation of 13,634. The cheapest car purchased in the dataset costs 18,000, while the most
expensive car costs 70,000.
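Summary figures like the ones above (count, mean, standard deviation, minimum, median, maximum) are what pandas' `describe()` reports for numeric columns. A minimal sketch on made-up values, assuming the column names `Age` and `Price` from the dataset:

```python
import pandas as pd

# Illustrative values only; the real dataset has 1581 rows.
df = pd.DataFrame({
    "Age": [22, 29, 54],
    "Price": [18000, 31000, 70000],
})

# One row per numeric column, one column per statistic.
summary = df.describe().T[["count", "mean", "std", "min", "50%", "max"]]
print(summary)
```

A count lower than the number of rows in this summary is how the missing partner salaries show up.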

We check the unique values in each column to see whether the data contains any bad values.

Column               Unique values
Age                  53, 52, 50, 49, 47, 46, 45, 51, 54, 48, 44, 43, 42, 41, 40, 39, 38, 37, 36,
                     35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22
Gender               Male, Femal, Female, nan, Femle
Profession           Business, Salaried
Marital_status       Married, Single
Education            Post Graduate, Graduate
No_of_Dependents     4, 3, 2, 1, 0
Personal_loan        No, Yes
House_loan           No, Yes
Partner_working      No, Yes
Make                 SUV, Sedan, Hatchback

The Gender column contains misspellings of 'Female' ('Femal' and 'Femle'), which we replace with the
correct spelling. The column also has 53 missing values. To fill them, we first find the ratio of
male to female customers in the dataset and then impute the missing values in the same ratio.
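One way to carry out this Gender treatment is sketched below on a short made-up series. The random draw is an assumption about how "in the same ratio" is implemented, and the seed is arbitrary and only makes the example reproducible.

```python
import numpy as np
import pandas as pd

# Illustrative values containing the misspellings and missing entries described above.
gender = pd.Series(["Male", "Femal", "Female", None, "Femle", "Male", None, "Male"])

# Map the misspelled labels onto the correct one.
gender = gender.replace({"Femal": "Female", "Femle": "Female"})

# Observed Male/Female proportions among the non-missing rows.
ratio = gender.value_counts(normalize=True)

# Fill the missing rows by sampling labels in those proportions.
rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)
mask = gender.isna()
gender.loc[mask] = rng.choice(ratio.index.to_numpy(), size=mask.sum(), p=ratio.to_numpy())
```

Sampling keeps the male-to-female ratio roughly unchanged, whereas filling every gap with the most frequent label would shift it.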

Partner_Salary also has 106 missing values. Since Total_Salary = Salary + Partner_Salary, each
missing value can be imputed as Partner_Salary = Total_Salary - Salary.
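The partner-salary imputation can be sketched as follows, using the column names from the dataset and made-up values:

```python
import numpy as np
import pandas as pd

# Illustrative rows; two of them have a missing partner salary.
df = pd.DataFrame({
    "Salary": [60000, 45000, 80000],
    "Partner_Salary": [20000.0, np.nan, np.nan],
    "Total_Salary": [80000, 45000, 110000],
})

# Partner salary implied by the other two columns.
derived = df["Total_Salary"] - df["Salary"]

# Fill only the missing entries; existing values are left untouched.
df["Partner_Salary"] = df["Partner_Salary"].fillna(derived)
```

Here the second row is filled with 0 and the third with 30000.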
Q- Explore all the features of the data separately by using appropriate
visualizations and draw insights that can be utilized by the business.

The younger generation is the major customer segment of Austo Motor Company. A large share of
customers belong to the age group of 22 to 30 years.

The graph above shows that the number of male customers is significantly higher than the number
of female customers.

The average salary of individuals in the dataset is 60,392, with a standard deviation of 14,674. The
minimum salary is 30,000, while the maximum salary is 99,300. There are no outliers in the salary.

The number of married individuals who bought a car is significantly higher than the number
of single individuals.

The company makes cars in three segments: SUV, Sedan and Hatchback. Sedans are the most sold
cars, followed by hatchbacks and SUVs.

Half of the customers have taken a personal loan and the other half have not. For house loans,
almost twice as many customers have no house loan as have one.

Most customers who bought a car have either 2 or 3 dependents.

Most of the people in the dataset belong to the salaried profession rather than business but the
difference is not too large.
Q- Understanding the relationships among the variables in the dataset is crucial
for every analytical project. Perform analysis on the data fields to gain deeper
insights. Comment on your understanding of the data.

Salaried women have bought about twice as many cars as businesswomen. The difference between
salaried men and businessmen is much smaller, although salaried men have still bought more cars
than businessmen.

Total salary and the price of the car bought are positively correlated.

The salary slab of 50,000 to 80,000 has the largest number of customers for both genders.
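The strength of the relationship between total salary and price can be quantified with a Pearson correlation coefficient. A minimal sketch on made-up values (the figures below are not from the dataset):

```python
import pandas as pd

# Illustrative values in which price rises with total salary.
df = pd.DataFrame({
    "Total_Salary": [30000, 60000, 90000, 120000],
    "Price": [18000, 30000, 41000, 60000],
})

# Pearson correlation; values near +1 indicate a strong positive linear relationship.
r = df["Total_Salary"].corr(df["Price"])
print(round(r, 3))
```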

Men's top preferences are hatchbacks and sedans, whereas women's top preferences are SUVs and
sedans.

Salaried and business customers both prefer sedans over the other car types. The two professions
show the same pattern of preference when selecting a car type.

Married people prefer sedans, whereas single people prefer hatchbacks. Broken down by number of
dependents:

Singles with no dependents prefer hatchbacks
Singles with 1 dependent prefer sedans
Singles with 2 dependents prefer hatchbacks
Married with 1 dependent prefer sedans
Married with 2 dependents also prefer sedans
Married with 3 dependents prefer hatchbacks
Married with 4 dependents prefer hatchbacks

Q- Employees working on the existing marketing campaign have made the
following remarks. Based on the data and your analysis state whether you agree
or disagree with their observations. Justify your answer based on the data
available.

***E1) Steve Roger says “Men prefer SUV by a large margin, compared to the
women”

This statement is false. The graph “Type of Cars bought by Gender” shows that women buy more SUVs
than men, although not by a huge margin.
***E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.

The statement is correct, as per the graph below.

***E3) Sheldon Cooper does not believe any of them; he claims that a salaried
male is an easier target for a SUV sale over a Sedan Sale.

The statement is incorrect: the graph above shows that salaried men prefer sedans over SUVs.
Q-From the given data, comment on the amount spent on purchasing
automobiles across the following categories. Comment on how a Business can
utilize the results from this exercise. Give justification along with presenting
metrics/charts used for arriving at the conclusions.

***F1) Gender

The table below shows that men have spent more on cars than women overall. Women lead the SUV
segment in spending, whereas men lead the other two segments by a huge margin. Men have spent
the most on sedans.

             Female      Male
Hatchback    412000      14996000
SUV          9301000     7279000
Sedan        6226000     18066000
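A spend table of this shape (total Price by Make and Gender) can be produced with a pivot table. The sketch below uses a handful of made-up rows with the dataset's column names; it does not reproduce the totals above.

```python
import pandas as pd

# Illustrative purchases, not the real data.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Make": ["Sedan", "Hatchback", "SUV", "Sedan", "Sedan"],
    "Price": [40000, 25000, 55000, 38000, 42000],
})

# Sum of Price for every Make/Gender combination; empty cells become 0.
spend = df.pivot_table(index="Make", columns="Gender", values="Price",
                       aggfunc="sum", fill_value=0)
print(spend)
```

Swapping `columns="Gender"` for another categorical column, such as `Personal_Loan` or `Partner_Working`, gives the corresponding breakdown.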

***F2) Personal_loan

For Hatchback and SUV, customers without a personal loan spent more than those with one. For
Sedan the pattern is reversed: customers with a personal loan spent more (13,440,000 versus
10,852,000). Because the direction is not consistent across car types, this suggests that having a
personal loan is not a major factor in determining the amount spent on purchasing automobiles.

A business can utilize this information to better understand the spending behavior of its customers
and develop targeted marketing strategies to increase sales. For example, they can focus on
promoting financing options or incentives for customers without a personal loan to encourage them
to spend more on automobile purchases.

             Personal Loan (No)   Personal Loan (Yes)
Hatchback    7765000              7643000
SUV          10373000             6207000
Sedan        10852000             13440000
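The figures in the table above can be compared row by row with a short check. The numbers are copied from the table; the variable names are only illustrative.

```python
import pandas as pd

# Total amount spent, taken from the Personal Loan table above.
spend = pd.DataFrame(
    {"No": [7765000, 10373000, 10852000], "Yes": [7643000, 6207000, 13440000]},
    index=["Hatchback", "SUV", "Sedan"],
)

# True where customers without a personal loan spent more in total.
higher_without_loan = spend["No"] > spend["Yes"]

# Overall spend for each loan status.
totals = spend.sum()
print(higher_without_loan)
print(totals)
```

The comparison is True for Hatchback and SUV and False for Sedan, and the overall totals are 28,990,000 without a personal loan against 27,290,000 with one.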

Q- From the current data set comment if having a working partner leads to
purchase of a higher priced car.

Having a working partner does not lead the customer to buy a higher-priced car. This can be seen
clearly in the chart and boxplot below.

             Working Partner (No)   Working Partner (Yes)
Hatchback    7397000                8011000
SUV          8089000                8491000
Sedan        10182000               14110000
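Total spend depends on how many customers fall into each group, so the question of a higher-priced car is better judged on the typical price per purchase, which is what a boxplot shows. A minimal sketch of that per-group comparison, on made-up values with the dataset's column names:

```python
import pandas as pd

# Illustrative purchases, not the real data.
df = pd.DataFrame({
    "Partner_Working": ["Yes", "No", "Yes", "No", "Yes"],
    "Price": [30000, 32000, 34000, 28000, 32000],
})

# Number of purchases plus mean and median price for each group.
stats = df.groupby("Partner_Working")["Price"].agg(["count", "mean", "median"])
print(stats)
```

If the means and medians of the two groups are close on the real data, that supports the conclusion that a working partner does not lead to a higher-priced purchase.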

Q- The main objective of this analysis is to devise an improved marketing strategy
to send targeted information to different groups of potential buyers present in the
data. For the current analysis use Gender and Marital_status - fields to arrive at
groups with similar purchase history.

             Female               Male
             Married   Single     Married   Single
Hatchback    14        1          484       83
SUV          167       7          114       9
Sedan        137       14         527       24
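A count table with this layout (Make in the rows, Gender and Marital_Status nested in the columns) can be built with `pd.crosstab`. The rows below are made up for illustration and do not reproduce the counts above.

```python
import pandas as pd

# Illustrative purchases, not the real data.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Marital_Status": ["Married", "Married", "Single", "Married", "Married", "Single"],
    "Make": ["SUV", "SUV", "Sedan", "Sedan", "Hatchback", "Hatchback"],
})

# Number of purchases per Make for each (Gender, Marital_Status) group.
counts = pd.crosstab(df["Make"], [df["Gender"], df["Marital_Status"]])
print(counts)
```

Each column of the result is one candidate target group, and the largest value in a row identifies the group that buys that car type most often.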

1. The majority of the buyers for hatchbacks and sedans are married males, whereas for SUVs,
the majority of the buyers are married females. Therefore, different marketing strategies can
be adopted for targeting these different groups of buyers. For example, for hatchbacks and
sedans, marketing campaigns could focus on targeting married males, whereas for SUVs,
the focus could be on married females.
2. Single females and males have a lower purchase rate compared to their married
counterparts. Therefore, marketing campaigns could focus on promoting the benefits of
owning a car and how it can improve the lifestyle of single individuals.
3. The purchase rate for SUVs among married females is higher compared to other categories.
Therefore, special offers and discounts can be provided to attract more married females
towards purchasing SUVs.
4. The purchase rate for sedans among married males is higher compared to other categories.
Therefore, marketing campaigns can be tailored to promote the benefits of owning a sedan,
such as its spaciousness and comfortable ride, to attract more married males towards
purchasing sedans.

Overall, by targeting different groups of potential buyers with specific marketing strategies, a
business can increase its sales and revenue.

Problem 2

A bank can generate revenue in a variety of ways, such as charging interest,
transaction fees and financial advice. Interest charged on the capital that the bank lends
out to customers has historically been the most significant method of revenue
generation. The bank earns profits from the difference between the interest rates it pays
on deposits and other sources of funds, and the interest rates it charges on the loans it
gives out.

GODIGT Bank is a mid-sized private bank that deals in all kinds of banking products,
such as savings accounts, current accounts, investment products, etc. among other
offerings. The bank also cross-sells asset products to its existing customers through
personal loans, auto loans, business loans, etc., and to do so they use various
communication methods including cold calling, e-mails, recommendations on the net
banking, mobile banking, etc.
GODIGT Bank also has a set of customers who were given credit cards based on risk
policy and customer category class but due to huge competition in the credit card
market, the bank is observing high attrition in credit card spending. The bank makes
money only if customers spend more on credit cards. Given the attrition, the Bank wants
to revisit its credit card policy and make sure that the card given to the customer is the
right credit card. The bank will make a profit only through the customers that show
higher intent towards a recommended credit card. (Higher intent means consumers
would want to use the card and hence not be attrite.)

Q- Analyze the dataset and list down the top 5 important variables, along with the
business justifications. (10 Points) (Data Dictionary - Link)

The top 5 important variables could be:

1. Annual income at source: This variable can help in targeting customers who are likely to have
a higher spending capacity and are, therefore, more likely to be interested in higher-end
products or services.
2. Engagement products: This variable can help in identifying customers who are likely to be
interested in cross-selling opportunities for investment or loan products.
3. Occupation at source: This variable can help in targeting customers based on their
profession or income stream. For example, targeting self-employed individuals or business
owners with specific products or services.
4. Transactor/Revolver: This variable can help in identifying customers who are likely to carry
balances over from one month to the next and target them with promotions related to
balance transfer or low-interest rates.
5. CC limit: This variable can help in identifying customers who have a higher credit limit and
are, therefore, more likely to be interested in higher-end products or services.
