PROJECT
By - Parijat Dev
Problem 1
Q-What is the important technical information about the dataset that a database
administrator would be interested in? (Hint: Information about the size of the
dataset and the nature of the variables).
The dataset describes customers of a car manufacturer, Austo Motor Company. It has 1581 rows and
14 columns. There are missing values in the 'Gender' and 'Partner_salary' columns. The data types
are int64, float64, and object. The columns contain information about the customer's age, gender,
profession, marital status, education, number of dependents, personal and house loans, partner
working status, salary, partner salary, total salary, price, and make.
The youngest individual in the dataset is 22 years old, while the oldest is 54 years old, and more
than 50% of the population is younger than 29. The average age of individuals in the dataset is
31.9 years, with a standard deviation of 8.4.
The average number of dependents is 2.5, with a standard deviation of 0.9. The minimum number
of dependents is 0, while the maximum is 4.
The average salary of individuals in the dataset is 60,392, with a standard deviation of 14,674. The
minimum salary is 30,000, while the maximum salary is 99,300.
The average partner salary is 20,225, with a standard deviation of 19,573. However, it should be
noted that this column has some missing values (106 entries), as indicated by the difference in
the count between this column and the others.
The average total salary (which includes the individual's salary and their partner's salary, if
applicable) is 79,626, with a standard deviation of 25,545. The minimum total salary is 30,000,
while the maximum total salary is 171,000.
The average price of the car purchased by individuals in the dataset is 35,598, with a standard
deviation of 13,634. The cheapest car purchased in the dataset costs 18,000, while the most
expensive car costs 70,000.
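The figures above are the kind of output a dataframe summary produces. A minimal sketch of how they could be obtained, assuming the data is loaded with pandas; the sample values below are made up, and the column names 'Age', 'Salary' and 'Total_salary' are assumptions:

```python
import pandas as pd

# Tiny hypothetical sample; the real dataset has 1581 rows and 14 columns.
df = pd.DataFrame({
    "Age": [53, 29, 22, 41],
    "Gender": ["Male", "Femal", None, "Female"],
    "Salary": [99300, 30000, 45000, 61000],
    "Partner_salary": [70000.0, None, 0.0, 25000.0],
    "Total_salary": [169300, 30000, 45000, 86000],
})

n_rows, n_cols = df.shape      # size of the dataset
dtypes = df.dtypes             # data type of each column
missing = df.isna().sum()      # number of missing values per column
summary = df.describe()        # count, mean, std, min, quartiles, max (numeric columns)

print(n_rows, n_cols)
print(dtypes)
print(missing[missing > 0])
print(summary.loc[["mean", "std", "min", "max"], ["Age", "Salary"]])
```

A lower 'count' for a column in the summary than for the others is what reveals missing entries, as noted for the partner salary.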
We will check the unique values across the columns to see if there are any bad values in the data.
Age: 53, 52, 50, 49, 47, 46, 45, 51, 54, 48, 44, 43, 42, 41, 40, 39, 38, 37, 36,
35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22
Gender: Male, Femal, Female, nan, Femle
No_of_Dependents: 4, 3, 2, 1, 0
Gender contains misspellings of Female ('Femal' and 'Femle'), which we will replace with the correct
spelling. The Gender column also has 53 missing values. To fill them, we will first find the ratio
of males to females in the dataset and then impute the missing values in the same ratio.
Partner_salary also has 106 missing values. Since Total salary - Salary = Partner salary, we can
use this calculation to impute the missing values in partner salary.
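The cleaning steps described above can be sketched as follows. This is one possible way to do it, not necessarily the code used for the report; the sample rows are made up, the column names 'Salary' and 'Total_salary' are assumptions, and the fixed random seed is only there to make the example repeatable:

```python
import random
import pandas as pd

# Hypothetical sample with the same kinds of problems as the real data.
df = pd.DataFrame({
    "Gender": ["Male", "Femal", None, "Female", "Femle", "Male", "Male", None],
    "Salary": [60000, 45000, 30000, 99300, 52000, 70000, 38000, 81000],
    "Partner_salary": [20000.0, None, 0.0, 70000.0, None, 30000.0, 0.0, 25000.0],
    "Total_salary": [80000, 45000, 30000, 169300, 92000, 100000, 38000, 106000],
})

# 1. Replace the misspelled values with the correct spelling.
df["Gender"] = df["Gender"].replace({"Femal": "Female", "Femle": "Female"})

# 2. Impute missing Gender values in the observed male/female ratio.
observed = df["Gender"].value_counts(normalize=True)   # shares among known values
mask = df["Gender"].isna()
n_missing = int(mask.sum())
n_male = int(round(n_missing * observed.get("Male", 0.0)))
fill = ["Male"] * n_male + ["Female"] * (n_missing - n_male)
random.Random(0).shuffle(fill)                         # avoid assigning in row order
df.loc[mask, "Gender"] = fill

# 3. Impute Partner_salary as Total salary - Salary where it is missing.
df["Partner_salary"] = df["Partner_salary"].fillna(df["Total_salary"] - df["Salary"])

print(df)
```

Filling by exact counts rather than by independent random draws keeps the male/female ratio as close as possible to the observed one.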
Q- Explore all the features of the data separately by using appropriate
visualizations and draw insights that can be utilized by the business.
The younger generation is the major customer segment of the Austo Motor Company. A large share of
customers belong to the age group of 22 to 30 years.
As is clear from the above graph, the number of male customers is significantly higher than
the number of female customers.
The average salary of individuals in the dataset is 60,392, with a standard deviation of 14,674. The
minimum salary is 30,000, while the maximum salary is 99,300. There are no outliers in the salary.
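A common way to check for outliers is the boxplot rule, which flags values more than 1.5 times the interquartile range beyond the quartiles. The report does not state which rule was used, so the sketch below is an assumption, and the sample values are made up:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical salaries within the range reported above: no value is flagged.
salary = pd.Series([30000, 45000, 52000, 60000, 61000, 70000, 81000, 99300])

# Hypothetical series with one extreme value, to show what a flagged point looks like.
price_with_outlier = pd.Series([18000, 22000, 25000, 27000, 30000, 31000, 33000, 95000])

print(iqr_outliers(salary))
print(iqr_outliers(price_with_outlier))
```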
The number of married individuals who bought a car is significantly higher than the number
of single individuals.
The company makes cars in three segments: SUV, Sedan and Hatchback. Sedans are the most sold
cars, followed by Hatchbacks and SUVs.
50% of the customers have availed a personal loan and the other 50% have not. In the case of house
loans, almost twice as many customers have not taken a house loan as have taken one.
Most of the customers who bought a car have either 2 or 3 dependents.
Most of the people in the dataset are salaried rather than in business, but the
difference is not too large.
Q- Understanding the relationships among the variables in the dataset is crucial
for every analytical project. Perform analysis on the data fields to gain deeper
insights. Comment on your understanding of the data.
Salaried women have bought twice as many cars as businesswomen, whereas there is not much
difference between the cars bought by salaried men and businessmen, although salaried men
have bought more cars than businessmen.
Total salary and the price of the car bought show a positive correlation.
The salary slab of 50,000 to 80,000 has the largest number of customers for both genders.
Men prefer Hatchbacks and Sedans as their top choices, whereas women prefer SUVs and Sedans
as their top choices.
Salaried and business customers both prefer Sedans over the other makes. The two professions
show the same pattern of preference when it comes to selecting the car type.
Married people prefer Sedans, whereas single people prefer Hatchbacks.
Singles with no dependents prefer Hatchbacks.
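Two of the relationships above, the correlation between total salary and price and the car type bought by each gender, can be computed directly. A minimal sketch with made-up rows; the column names 'Total_salary', 'Price' and 'Make' are assumptions:

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Male"],
    "Make": ["Hatchback", "Sedan", "Sedan", "Hatchback", "SUV", "SUV", "Sedan", "SUV"],
    "Total_salary": [40000, 80000, 95000, 45000, 150000, 120000, 90000, 130000],
    "Price": [20000, 33000, 36000, 22000, 65000, 55000, 34000, 58000],
})

# Pearson correlation between total salary and car price.
corr = df["Total_salary"].corr(df["Price"])

# Number of cars of each type bought by each gender.
counts = pd.crosstab(df["Gender"], df["Make"])

print(round(corr, 3))
print(counts)
```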
***E1) Steve Roger says “Men prefer SUV by a large margin, compared to the
women”
This statement is false. As we can see in the below graph “Type of Cars bought by Gender”,
women prefer SUVs more than men do, but not by a huge margin.
***E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
***E3) Sheldon Cooper does not believe any of them; he claims that a salaried
male is an easier target for a SUV sale over a Sedan Sale.
The statement is incorrect. As we can see in the above graph, salaried men prefer the Sedan
over the SUV.
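Claims like these can also be checked numerically instead of only from the graphs. The sketch below compares the share of SUVs within each gender and counts the car types bought by salaried men; the rows are made up, and the column names 'Profession' and 'Make' are assumptions:

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
df = pd.DataFrame({
    "Gender": ["Male"] * 6 + ["Female"] * 4,
    "Profession": ["Salaried", "Salaried", "Salaried", "Business", "Business",
                   "Salaried", "Salaried", "Business", "Salaried", "Business"],
    "Make": ["Sedan", "Sedan", "Hatchback", "Hatchback", "SUV",
             "SUV", "SUV", "SUV", "Sedan", "Sedan"],
})

# E1: share of each car type within each gender (every row sums to 1).
share = pd.crosstab(df["Gender"], df["Make"], normalize="index")
men_suv = share.loc["Male", "SUV"]
women_suv = share.loc["Female", "SUV"]

# E3: car types bought by salaried men only.
salaried_men = df[(df["Gender"] == "Male") & (df["Profession"] == "Salaried")]
make_counts = salaried_men["Make"].value_counts()

print(share)
print(make_counts)
```

Comparing shares within each gender, rather than raw counts, matters here because male customers far outnumber female customers.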
Q-From the given data, comment on the amount spent on purchasing
automobiles across the following categories. Comment on how a Business can
utilize the results from this exercise. Give justification along with presenting
metrics/charts used for arriving at the conclusions.
***F1) Gender
From the table below it is evident that males have spent more on buying cars than females
overall. Females lead the SUV segment in spending, whereas males lead the other two
segments by a huge margin. Males have spent the most money on Sedans.
[Table: amount spent by car type, for Female and Male customers]
***F2) Personal_loan
We can see that for all three categories of automobiles (Hatchback, SUV, and Sedan), the amount
spent by customers who do not have a personal loan is higher than the amount spent by those who
have one. This suggests that having a personal loan is not a major factor in determining the
amount spent on purchasing automobiles.
A business can utilize this information to better understand the spending behavior of its customers
and develop targeted marketing strategies to increase sales. For example, they can focus on
promoting financing options or incentives for customers without a personal loan to encourage them
to spend more on automobile purchases.
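The amount spent per category can be tabulated with a pivot table. The sketch below uses made-up rows, and the column names 'Make' and 'Price' are assumptions. It shows the average price next to the total, since a total also depends on how many customers fall in each group:

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
df = pd.DataFrame({
    "Make": ["Hatchback", "Hatchback", "SUV", "SUV", "Sedan", "Sedan", "Sedan"],
    "Personal_loan": ["No", "Yes", "No", "Yes", "No", "No", "Yes"],
    "Price": [20000, 22000, 60000, 55000, 33000, 35000, 34000],
})

# Total amount spent per car type, split by personal-loan status.
total_spend = df.pivot_table(index="Make", columns="Personal_loan",
                             values="Price", aggfunc="sum")

# Average price per purchase for the same groups.
avg_price = df.pivot_table(index="Make", columns="Personal_loan",
                           values="Price", aggfunc="mean")

print(total_spend)
print(avg_price)
```

In this made-up sample, Sedan buyers without a personal loan spend twice as much in total as those with one, yet the average price per car is the same, so both views are worth reporting.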
Q- From the current data set comment if having a working partner leads to
purchase of a higher priced car.
Having a working partner does not make the customer buy a high priced car. We can see it clearly
from the below chart and boxplot.
Working Partner (No) Working Partner (Yes)
Hatchback 14 1 484 83
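The comparison behind this conclusion can be sketched by summarizing price for customers with and without a working partner. The rows below are made up, and the column name 'Partner_working' is an assumption:

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
df = pd.DataFrame({
    "Partner_working": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "Price": [30000, 36000, 42000, 29000, 37000, 42000],
})

# Count, mean and median price for each group.
price_by_partner = df.groupby("Partner_working")["Price"].agg(["count", "mean", "median"])

# Difference in median price between the two groups (Yes minus No).
gap = price_by_partner.loc["Yes", "median"] - price_by_partner.loc["No", "median"]

print(price_by_partner)
print(gap)
```

A gap close to zero, together with overlapping boxplots, is consistent with the statement that a working partner does not lead to a higher priced purchase.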
1. The majority of the buyers for hatchbacks and sedans are married males, whereas for SUVs,
the majority of the buyers are married females. Therefore, different marketing strategies can
be adopted for targeting these different groups of buyers. For example, for hatchbacks and
sedans, marketing campaigns could focus on targeting married males, whereas for SUVs,
the focus could be on married females.
2. Single females and males have a lower purchase rate compared to their married
counterparts. Therefore, marketing campaigns could focus on promoting the benefits of
owning a car and how it can improve the lifestyle of single individuals.
3. The purchase rate for SUVs among married females is higher compared to other categories.
Therefore, special offers and discounts can be provided to attract more married females
towards purchasing SUVs.
4. The purchase rate for sedans among married males is higher compared to other categories.
Therefore, marketing campaigns can be tailored to promote the benefits of owning a sedan,
such as its spaciousness and comfortable ride, to attract more married males towards
purchasing sedans.
Overall, by targeting different groups of potential buyers with specific marketing strategies, a
Problem 2
GODIGT Bank is a mid-sized private bank that deals in all kinds of banking products,
such as savings accounts, current accounts, investment products, etc. among other
offerings. The bank also cross-sells asset products to its existing customers through
personal loans, auto loans, business loans, etc., and to do so they use various
communication methods including cold calling, e-mails, recommendations on the net
banking, mobile banking, etc.
GODIGT Bank also has a set of customers who were given credit cards based on risk
policy and customer category class but due to huge competition in the credit card
market, the bank is observing high attrition in credit card spending. The bank makes
money only if customers spend more on credit cards. Given the attrition, the Bank wants
to revisit its credit card policy and make sure that the card given to the customer is the
right credit card. The bank will make a profit only through the customers that show
higher intent towards a recommended credit card. (Higher intent means consumers
would want to use the card and hence would not attrite.)
Q- Analyze the dataset and list down the top 5 important variables, along with the
business justifications. (10 Points)
1. Annual income at source: This variable can help in targeting customers who are likely to have
a higher spending capacity and are, therefore, more likely to be interested in higher-end
products or services.
2. Engagement products: This variable can help in identifying customers who are likely to be
interested in cross-selling opportunities for investment or loan products.
3. Occupation at source: This variable can help in targeting customers based on their
profession or income stream. For example, targeting self-employed individuals or business
owners with specific products or services.
4. Transactor/Revolver: This variable can help in identifying customers who are likely to carry
balances over from one month to the next and target them with promotions related to
balance transfer or low-interest rates.
5. CC limit: This variable can help in identifying customers who have a higher credit limit and
are, therefore, more likely to be interested in higher-end products or services.