
A.

What is the important technical information about the dataset that a database
administrator would be interested in? (Hint: Information about the size of the
dataset and the nature of the variables)

Solution:

Dataframe provided: Austo motor company


Industry: Automobile
Total rows: 1581
Total columns: 14
Dtypes: float64(1), int64(5), object(8)

Null Values:
 There are 53 null values in column ‘Gender’ and 106 in ‘Partner_salary’
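
The figures above (row and column counts, dtypes, null counts) are the kind of summary pandas reports directly. The sketch below shows one way to collect them; the small frame and its values are hypothetical stand-ins for the Austo data, which would normally be loaded from the provided file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "Age": [53, 53, 52, 29],
    "Gender": ["Male", np.nan, "Female", "Male"],
    "Salary": [99300, 95500, 97300, 72500],
    "Partner_salary": [70700.0, np.nan, 69400.0, 0.0],
})

n_rows, n_cols = df.shape                  # size of the dataset
dtype_counts = df.dtypes.value_counts()    # number of columns per dtype
null_counts = df.isnull().sum()            # null values per column

print(f"Total rows: {n_rows}, total columns: {n_cols}")
print(dtype_counts)
print(null_counts[null_counts > 0])        # only the columns that contain nulls
```

On the full dataset, the same three calls would be expected to reproduce the shape, dtype, and null counts listed above.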
B. Take a critical look at the data and do a preliminary analysis of the variables. Do
a quality check of the data so that the variables are consistent. Are there any
discrepancies present in the data? If yes, perform preliminary treatment of data.

Solution:
 The statistical summary of the data is given below:

 As mentioned earlier, the preliminary analysis also flagged the null values, which are listed below for reference:
 The Gender column has 53 null values
 The Partner_salary column has 106 null values
 Upon further analysis of the variables to assess the quality of the data, we can conclude that there are no duplicate entries.
 Meanwhile, we identified additional unique entries in the “Gender” column; the snapshot below shows them:

 As shown, the Gender column contained two spelling errors along with 53 null values. First, we corrected the spelling errors; the snapshot below shows the result:

 The next step is to replace the null values in the Gender column with the mode of that column:
 The next step is to correct the ‘Partner_salary’ column, which has 106 null values.
 Null values in ‘Partner_salary’ were replaced using the formula below:
 Partner_salary = Total_salary - Salary
 Below is the snapshot of the variables after treatment:
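
The Gender and Partner_salary treatment described above can be sketched in pandas as follows. The report does not state what the two misspellings were, so the spellings "Femal" and "Femle" and all toy values here are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; the misspellings "Femal" and "Femle" are assumed, not taken from the report.
df = pd.DataFrame({
    "Gender": ["Male", "Femal", "Female", np.nan, "Male", "Femle"],
    "Salary": [60000, 50000, 70000, 40000, 80000, 55000],
    "Partner_salary": [20000.0, np.nan, 0.0, 30000.0, np.nan, 10000.0],
    "Total_salary": [80000, 75000, 70000, 70000, 80000, 65000],
})

# Step 1: map the misspelled categories onto the correct label.
df["Gender"] = df["Gender"].replace({"Femal": "Female", "Femle": "Female"})

# Step 2: fill the remaining nulls in Gender with the mode (most frequent value).
gender_mode = df["Gender"].mode()[0]
df["Gender"] = df["Gender"].fillna(gender_mode)

# Step 3: fill nulls in Partner_salary using Partner_salary = Total_salary - Salary.
df["Partner_salary"] = df["Partner_salary"].fillna(df["Total_salary"] - df["Salary"])

print(df["Gender"].value_counts())
print(df["Partner_salary"].isnull().sum())
```

Correcting the spellings before taking the mode matters: otherwise the misspelled entries are counted as separate categories and can change which value is most frequent.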

 Now that the data is cleaned, the next step is to check for outliers using the boxplots below:

 From the boxplots above, there are outliers in “Total_salary” and “No_of_Dependents”.
 Because 0 is a valid value for “No_of_Dependents”, we focus on correcting the outliers in the “Total_salary” column, taking its mean as a reference point to avoid future errors.
 Mean of “Total_salary”: 79625.996205
 We treat outliers using the lower bound (Q1 - 1.5*IQR) and the upper bound (Q3 + 1.5*IQR).
 The outliers have now been treated using these lower and upper bounds, as can be seen in the plot below:
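
The IQR bounds mentioned above can be computed as in the sketch below. The report does not say exactly how values outside the bounds were handled; capping them at the bounds with `clip` is an assumption here, and the salary values are hypothetical.

```python
import pandas as pd

# Hypothetical Total_salary values with one obvious high outlier.
total_salary = pd.Series(
    [50000, 60000, 70000, 80000, 90000, 100000, 110000, 400000],
    name="Total_salary",
)

q1 = total_salary.quantile(0.25)
q3 = total_salary.quantile(0.75)
iqr = q3 - q1

lower = q1 - 1.5 * iqr   # lower bound: Q1 - 1.5*IQR
upper = q3 + 1.5 * iqr   # upper bound: Q3 + 1.5*IQR

# Assumed treatment: cap values outside [lower, upper] at the nearest bound.
capped = total_salary.clip(lower=lower, upper=upper)

print(f"lower={lower}, upper={upper}")
print(capped)
```

Capping keeps every row in the dataset, which preserves the row count for the later analysis; dropping the outlying rows would be an alternative with a different trade-off.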
C. Explore all the features of the data separately by using appropriate visualizations and
draw insights that can be utilized by the business.

Solutions:
 The statistical summary of the data is given below:

 To further analyze the data, let us go through visualizations of different parameters.


 Analysis by Age:
 From the above plot, it is clear that the younger age group (20-30 years) tends to buy more cars than any other group.
 Sales dip across the 30-60 age range, with some fluctuation in the 31-40 group, where sales are slightly better and second only to the younger group.
 Analysis of Age vs Make:
 From the plot above, it is evident that the younger age group (20-30 years) prefers Sedan and Hatchback over SUV.
 The 30-40 age group prefers Sedan and SUV but not Hatchback, while the older age group (45+) clearly prefers SUVs.
 From these points, we can say that older, more experienced buyers prefer SUVs. This may be due to multiple factors such as age and the total number of dependents, and it is also evident in the graph below.
 Analysis by Gender:
 While the total number of cars purchased by males is high, it is in proportion to the given data. The more interesting factor is the type of car purchased by each gender, which the plot below shows:

 While men prefer Sedan and Hatchback, women prefer SUVs, followed by Sedan, with a negligible share of Hatchback.
 Analysis by Marital status:

 From the plot above, married people are more likely to own a car than single people.
 While Sedan and Hatchback drive sales among married people, SUVs have scope to increase their sales.

 Analysis of Profession vs Make:

 Among working professionals, salaried people are more likely to own a car than business people.
 While SUVs are the least preferred among both the salaried and business classes, Sedan leads in both, with a very high count among the salaried class.

 Analysis of Price, Total_salary and Make:

 From the graph above, Sedan and Hatchback are preferred by the majority, but Price and Total_salary also play a role.
 As Total_salary goes up, people tend to prefer SUVs.

 From all of the above, we can conclude that:


 Average age of buying the car is 31.9 years
 Average price of the car is 35595.72
 Average price of the car by make is

D. Understanding the relationships among the variables in the dataset is crucial for every
analytical project. Perform analysis on the data fields to gain deeper insights. Comment on
your understanding of the data.

Solution:

 Establishing the correlation between Age and Price

With reference to the figure above, there is a positive correlation between the age of the customer and the amount of money spent on buying cars: as customer age increases, customers tend to buy more expensive cars.
Age and Price are positively correlated.
 Establishing the correlation between Salary and Price

The scatterplot above shows that as an individual's salary increases, the price of the car purchased also increases. Hence, Price and Salary are positively correlated.

 Establishing the relationship between Profession and Gender:

Among male business professionals, Hatchback is the first choice and Sedan the second, while SUV is comparatively less preferred. Female business professionals buy Sedans and SUVs with similar interest. For salaried female customers, SUV is the first choice and Sedan the second, with few Hatchback sales. Salaried male customers choose either Sedan or Hatchback over SUV, which is comparatively less in demand among them.

 Establishing the correlation with a heatmap of Age, Price, No_of_Dependents and Total_salary


 Insights from the above heatmap:
 1 = Perfect positive correlation
 -1 to 0 = Negative correlation
 0 to 1 = Positive correlation. Here, there is a strong positive correlation between Age and Price, and a positive correlation between Total_salary and Price.
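
The matrix behind such a heatmap can be computed with `DataFrame.corr()`, which returns Pearson correlation coefficients by default. The sketch below uses hypothetical values for the four columns; it only illustrates how the coefficients are read, and does not reproduce the report's numbers.

```python
import pandas as pd

# Hypothetical toy values for the four numeric columns used in the heatmap.
df = pd.DataFrame({
    "Age": [25, 30, 35, 45, 55],
    "Price": [20000, 25000, 30000, 50000, 60000],
    "No_of_Dependents": [0, 1, 2, 3, 2],
    "Total_salary": [50000, 60000, 90000, 120000, 110000],
})

# Pairwise Pearson correlation; every coefficient lies between -1 and 1.
corr = df.corr()

age_price = corr.loc["Age", "Price"]
salary_price = corr.loc["Total_salary", "Price"]

print(corr.round(2))
print(f"Age vs Price: {age_price:.2f}, Total_salary vs Price: {salary_price:.2f}")

# If seaborn is installed, the matrix can be drawn with:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True)
```

The diagonal is always 1 because each variable is perfectly correlated with itself, so the off-diagonal cells are the ones to interpret.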

E. Employees working on the existing marketing campaign have made the following remarks.
Based on the data and your analysis state whether you agree or disagree with their
observations. Justify your answer based on the data available.

E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”

Solution:

 No. Based on the plot above, women prefer SUVs more than men do.

E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
Solutions:

 Yes, the salaried class is more likely to buy a Sedan.

E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier
target for an SUV sale over a Sedan sale.

Solutions:

 No, the given statement is wrong. Salaried males prefer Sedan over SUV.

F. From the given data, comment on the amount spent on purchasing automobiles across the
following categories. Comment on how a business can utilize the results from this exercise.
Give justification along with presenting metrics/charts used for arriving at the conclusions.

1. Gender: Total purchases by Gender are given below

Average price spent by Gender:

Based on the table above, it is clear that women have bought more expensive cars.

2. Personal loan: Let us go through the plot to analyze automobile purchases by personal loan status

Based on the graph above, the maximum number of cars is purchased by people taking a personal loan. Sedan is the preferred car irrespective of personal loan status, followed by Hatchback and then SUV.
From the above observation, people without a personal loan have spent more and bought more expensive cars.

G. From the current data set comment if having a working partner leads to the purchase of
a higher-priced car.

Solution:

No, based on the table above the statement is not correct. The table clearly indicates that individuals whose Partner_working status is “No” purchase more expensive cars.
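
Comparisons like this one come down to a group-wise average of Price. The sketch below shows the pattern with a small helper, `average_price_by`, which is a hypothetical name introduced here; the rows are toy values and do not reproduce the report's table.

```python
import pandas as pd

# Hypothetical toy rows; the real table would be computed from the full dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Partner_working": ["Yes", "No", "Yes", "No"],
    "Price": [30000, 40000, 34000, 36000],
})


def average_price_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the mean Price for each category of the given column."""
    return frame.groupby(column)["Price"].mean()


avg_by_partner = average_price_by(df, "Partner_working")
avg_by_gender = average_price_by(df, "Gender")

print(avg_by_partner)
print(avg_by_gender)
```

The same helper applies to any categorical field, so the Gender and personal loan comparisons in section F can be produced the same way.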

H. The main objective of this analysis is to devise an improved marketing strategy to
send targeted information to different groups of potential buyers present in the data. For
the current analysis use the Gender and Marital_status fields to arrive at groups with
similar purchase history.

Solution:
 From the information below on Gender, we can conclude that male customers buy more cars overall, with Hatchback the most purchased, Sedan second, and SUV third in buying preference.
 Female customers buy more SUVs than Sedans, and Hatchback takes the last position in their buying preference.

 From the table below, there are zero Hatchback sales among female business professionals; their first preference is to buy an SUV and their second choice is a Sedan.
 Salaried females' first choice is SUV, followed by Sedan, with few preferring to buy a Hatchback.
 Male business professionals prefer to buy Hatchback, followed by Sedan, with fewer choosing SUV.
 Salaried males prefer to buy Sedan, followed by Hatchback, with SUV in third position.
 Married customers buy more Sedans than Hatchbacks, and SUV is the last choice.
 Single customers buy more Hatchbacks than Sedans, and SUV takes the last position.

 From the table above, there are 1443 married and 138 single customers in total, so married customers dominate the company records.
 Married business professionals prefer to buy Sedan, followed by Hatchback and SUV.
 Single business professionals tend to buy more Hatchbacks than Sedans and SUVs. As we saw earlier, married and single customers also differ in their choices.
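
Grouping customers by Gender and Marital_status against Make, as done above, is a cross-tabulation. The sketch below shows the pattern with `pd.crosstab`; the rows are hypothetical and the counts do not reproduce the report's tables.

```python
import pandas as pd

# Hypothetical toy rows standing in for the customer records.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "Marital_status": ["Married", "Single", "Married", "Married", "Married", "Single"],
    "Make": ["Hatchback", "Hatchback", "SUV", "Sedan", "Sedan", "SUV"],
})

# Purchases of each Make per Gender.
by_gender = pd.crosstab(df["Gender"], df["Make"])

# Purchases of each Make per (Gender, Marital_status) group.
by_group = pd.crosstab([df["Gender"], df["Marital_status"]], df["Make"])

print(by_gender)
print(by_group)
```

Each row of `by_group` is one candidate marketing segment, and the column with the highest count in that row is the segment's most purchased Make.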
