
Prepared by KARTHIKEYAN M as a part of & Retail analytics

Milestone 1
Agenda:
The project aims to identify customer segments from the buying patterns of an automobile
manufacturer's customers, using the company's transaction data for the past 3 years, and then
to recommend marketing strategies customized to each segment.

Executive Summary of the data:


The data covers Jan 2018 to May 2020, that is, 29 months. There are 2747 entries
with 20 variables describing the products and the customers.

Tools Used:
Python, Tableau, KNIME
 Problem Statement
 Data Summary
 Exploratory Analysis
 Univariate analysis
 Bivariate Analysis
 Multivariate Analysis
 Time Series & Trends

 Inferences
 Customer Segmentation using RFM
 Tools used & Assumptions

 KNIME Workflow & Data Output


 Inference From RFM Analysis
 Recommendations
An automobile parts manufacturing company has collected data of transactions for 3 years. They do not
have any in-house data science team, thus they have hired you as their consultant. Your job is to use
your magical data science skills to provide them with suitable insights about their data and their
customers.
 Shape of Data
 Total number of entries – 2747
 Total number of Columns – 20

 The details of the 20 variables are as follows:

 There are 7 measure variables and one date variable
 The other 12 variables are dimensions
 8 of the dimension variables describe customer demographics
 The data includes the number of orders and total sales
 There are no null values in any of the columns
 Details of data,

 Manufacturer's suggested retail price (MSRP) ranges from 33 to 214
 Sales per item ranges from 482.13 to 14082.8
 Days since last order ranges from 42 to 3562
 There are no duplicate records
 Quantity ordered (the second row in the graph) is not normally distributed; it is
skewed
 Sales data is also skewed to the left

 The number of distinct values in each column is as follows:
 Country – 19
 City – 71
 Customers – 89
 Status – 6
 Product line – 7
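The checks in this data summary (shape, null values, duplicates, distinct values per column) can be sketched in Python with pandas. This is a minimal illustration on a tiny made-up sample; the column names and the helper `summarize` are assumptions and not the schema or code of the actual project file.

```python
import pandas as pd

# Tiny made-up sample; column names are assumed for illustration only.
df = pd.DataFrame({
    "ORDERNUMBER": [10100, 10101, 10102, 10103],
    "COUNTRY": ["USA", "Spain", "USA", "France"],
    "STATUS": ["Shipped", "Shipped", "Cancelled", "Shipped"],
    "SALES": [2871.0, 2765.9, 3884.34, 3746.7],
})

def summarize(frame, dimension_cols):
    """Return shape, total null count, duplicate-row count and
    the number of distinct values for each listed dimension column."""
    return {
        "shape": frame.shape,
        "nulls": int(frame.isna().sum().sum()),
        "duplicates": int(frame.duplicated().sum()),
        "distinct": frame[dimension_cols].nunique().to_dict(),
    }

summary = summarize(df, ["COUNTRY", "STATUS"])
print(summary)
```

On the real data the same helper would be called with the full list of dimension columns.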
• Sales by city: San Rafael and NYC from the US, Madrid from Spain, and Paris from France show the maximum sales
• Sales by country: the US, Spain and France are the top countries by sales
• Medium deal size has maximum sales compared with large and small
• Classic cars and vintage cars have higher sales compared with other product lines
 Order counts by city and country are shown below.
 The US, Spain and France have the highest order counts
 Sales by city: San Rafael and NYC from the US, Madrid from Spain, and Paris from France show the
maximum sales

• Medium deal size has the maximum order count, followed by small and then large deal sizes
• Classic cars and vintage cars have higher order counts compared with other product lines
• Top 10 customer by sales

• Top 10 customer by order counts


• Year-wise sales across product lines
• Year-wise order counts across product lines
• Correlation between variables
• Country, product line vs sales
 Sales by year and month
 Sales by quarter and month
 Sales by week

 Trends show that Quarter 4, especially November (Week 1 and Week 2), has the maximum sales
 Order count by month.
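The monthly and quarterly views above can be reproduced with a simple group-by. The following is a minimal pandas sketch on made-up rows; the column names and the helper `sales_by_period` are assumptions, and the charts in this deck were not necessarily built this way.

```python
import pandas as pd

# Made-up transactions; column names are assumed for illustration only.
orders = pd.DataFrame({
    "ORDERNUMBER": [1, 2, 3, 4, 5],
    "ORDERDATE": pd.to_datetime(
        ["2019-11-04", "2019-11-12", "2019-02-10", "2019-08-20", "2019-11-25"]
    ),
    "SALES": [5000.0, 4000.0, 1500.0, 2000.0, 1000.0],
})

def sales_by_period(frame):
    """Aggregate sales by year-month and by quarter, and count distinct orders per month."""
    dates = frame["ORDERDATE"]
    by_month = frame.groupby(dates.dt.to_period("M"))["SALES"].sum()
    by_quarter = frame.groupby(dates.dt.quarter)["SALES"].sum()
    orders_by_month = frame.groupby(dates.dt.month)["ORDERNUMBER"].nunique()
    return by_month, by_quarter, orders_by_month

by_month, by_quarter, orders_by_month = sales_by_period(orders)
print(by_quarter)
```

Calling `idxmax()` on any of the returned series gives the period with the highest value, which is how a peak such as Quarter 4 or November would show up.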
 The classic cars product line is the most popular year on year by both sales and order quantity, whereas
trains are the lowest
 Classic cars post the highest sales across countries
 Medium deal size provided higher sales than large and small deal sizes
 The USA has the highest sales and order quantity, followed by Spain and France
 Madrid has the highest sales by city
 Large deals are constant over the years
 Most sales come from only the top customers, hence the company is heavily dependent on a few customers
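The dependence on a few customers can be quantified as the share of total sales contributed by the top N customers. Below is a minimal pandas sketch on made-up figures; the column names and the helper `top_n_share` are assumptions for illustration.

```python
import pandas as pd

# Made-up sales rows; column names are assumed for illustration only.
sales = pd.DataFrame({
    "CUSTOMERNAME": ["A", "A", "B", "C", "D"],
    "SALES": [400.0, 200.0, 200.0, 100.0, 100.0],
})

def top_n_share(frame, n):
    """Fraction of total sales that comes from the n largest customers."""
    totals = (
        frame.groupby("CUSTOMERNAME")["SALES"].sum().sort_values(ascending=False)
    )
    return totals.head(n).sum() / totals.sum()

share_top1 = top_n_share(sales, 1)
share_top2 = top_n_share(sales, 2)
print(share_top1, share_top2)
```

A share that stays high for a small N indicates concentration of revenue in a few accounts.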
Tools Used:
 KNIME and Tableau are used here

Data & Assumptions:


 Columns considered for RFM analysis are order number, order date, sales

 Days since last order is ignored, and 1 June 2020 has been assumed as the fixed reference date for calculation

 The difference between 1 June 2020 and the order date has been used for calculating recency

 The sum of sales is used for calculating the monetary value

 The count of order numbers is used for calculating the frequency

 All data has been summarized by customer name

 Cancelled orders have been filtered out, as they may not add value to RFM

 Created 3 bins for each of:

 recency (0 to 25th percentile as Low, 25th to 75th percentile as Medium and 75th to 100th percentile as High)
 monetary (0 to 25th percentile as High, 25th to 75th percentile as Medium and 75th to 100th percentile as Low)
 frequency (0 to 25th percentile as High, 25th to 75th percentile as Medium and 75th to 100th percentile as Low)
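The RFM steps listed above were carried out in KNIME; the sketch below restates them in Python with pandas purely as an illustration. The sample rows, column names and the helpers `rfm_table` and `three_bins` are assumptions. Recency is taken here from each customer's most recent order, which the slide does not state explicitly. The bin labels are passed in lowest-percentile-first order by the caller, because the slide does not say whether percentiles are ranked over ascending or descending values.

```python
import pandas as pd

# Made-up transactions; column names are assumed for illustration only.
orders = pd.DataFrame({
    "CUSTOMERNAME": ["A", "A", "B", "C", "C", "C", "D"],
    "ORDERNUMBER": [1, 2, 3, 4, 5, 6, 7],
    "ORDERDATE": pd.to_datetime([
        "2020-05-20", "2020-03-01", "2019-01-15",
        "2020-04-01", "2019-12-01", "2020-05-01", "2018-06-01",
    ]),
    "SALES": [1000.0, 500.0, 200.0, 300.0, 400.0, 700.0, 900.0],
    "STATUS": ["Shipped", "Shipped", "Shipped",
               "Shipped", "Shipped", "Cancelled", "Shipped"],
})

REFERENCE_DATE = pd.Timestamp("2020-06-01")  # fixed date assumed in the deck

def rfm_table(frame, reference_date=REFERENCE_DATE):
    """Summarize recency (days), frequency (count) and monetary (sum) per customer."""
    active = frame[frame["STATUS"] != "Cancelled"]  # drop cancelled orders
    rfm = active.groupby("CUSTOMERNAME").agg(
        last_order=("ORDERDATE", "max"),
        frequency=("ORDERNUMBER", "count"),
        monetary=("SALES", "sum"),
    )
    # Assumption: recency is measured from the customer's most recent order.
    rfm["recency"] = (reference_date - rfm["last_order"]).dt.days
    return rfm.drop(columns="last_order")

def three_bins(values, labels):
    """Label values by the 25th and 75th percentiles.
    labels = (lowest quarter, middle half, highest quarter)."""
    q25, q75 = values.quantile([0.25, 0.75])

    def label(v):
        if v <= q25:
            return labels[0]
        if v <= q75:
            return labels[1]
        return labels[2]

    return values.map(label)

rfm = rfm_table(orders)
rfm["recency_bin"] = three_bins(rfm["recency"], ("low", "medium", "high"))
print(rfm)
```

The same `three_bins` helper can be applied to the `monetary` and `frequency` columns with whichever label order matches the intended ranking.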
 Best Customers with their sales value

 These are the most recent customers, with high sales and a high frequency of ordering
 Lost Customers with their sales value

 These are customers who have not ordered recently with low sales and low frequency of ordering
 Loyal Customers with their sales value

 These are customers who have not ordered recently but, being loyal, have purchased frequently with
good monetary value
 Customers on the verge of churning with their sales

 These are customers with low or medium frequency & monetary value and low recency.
 Best customers – To retain them, send customer-specific emailers with updates such as new
products and product updates. (Low marketing budget)
 Loyal customers – Give them more attention through regular emailers, regular event updates and loyalty
programs, to turn them into best customers. (Medium marketing budget)
 Customers on the verge of churning – Give them more offers, such as free delivery discounts, to retain
them. (High marketing budget)
 Lost customers – Discard them and spend nothing on promotions for them.

Tableau Link :
https://public.tableau.com/views/MRA_16368842914850/SalesbyCustomer?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link
