1. Clustering
2. Association
3. Data Cleaning
4. Data Visualization
5. Classification
6. Machine Learning
7. Prediction
8. Neural Networks
9. Outlier Detection
10. Data Warehousing
Partitioning method: This involves dividing a data set into a group of specific clusters
for evaluation based on the criteria of each individual cluster. In this method, data points
belong to just one group or cluster.
Hierarchical method: With the hierarchical method (in its common bottom-up form), each data
point starts as its own cluster, and clusters are then merged step by step based on their
similarities. The resulting clusters can then be analyzed separately from each other.
Density-based method: A machine learning method in which data points that lie close together
in dense regions are grouped and analyzed further, while isolated data points are labeled
“noise” and discarded.
Grid-based method: This involves dividing data into cells on a grid, which then can be
clustered by individual cells rather than by the entire database. As a result, grid-based
clustering has a fast processing time.
Model-based method: In this method, models are created for each data cluster to locate
the best data to fit that particular model.
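The partitioning method described above can be illustrated with a minimal k-means sketch. This
is only one possible partitioning algorithm, and the sample points, the function name k_means,
and the choice to seed the centroids with the first k points are assumptions made for this
example. It shows the defining property of the method: every point ends up in exactly one
cluster.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Partition 2-D points into k clusters; each point gets exactly one label."""
    # Assumption for reproducibility: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Real projects usually choose the starting centroids randomly or with a smarter seeding scheme;
the fixed seeding here only keeps the example repeatable.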
Examples of Clustering in Business
Clustering helps businesses manage their data more effectively. For example, retailers can
use clustering models to determine which customers buy particular products, on which days,
and with what frequency. This can help retailers target products and services to customers
in a specific demographic or region.
Clustering can help grocery stores group products by a variety of characteristics (brand, size,
cost, flavor, etc.) and better understand their sales tendencies. It can also help car insurance
companies that want to identify a set of customers who typically have high annual claims
in order to price policies more effectively. In addition, banks and financial institutions might
use clustering to better understand how customers use in-person versus virtual services to
better plan branch hours and staffing.
2. Association
Association rules are used to find correlations, or associations, between points in a data set.
3. Data Cleaning
Data cleaning is the process of preparing data to be mined.
Verifying the data: This involves checking that each data point in the data set is in the
proper format (e.g., telephone numbers, social security numbers).
Converting data types: This ensures data is uniform across the data set. For instance,
numeric variables only contain numbers, while string variables can contain letters,
numbers, and characters.
Removing irrelevant data: This clears useless or inapplicable data so full emphasis can
be placed on necessary data points.
Eliminating duplicate data points: This helps speed up the mining process by boosting
efficiency and reducing errors.
Removing errors: This eliminates typing mistakes, spelling errors, and input errors that
could negatively affect analysis outcomes.
Completing missing values: This provides an estimated value for all data and reduces
missing values, which can lead to skewed or incorrect results.
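Several of the cleaning steps listed above can be sketched in a few lines of Python. The record
layout, the phone number pattern, and the decision to fill a missing age with the mean of the
known ages are all assumptions chosen for illustration; a real project would pick rules that
suit its own data.

```python
import re

def clean_records(records):
    """Verify format, remove duplicates, convert types, and fill missing values."""
    phone_pattern = re.compile(r"^\d{3}-\d{3}-\d{4}$")  # assumed phone format
    seen, cleaned = set(), []
    for rec in records:
        phone = rec["phone"].strip()
        # Verify the data: skip records whose phone number is malformed.
        if not phone_pattern.match(phone):
            continue
        # Eliminate duplicates (keyed on the phone number in this sketch).
        if phone in seen:
            continue
        seen.add(phone)
        # Convert data types: age arrives as text and becomes an int or None.
        age_text = str(rec.get("age", "")).strip()
        age = int(age_text) if age_text.isdigit() else None
        cleaned.append({"phone": phone, "age": age})
    # Complete missing values with the rounded mean of the known ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    mean_age = round(sum(known) / len(known)) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = mean_age
    return cleaned

# Hypothetical raw input with a duplicate, a bad phone number, and a missing age.
raw = [
    {"phone": "555-123-4567", "age": "34"},
    {"phone": "555-123-4567", "age": "34"},
    {"phone": "not a number", "age": "51"},
    {"phone": " 555-987-6543 ", "age": ""},
    {"phone": "555-222-3333", "age": "40"},
]
cleaned = clean_records(raw)
print(cleaned)
```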
4. Data Visualization
Data visualization is the translation of data into graphic form to illustrate its meaning to
business stakeholders.
Comparison charts: Charts and tables express relationships in the data, such as monthly
product sales over a one-year period.
Maps: Data maps are used to visualize data pertaining to specific geographic locations.
Through maps, data can be used to show population density and changes; compare
populations of neighboring states, counties, and countries; detect how populations are
spread over geographic regions; and compare characteristics in one region to those in
other regions.
Heat maps: This is a popular visualization technique that represents data through
different colors and shading to indicate patterns and ranges in the data. It can be used
to track everything from a region’s temperature changes to its food and pop culture
trends.
Density plots: These visualizations show how data is distributed over a continuous range, such
as a period of time, as a smooth curve that can look like a mountain range. Density plots make
it easy to see how often events occur over time (e.g., month, year, decade).
Histograms: These are similar to density plots but represent the distribution with bars on a
graph instead of a continuous line.
Network diagrams: These diagrams show how data points relate to each other by using
a series of lines (or links) to connect objects together.
Scatter plots: These graphs represent the relationship between two variables by plotting each
data point against two axes. Scatter plots can be used to compare variables such as a
country’s life expectancy and the amount of money it spends on healthcare annually.
Word clouds: These graphics are used to highlight specific word or phrase instances
appearing in a body of text; the larger the word’s size in the cloud, the more frequent its
use.
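As a small illustration of the histogram idea from the list above, the sketch below counts
values per bin and prints a text bar for each bin. The sample ages, the bin width, and the
function name histogram are assumptions made for the example; a plotting library would normally
draw the bars.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each bin is keyed by its lower edge."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical customer ages.
ages = [23, 27, 31, 35, 36, 38, 42, 44, 51]
counts = histogram(ages, bin_width=10)
for lower, n in counts.items():
    # One '#' per observation in the bin.
    print(f"{lower}-{lower + 10 - 1}: " + "#" * n)
```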
6. Machine Learning
Machine learning is the process by which computers use algorithms to learn on their own.
An increasingly relevant part of modern technology, machine learning makes computers
“smarter” by teaching them how to perform tasks based on the data they have gathered.
What Is Machine Learning in Data Mining?
In data mining, machine learning’s applications are vast. Machine learning and data mining
fall under the umbrella of data science but aren’t interchangeable terms. For instance,
computers perform data mining as part of their machine learning functions.
Supervised learning: In this method, algorithms train machines to learn using pre-
labeled data with correct values, which the machines then classify on their own. It’s
called supervised because the process trains (or “supervises”) computers to classify
data and predict outcomes. Supervised machine learning is used in data mining
classification.
Unsupervised learning: When computers handle unlabeled data, they engage in
unsupervised learning. In this case, the computer classifies the data itself and then looks
for patterns on its own. Unsupervised models are used to perform clustering and
association.
Semi-supervised learning: Semi-supervised learning uses a combination of labeled and
unlabeled data, making it a hybrid of the above models.
Reinforcement learning: This is a more layered process in which computers learn to
make decisions by acting in a specific environment and receiving feedback (rewards or
penalties) on the results. For example, a computer might learn to play chess by playing
thousands of games and learning from its wins and losses.
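To make the supervised case from the list above concrete, here is a minimal sketch of a
nearest-neighbor classifier: it predicts the label of a new point by copying the label of the
closest pre-labeled example. The data, the labels, and the function name are hypothetical, and
nearest neighbor is just one simple supervised method picked for brevity.

```python
from math import dist

def nearest_neighbor_predict(training, query):
    """Return the label of the labeled example closest to `query`."""
    features, label = min(training, key=lambda example: dist(example[0], query))
    return label

# Hypothetical pre-labeled data: (features, label) pairs.
labeled = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]
prediction = nearest_neighbor_predict(labeled, (7.5, 8.0))
print(prediction)
```

An unsupervised method would receive the same points without the "low" and "high" labels and
would have to discover the two groups on its own.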
2. Spam filter
You know the junk folder in your email inbox? It is the place where emails that have been identified as
spam by the algorithm end up.
Many machine learning courses, such as Andrew Ng’s famed Coursera course, use the spam filter as a
teaching example, typically of supervised classification; clustering can be used alongside it, as
described below.
What the problem is: Spam emails are at best an annoying part of modern day marketing techniques,
and at worst, an example of people phishing for your personal data. To avoid getting these emails in
your main inbox, email companies use algorithms. The purpose of these algorithms is to correctly flag
whether or not an email is spam.
How clustering works: K-Means clustering techniques have proven to be an effective way of identifying
spam. The approach looks at the different sections of the email (header, sender, and content) and
groups emails with similar characteristics together.
These groups can then be classified to identify which are spam. Including clustering in the classification
process reportedly improves the accuracy of the filter to 97%. This is excellent news for people who want
to be sure they’re not missing out on their favorite newsletters and offers.
6. Document analysis
There are many different reasons why you would want to run an analysis on a document. In this
scenario, you want to be able to organize the documents quickly and efficiently.
What the problem is: Imagine you are limited in time and need to organize information held in
documents quickly. To be able to complete this task you need to: understand the theme of the text,
compare it with other documents, and classify it.
How clustering works: Hierarchical clustering has been used to solve this problem. The algorithm is able
to look at the text and group it into different themes. Using this technique, you can cluster and organize
similar documents quickly using the characteristics identified in the paragraph.
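The hierarchical approach to grouping documents can be sketched as bottom-up (agglomerative)
clustering. The example below is an illustration under stated assumptions, not a production
method: documents are compared by Jaccard similarity of their word sets, clusters are merged
with single linkage, and merging stops below a hypothetical threshold of 0.2. The sample
documents and function names are invented for the example.

```python
def jaccard(a, b):
    """Similarity of two word sets: shared words divided by all distinct words."""
    return len(a & b) / len(a | b)

def cluster_documents(docs, threshold=0.2):
    """Agglomerative clustering with single linkage; returns lists of document indices."""
    words = [set(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # every document starts alone
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are most similar.
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(jaccard(words[i], words[j])
                          for i in clusters[x] for j in clusters[y])
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:
            break  # the remaining clusters are too dissimilar to merge
        x, y = pair
        clusters[x] += clusters.pop(y)
    return [sorted(c) for c in clusters]

# Hypothetical documents: two about sales reports, two about employee policy.
docs = [
    "quarterly sales report for retail stores",
    "retail stores sales forecast report",
    "employee vacation policy update",
    "update to employee travel policy",
]
groups = cluster_documents(docs)
print(groups)
```

Real document clustering usually weights words (for example with TF-IDF) rather than using raw
word sets, but the merge-the-closest-pair loop is the same idea.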
Real-Life Examples of Association Analysis, Clustering Analysis, Text Mining, and Web
Usage Mining
Association Analysis is a data mining technique that analyzes the co-occurrence
of purchasing events or other behaviors using customer transaction data. Its
purpose is to propose more suitable offers to customers based on an
analysis of their past behavior. In other words, Association Analysis makes it
possible to predict that a customer who buys product X may also prefer product Y.
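The co-occurrence idea can be illustrated with a short counting sketch: tally how often each
pair of items appears in the same transaction, then list what is most often bought together
with a given item. The baskets and function names below are hypothetical, and this simple
tally is only a starting point rather than a full association rule miner.

```python
from collections import Counter
from itertools import combinations

def co_occurrence_counts(transactions):
    """Count how often each unordered pair of items shares a transaction."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def also_bought(item, transactions):
    """Items bought together with `item`, most frequent first."""
    related = Counter()
    for (a, b), n in co_occurrence_counts(transactions).items():
        if item == a:
            related[b] += n
        elif item == b:
            related[a] += n
    return related.most_common()

# Hypothetical transaction data.
transactions = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
suggestions = also_bought("bread", transactions)
print(suggestions)
```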
Spotify’s song recommendations, which are based on users’ listening histories, can be seen
as a real-life example of Association Analysis.
Thanks to Bandits for Recommendation as Treatments (BaRT)[1], Spotify can determine which
songs each user likes and suggest new songs accordingly. To determine users’ preferences,
Spotify analyzes historical interactions such as songs listened to, songs skipped, songs added
to playlists, and radio activity. It then detects users with similar musical tastes and
recommends songs from one user’s personal playlist to another. It is reasonable to conclude that
Spotify uses Association Analysis to identify relationships between users’ preferences, genres,
and singers, because the defining features of Association Analysis, such as offering personal
suggestions and estimating which proposals will be liked, are compatible with the business
algorithm used by Spotify.
In a field where competition is tough, Spotify needed to differentiate itself from rivals such
as Apple Music and Pandora.[2] All of its competitors had similar features and a nearly
identical, immensely large catalog of music. Therefore, Spotify realized the importance of a
unique value proposition to gain an advantage over its competitors. It created the “Discover
Weekly” feature, which uses Association Analysis to set Spotify apart and reduces the risk of
potential customers choosing a competitor. Spotify understands how important customization is
to users, and in this way the company builds loyalty among its listeners. Moreover, Scott Wolf
remarked that Spotify’s data reveals a more realistic personality analysis than a traditional
psychological test. When people choose their songs, they do not pretend to be someone else, so
the data gives more accurate insight into individuals’ characteristics, preferences, and
attitudes.[3]
The critical challenge here is the security of personal data. One of the revenue sources
of Spotify and similar companies is selling user data to other companies for customized
advertisements. Adam Bly states that, thanks to Spotify’s data, companies can apply the right
strategy: they can present the right message to the right people at the right time. Even though
Spotify obtains permission from users to use their data during registration and makes clear
that nothing illegal is involved, trust will remain a problem for people who use Spotify,
because beyond personalized product advertisements, data has been used for political goals by
Pandora[4], a competitor of Spotify. Users are therefore concerned about this issue, and the
situation creates a new challenge for Spotify.
Integrating Association Analysis creates numerous competitive advantages. These include
increasing customer satisfaction and loyalty, enabling effective marketing,
strengthening customer relationship management, enhancing productivity, and reducing risks
and uncertainties.[5]
Clustering Analysis groups data points that are similar to one another (in some sense) and
separates them from those that are not. It tries to create distinct, accurate clusters based on
the given information. The goals of Clustering Analysis are to segment customers and to forecast
future events based on the data set. In this way, corporations can make better decisions.
Migros analyzes customer purchasing behavior, such as which customers buy which products and
from which stores. In this way, Migros classifies its customers according to demographic
features and tries to identify which lifestyle each customer is closest to. Based on each
customer’s purchasing behavior, Migros makes personalized product offers to influence customers,
sell more products, and increase its revenue. Migros’s data analysis therefore matches the
definition of Clustering Analysis given above.
Before Clustering Analysis, it was not possible to offer personalized products to customers
because Migros and other retailers could not identify customers’ purchasing habits. Migros
risked boring its customers with over 400 campaigns a year because it could not identify which
customers would be interested in each one.[6] Moreover, mass marketing campaigns cannibalized
one another, which reduced revenue.
Migros executives state that once distinct segments were determined, they started to assign
strategies to them. They focused mainly on segments consisting of people who eat healthy food,
gourmets, people who eat junk food, and families with children. Migros contacts those segments
directly and offers special prices on products they purchase frequently. People are also
informed about products that would appeal to them. For example, people with healthy eating
habits are informed about the benefits of lycopene and which products contain it. Because
campaigns reach the right target customers, campaign profits increase. In support of this,
Alexandra Brunner declared that the conversion rate rises when relevant offers reach relevant
customers directly.
Unfortunately, using Clustering Analysis also created some challenges, such as the control of
operational processes. Customer data security is another issue, because significant personal
information is collected through the Money Club Card, and some potential customers who care
about their privacy may avoid Migros in order to protect their personal information.
Adopting Clustering Analysis makes it easier to target customer segments, raise customer
satisfaction, and increase efficiency and effectiveness. Companies become able to respond to
customers in time. Special offers create a lock-in effect, and the Network Effect is a further
competitive advantage.
Encountering products that meet the customer’s needs at the right time increases customer
satisfaction. In addition, customers who stop purchasing from Migros incur a switching cost
because they lose the special discounts, which builds loyalty between the customer and the
company. Last but not least, as Migros increases its number of customers and people influence
one another, a Network Effect is created.
Text Mining:
Text mining makes it possible to examine unstructured data and obtain meaningful information
from it. It extracts the main elements by discovering the relationships between
words. Sentiment analysis is one text mining technique for understanding customers’ perspectives.
Thanks to Sprout Social, customer feedback about Casio on Twitter is gathered in one place, and
Casio can respond to its customers more quickly than in the past. Sprout Social analyzes the
tone of people’s messages as neutral, negative, or positive. In this way, Casio can understand
how customers see the business and whether they like its products. Casio adjusts its processes
according to this feedback. Given the definition above, Casio’s use of Sprout Social is a
real-life example of Sentiment Analysis.
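The idea of sorting messages into positive, negative, and neutral tones can be sketched with a
tiny word-list approach. This is an illustration only and makes no claim about how Sprout
Social works internally: the word lists, the sample messages, and the scoring rule are all
assumptions, and real sentiment tools use far richer models.

```python
# Hypothetical tone lexicons, kept tiny for the example.
POSITIVE = {"love", "great", "excellent", "happy", "fast"}
NEGATIVE = {"broken", "slow", "terrible", "disappointed", "refund"}

def classify_tone(message):
    """Label a message positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical customer messages.
messages = [
    "Love my new watch, great battery life!",
    "The strap arrived broken and support is slow.",
    "Does this model come in black?",
]
tones = [classify_tone(m) for m in messages]
print(tones)
```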
Casio wanted to compete aggressively with its rivals and needed to provide more efficient
social customer care. It needed to build a bridge between its marketing and customer care
departments with a solution that could also handle large customer volumes.
Using Sprout Social gives executives direct access to customer care requests.
The information obtained is shared between the marketing and customer care departments on a
single platform.[7] Additionally, Sprout Social not only analyzes the tone of customers’ comments
but also tracks how those comments change over time, which means Casio can follow whether the
value of its brand is increasing.[8] Integrating Sprout Social increased Casio’s response rates
and helped it recognize customers’ current requests and demands and understand trends. Another
crucial effect of Sentiment Analysis is help in determining the right marketing strategy, such
as which words to use in social marketing campaigns.
Although no challenges created by sentiment analysis have been reported for Casio, constantly
changing demands and trends can cause difficulties for the brand because preparing marketing
campaigns takes time. Casio must therefore prepare its campaigns quickly to keep up with
customers’ needs and requests.
Sentiment Analysis allows companies to identify where they stand relative to rivals and why,
and on which points, their customers prefer other brands. The answers to these questions create
opportunities for differentiation. Gaining insight into customers also increases the chance of
satisfying them.
Web Usage Mining:
Web mining extracts information from websites. Its core purpose is to analyze customer behavior
for companies and to assess the effectiveness of websites so they can be improved. It can be
divided into three types: web content mining, web structure mining, and web usage mining.
Web usage mining is a technique that analyzes usage patterns recorded by the web server. In
this way, customers’ needs can be identified and web-based applications can serve them
better.[9]
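Analyzing usage patterns from a web server often starts with counting page requests in the
access log. The sketch below assumes log lines in the widely used Common Log Format and counts
successful GET requests per path. The sample lines, paths, and function name are hypothetical,
and real logs may use a different layout.

```python
from collections import Counter

def page_hits(log_lines):
    """Count successful (status 200) GET requests per URL path."""
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # line is not in the expected format
        request = parts[1].split()        # e.g. ['GET', '/shirts', 'HTTP/1.1']
        status_fields = parts[2].split()  # status code, then response size
        if not status_fields:
            continue
        if len(request) == 3 and request[0] == "GET" and status_fields[0] == "200":
            hits[request[1]] += 1
    return hits

# Hypothetical access log entries.
log_lines = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /shirts HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /ties HTTP/1.1" 200 430',
    '10.0.0.1 - - [01/Jan/2024:10:01:00 +0000] "GET /shirts HTTP/1.1" 200 512',
    '10.0.0.3 - - [01/Jan/2024:10:02:00 +0000] "GET /missing HTTP/1.1" 404 0',
]
hits = page_hits(log_lines)
print(hits.most_common())
```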
Thanks to Tableau, Watson gained the ability to filter and drill down into its data. Tableau
also helped Watson’s idea generation process. Instead of relying on time-consuming email
exchanges, Watson now obtains customer insights instantly.[10] The aim is to gain deep insight
into customer behavior and to increase the effectiveness of the website. Watson’s investigation
of user interactions to achieve these goals makes this a real-life example of Web Usage Mining.
ASW states that the main problem was time-consuming processes because, before Tableau, all the
work was done manually. In addition, the traditional reports did not show the performance of
sub-categories, so the perspective they offered was limited. Seeing that detail required
separate reports, which was also time-consuming.
Tableau visualized all the data and provided a single user interface. In this way, Watson
interprets its data well and can make better, more timely decisions. Moreover, collaboration
between departments is reinforced. This gives Watson a competitive advantage and solves the
problems mentioned above.
Again, although Watson has not reported any problems, the main challenge of web usage
mining is ethical. Privacy concerns create suspicion and draw criticism from customers,
especially when users are not informed.
The primary competitive advantages gained are speed and better performance for customers.
Tableau integrates external and internal data sources, so customer insights are more
accessible to Watson. Because Tableau is easy to use, employees can analyze and
interpret data without needing data analysts, which saves time. Thanks to visualization,
Watson’s employees can identify and highlight the areas that require immediate action in order
to catch market trends and stay ahead of competitors.
[1] Gershgorn, D. (2019, October 4). How Spotify’s Algorithm Knows Exactly What You Want to
Listen To. Retrieved from
https://onezero.medium.com/how-spotifys-algorithm-knows-exactly-what-you-want-to-listen-to-4b6991462c5c
[2] https://qz.com/571007/the-magic-that-makes-spotifys-discover-weekly-playlists-so-damn-good/
[3] https://www.spotifyforbrands.com/tr/insights/inside-spotifys-data-mission/
[4] Jacob, S. How Pandora Uses Data to Improve Its Service and Music Stations. Retrieved
from https://neilpatel.com/blog/how-pandora-uses-data/
[5] Bal, Y., Bal, M., Demirhan, Y. (2011, January). Creating Competitive Advantage by Using Data
Mining Technique as an Innovative Method for Decision Making Process in Business. Retrieved
from
https://www.researchgate.net/publication/220449370_Creating_Competitive_Advantage_by_Using_Data_Mining_Technique_as_an_Innovative_Method_for_Decision_Making_Process_in_Business
[6] https://www.sas.com/en_us/customers/migros-ch.html
[7] https://partners.twitter.com/en/success-stories/sprout-social-helps-casio-respond-to-customers-faster
[8] https://www.commsights.com/sentiment-analysis-in-social-media/
[9] https://en.wikipedia.org/wiki/Web_mining#Web_usage_mining
[10] https://www.tableau.com/solutions/customer/ASWatson-builds-competitive-edge-with-data-in-fast-moving-retail-industry
This is a continuation of the marketing analytics case study example we have been discussing for the
last few articles. You can find the previous parts at the following links (Part 1, Part 2, and Part 3). In the
last part, we discussed exploratory data analysis (EDA: Part 3). In this article, we will talk about
association analysis, a helpful technique for mining interesting patterns in customers’ transaction data.
Association analysis can be used as a handy tool for extended exploratory data analysis. By the way,
association analysis is also the core of market basket analysis and sequence analysis. Later in the article,
we will use association analysis in our case study example to design effective offer catalogs for
campaigns and to improve the design of the online store (website).
Scissorhands
I must have been 9 or 10 years old when we had our first craft lecture at school. Craft lectures are
called SUPW in India; it’s an abbreviation for ‘Socially Useful Productive Work’. As a part of the first
lecture, each student was provided with an A4-sized sheet of colored paper and a pair of scissors. In
that first lecture, excited kids with no direction discovered that they could cut a sheet in a virtually
infinite number of ways. It was neither socially useful nor productive work, and it created a lot of
wasted paper. A more apt long form of SUPW in this case is ‘Some Useful Paper Wasted’. Later, with a more
directed effort, we discovered that there are so many cool shapes hidden in a piece of paper as long as
scissors are used wisely.
This is precisely the kind of experience many analysts have when they come across customers’
transaction data in companies. There is a wealth of information about customer behavior hidden in this
data, but it is hard to figure out where to start. Transaction data can be sliced, diced, and grouped in
infinitely many ways, similar to a piece of paper dissected with scissors. The key in both of these
cases is direction.
The point I am trying to drive at here is that data analysis is a highly planned activity. As an analyst,
never touch your data before you have a proper plan of action (hypotheses etc.) in place. Having said
this, there are always going to be times as an analyst when you have to enter uncharted territories of
data to find patterns. In these cases, I recommend you rely on machine learning algorithms or create
your own modified algorithms specific to your requirements. In my opinion, machines are any day better
than us humans at this task. Association analysis powered by the Apriori algorithm is one
such technique to mine transaction data. Let’s explore association analysis in the next part.
Association Analysis
Association analysis, as you will discover soon, is primarily frequency analysis performed on a large
dataset. Since datasets for most practical problems are large, you need clever algorithms like Apriori to
manage association analysis. Let’s consider a much smaller transaction dataset to learn about
association analysis. Here, each row (transaction number) represents one customer’s market basket.
In the product columns, 1 represents ‘bought the product in that transaction’, whereas 0
stands for ‘did not buy’.
Transaction   Shirts   (other product)   Ties
001           1        1                 1
002           0        1                 0
003           1        0                 1
004           1        0                 1
005           1        1                 0
There are a few association analysis metrics (i.e. support, confidence, and lift) that are really helpful in
deciphering information hidden in this kind of dataset. Let us explore these metrics and understand their
usage. Support for the purchase of shirts and ties together in association analysis is defined as:
Support = (number of transactions with both shirts and ties) / (total number of transactions)
For our data there are 3 transactions with both shirts and ties (shirts∩ties) out of 5 transactions in
total, so support = 3/5 = 60%.
60% is a fairly high value for support and you will rarely find such high values for support in real world
examples. For real world problems with several product groups, support of 1% or at times even lower
depending upon the nature of your problem is also useful.
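Confidence for the rule that customers who buy shirts also buy ties is defined as:
Confidence = (number of transactions with both shirts and ties) / (number of transactions with shirts)
In our dataset, there are 3 transactions with both shirts and ties out of 4 transactions with shirts.
The calculation for confidence for our dataset is:
Confidence = 3/4 = 75%
Again, you will rarely find such a high value of confidence for most real world problems unless there are
appealing combo offers on two products. A good value of confidence is again problem specific.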
Lift is defined as:
Lift = confidence / expected confidence
Expected confidence in the above formula is the presence of ties in the overall dataset, i.e. there are 3
instances of tie purchases out of 5 (60%). With confidence of 3/4 = 75%, lift = 75% / 60% = 125%.
The value for lift, 125%, shows that purchases of ties improve when customers buy shirts. The
question you are asking here is: if the customer buys a shirt, does his chance of buying ties go up,
i.e. is the value of lift above 100%? Let us use our knowledge about association analysis for the case
study example we have been working on.
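The three metrics can be checked with a short Python sketch on the five-transaction dataset.
It reads the first product column as shirts and the last as ties, an assumption that matches
the counts and the 125% lift quoted in the text; the function and variable names are invented
for the example.

```python
# Shirts and ties columns of the five-transaction dataset (1 = bought, 0 = did not buy).
baskets = {
    "001": {"shirts": 1, "ties": 1},
    "002": {"shirts": 0, "ties": 0},
    "003": {"shirts": 1, "ties": 1},
    "004": {"shirts": 1, "ties": 1},
    "005": {"shirts": 1, "ties": 0},
}

def association_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    total = len(baskets)
    n_antecedent = sum(b[antecedent] for b in baskets.values())
    n_consequent = sum(b[consequent] for b in baskets.values())
    n_both = sum(1 for b in baskets.values() if b[antecedent] and b[consequent])
    support = n_both / total                   # share of baskets with both items
    confidence = n_both / n_antecedent         # share of shirt baskets that also have ties
    expected_confidence = n_consequent / total # share of all baskets with ties
    lift = confidence / expected_confidence
    return support, confidence, lift

support, confidence, lift = association_metrics(baskets, "shirts", "ties")
print(f"support={support:.0%} confidence={confidence:.0%} lift={lift:.0%}")
```

On real data with thousands of products, the same ratios are computed only for itemsets that
pass a minimum support threshold, which is the pruning idea behind Apriori.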
DresSMart Inc., where you are the Chief Analytics Officer & Business Strategy Head, is an online retail
store for clothes and apparel. They showcase different products, brands, and styles. You know
association analysis works best when performed separately on different customer segments (read
about customer segmentation). However, you have decided to do a quick association analysis on the
data available in your company.
With your data for formal shirts and ties, the products we explored in the above example, you got support
of 0.2% with confidence of 12% and lift of 509%. This implies that although only a small percentage of
transactions contain both ties and shirts, once a customer buys a formal shirt, his chances of buying a
tie go up fivefold.
DresSMart gives its customers the option to return an undamaged product within 30 days
for a full refund. You did a further investigation of customers who buy ties along with shirts and
found that the return rate for ties in these transactions is 3 times higher than other
return rates. This is an indicator that customers are struggling to choose matching ties while ordering
shirts online. There is a need to improve this process on the company’s website. The
idea is to reduce the product return rate while exploiting the full opportunity for cross-selling ties with
shirts.
You have found some good clues to improve the profitability of your company through exploratory data
analysis tools. Now you want to prepare and address the original objectives (Part 2) to improve
profitability for campaign efforts. You will delve into serious modeling for this task next time around.
Sign-off Note
Hope you enjoy being Edward Scissorhands with your data! See you soon with the next part of this case
study example where we will explore more about decision tree algorithms.