
Data Mining Nostos

Data Mining
Data is ubiquitous. Rather than hoarding data unnecessarily, we must use it to extract information and knowledge.
The world is data-rich, but poor in information.
Data Mining is a magic wand that does wonders with data.

Data Mining
Try understanding it by breaking down the term into chunks.

 Data is present everywhere. It comprises the facts and statistics collected together for
reference and analysis.
 Mining is a vivid term portraying the process of gleaning small nuggets from a large
volume of raw material.

Definition
In the book Data Mining: Concepts and Techniques, Data Mining is defined as:
The process of discovering interesting patterns (non-trivial, potentially useful,
previously unknown) and knowledge from large amounts of data.

Why Data Mining?


 The explosive growth of data from terabytes to petabytes
 Eminent sources of voluminous data
o Business: Web, Transactions, Stock
o Science: Satellites, Sensors, Bio-informatics, Medical Diagnosis
o Social Media: News, YouTube, Facebook, Twitter, Instagram

We are drowning in data, but starving for knowledge.


As the saying goes, necessity is the mother of invention: the need to extract
knowledge automatically from massive data gave rise to Data Mining.
Data Mining is also termed Knowledge Discovery in Databases (KDD).

Data Mining - An Interdisciplinary Term

Data mining is an interdisciplinary field, drawing on domains such as statistics,
machine learning, database systems, pattern recognition, and visualization.

Why not Traditional Data Analysis?


 Tremendous volume of data generation
 High dimensionality of data
 High complexity of data
 Emergence of new and sophisticated applications
Data Mining in a Nutshell
Here is an expert talk demystifying the concept of data mining and explaining why it
will continue to grow in popularity.

The concept of data mining is growing in popularity in the realm of commerce and business activities in general, but it is a somewhat misconceived or misunderstood topic, and I want to give you an idea of what data mining is all about. We are in the information economy, and more and more data is being generated in every aspect you can think of. Every time you swipe your grocery card to get a discount on whatever products you are buying, that is data being downloaded to a database; on most transactions you do, there is some sort of data download. Organizations are storing, processing, and analyzing data more than at any time in history, and that trend is going to continue to grow.

So what is data mining? Data mining is the incorporation of quantitative methods: mathematical methods that may include mathematical equations and algorithms. Some of the prominent methodologies are traditional logistic regression, neural networks, segmentation, classification, and clustering; those are all methods that utilize mathematics. Data mining is applicable across industry sectors, generally wherever you have processes and wherever you have data. It is the application of these powerful mathematical techniques, in combination with statistical inference testing, to extract trends and patterns.

I teach a course at NJIT, data mining for managers, and in the first half of the course I give students a very good understanding of what data mining is, because, to be honest, many people don't quite understand it; it takes a full half of the course to explain what these mathematical techniques are. Just as important, in the second half of the course I say: now that you understand these techniques, let's use them in the business world. Let's apply them to advertising and marketing effectiveness, to e-commerce initiatives, even to healthcare processes and supply chain processes. Any number of businesses can be mined with these techniques. Simply put, any organization that has data and has processes can be analyzed with data mining, and the result is actionable information extracted from these data resources, which organizations can use to fine-tune their processes, increase productivity, and increase efficiency.

So this whole concept of data mining is going to grow in popularity. Why? Because data continues to grow. Think about social networking now: LinkedIn, Twitter, Facebook. What is it? It's more data, and it's data that describes people: what they do, what they like, who they are. When you're out buying, using services, or just conducting your daily lives, more and more data is being gathered and captured. That's the way it is; we're in the information economy, and the way to extract strategic information from those data resources is with data mining.
KDD Process

Many people look upon data mining as a synonym for Knowledge Discovery in
Databases, or KDD. Others consider Data Mining to be one vital step within the KDD
process.

The KDD process unfolds in the steps listed below.

Steps in KDD Process


1. Data Cleaning - Removal of noisy and inconsistent data.
2. Data Integration - Data from diverse sources are unified.
3. Data Selection - Relevant data is retrieved.
4. Data Transformation - Data is transformed into appropriate forms.
5. Data Mining - Intelligent methods are applied to extract knowledge and patterns.
6. Pattern Evaluation - Truly valuable patterns are identified.
7. Knowledge Presentation - Visualization and presentation of the extracted knowledge
and the identified patterns.
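To make the seven steps concrete, here is a minimal Python sketch of a KDD-style pipeline, assuming pandas and scikit-learn are available; the file name sales.csv and the columns age and annual_spend are hypothetical, chosen only to show where each KDD step lands in code.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# 1. Data Cleaning: drop rows with missing values and duplicates
df = pd.read_csv("sales.csv")            # hypothetical input file
df = df.dropna().drop_duplicates()

# 2-3. Data Integration and Selection: keep only the relevant columns
data = df[["age", "annual_spend"]]       # hypothetical columns

# 4. Data Transformation: scale features to a comparable range
scaled = StandardScaler().fit_transform(data)

# 5. Data Mining: apply an intelligent method (here, k-means clustering)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

# 6. Pattern Evaluation: inspect cluster sizes as a rough sanity check
print(pd.Series(model.labels_).value_counts())

# 7. Knowledge Presentation: attach labels for reporting and visualization
df["segment"] = model.labels_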

Types of Mined Data


Thomas A. Edison said,
The value of an idea lies in the using of it.
Likewise, the value of the data mining process relies on using it in the right place at the right time.

Basic forms of data that can be mined are listed as follows:

1. Database-oriented data
o Database data
o Data warehouse data
o Transactional data
2. Advanced datasets
o Spatial data
o Multimedia data
o Data streams
o Sensor data
o World Wide Web (WWW)

Patterns that can be Mined


Having seen the types of data to which the process can be applied, let us now
examine the types of patterns that can be mined:

 Characterization and Discrimination


 Mining Frequent Patterns, Associations, and Correlations
 Classification
 Clustering
 Regression Analysis
 Outlier Detection
 Trend and Evolution Analysis

Data Mining in Daily Life


Amazon Recommendations
Have you ever used Amazon.com? If so, you would have come across a pop-up
with the phrase:
People who bought 'this' also bought...
It is the magic of Data Mining: algorithms run in the back-end to give us valuable
recommendations.
Facebook Suggestions
Facebook always surprises us with suggestions of friends whom we actually know. It is
all part of the Data Mining process.
Any common characteristic between users, such as the school they studied in, is
processed by algorithms, which in turn suggest friends.

Data Mining in Daily Life


Market Basket Analysis
Market Basket Analysis helps in identifying the purchase patterns of potential
customers by analyzing the data gathered from their baskets. The gathered data is then
used to predict future buying trends and forecast supply and demand.

Data Mining Techniques


Just as how the charm of a rainbow lies in its seven colors, the charm of data mining
lies in its seven techniques.

Data Mining Techniques


Data mining encompasses the following techniques:

1. Regression
2. Classification
3. Clustering
4. Association Rules
5. Anomaly Detection
6. Data Visualization
7. Statistics

Let's start this colorful journey by appreciating each technique of data mining!

Regression
Our curiosity to find associations between things dates back to the nineteenth
century.

The term Regression was coined to describe a biological phenomenon that yielded
a fascinating insight: the observation that the heights of the offspring of tall
ancestors tend to regress down towards the normal average.

Let us now delve into the first technique of Data Mining!


Regression Analysis
Regression Analysis is defined as
The process of examining the functional relationships among the variables.

 It is a predictive modeling technique.


 The functional relationship is represented in the form of an equation
containing:
o Response or dependent variable
o Explanatory or predictor variable
 Regression Analysis is used in prediction and forecasting applications.

Regression Equation
Let us assume:

 The response variable is represented by Y.
 The predictor variables are denoted by X_1, X_2, X_3, ..., X_p,
where p denotes the number of independent or predictor variables.

A simple regression model equation is represented as

Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon

where
\beta_0, \beta_1, ..., \beta_p are the regression parameters to be estimated
from the data, and
\varepsilon is the measure of discrepancy (the error term).

The sign of a regression coefficient signifies the direction of the relationship
(positive or negative) between a predictor and the response variable.

Let's see a sample relationship between linearly related variables.
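As a quick illustration, here is a minimal Python sketch, assuming scikit-learn, that fits a simple linear regression on synthetic, linearly related data; the recovered slope and intercept are the estimates of \beta_1 and \beta_0. The data is invented purely for demonstration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic linearly related data: y = 2x + 1 plus random noise (epsilon)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                  # predictor X_1
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1, size=50)   # response Y

model = LinearRegression().fit(X, y)
print("beta_1 (slope):    ", model.coef_[0])     # close to 2
print("beta_0 (intercept):", model.intercept_)   # close to 1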
Backdrop of Classification
Our instinct for classification goes back to the era of ancient civilizations, where
the concept of Taxonomy emerged.

The word Taxonomy has its origin in two Ancient Greek words:

 taxis- arrangement
 nomia- method

Taxonomy is the science of naming and classifying groups of biological species


based on shared characteristics. The taxonomic categories are domain, kingdom,
phylum, class, order, family, genus, and species.

Let us check the next technique of data mining.

Classification
Definition
Classification is the problem of identifying the category to which a new observation
belongs, based on a training set of data containing observations whose categories
are already known.

 It is a data analysis task.


 It follows a two-step process, namely:
o Learning Step - Training phase where a model is constructed.
o Classification Step - Predicting the class labels and testing the same for
accuracy.
 It predicts the value of a categorical variable.

Notable Algorithms
Eminent classification algorithms encompass the following.
Base Classifiers

1. Neural Networks
2. Deep Learning
3. Decision Tree based Methods
4. Naive Bayes and Bayesian Belief Networks
5. Support Vector Machines
6. Rule-based Methods
7. Nearest-neighbor

Ensemble Classifiers

1. Boosting
2. Bagging
3. Random Forest
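A minimal sketch of the two-step process, assuming scikit-learn and its bundled Iris dataset: the learning step constructs a decision tree from training data with known labels, and the classification step predicts class labels on held-out data and tests them for accuracy.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Learning step: construct the model from the training set
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Classification step: predict class labels and test them for accuracy
pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))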
Backdrop of Clustering

Clusters are everywhere, from atoms to galaxies!

Atoms cluster together to form compounds.
Galaxies love to cluster together!
Marketing companies love to cluster customers!
It is time to admire the beauty of the third color of the Data Mining rainbow!

Clustering

Clustering is the task of grouping a set of objects such that objects in the same
cluster are more similar to each other than to objects in other clusters.

 The distance measure plays a significant role in clustering.
 It is an unsupervised learning method.
 Common distance measures include:

Numeric Datasets

- Euclidean distance
- Manhattan distance
- Minkowski distance

Non-Numeric Datasets

- Jaccard index
- Cosine similarity
- Dice coefficient
- Hamming distance
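As a quick illustration of distance-based clustering, here is a minimal Python sketch, assuming scikit-learn; the (age, engagement) points are invented and echo the customer segmentation example discussed in the talk later in this section.

import numpy as np
from sklearn.cluster import KMeans

# Toy data: (age, engagement in days/week) for eight hypothetical users
points = np.array([
    [20, 2], [22, 3], [19, 4],    # young, low engagement
    [38, 6], [41, 7], [40, 6],    # around 40, high engagement
    [51, 1], [49, 2],             # around 50, very low engagement
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print("Cluster labels:", kmeans.labels_)
print("Cluster centers:\n", kmeans.cluster_centers_)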

An Ozone Example

Oxygen atoms cluster together to form an ozone molecule.

A Peek into Applications


Recommendation Engines
Recommendation Engines cluster potential customers on the basis of their
behavior (such as past search patterns), then filter the data and recommend
relevant results to the users.
Example: Bing, Google

Anomaly Detection
Anomaly Detection is the identification of observations that do not conform to the
expected pattern of clusters or groups. It is also known as Outlier Detection.
Example: Finding fraudulent credit card transactions

A Peek into Applications


Marketing

 Marketers employ a plethora of analysis techniques to appeal to new customers and
to existing, potentially valuable customers.
 Customers are clustered into segments based on their demographic information,
age, purchase behavior, and so on.
 Based on the clustered segments, marketers tailor their marketing efforts and
advertising strategies to the various customer subsets.

Delve into Clustering
Hello, I'm Luis Serrano, and this video is about clustering. We're going to learn two very important algorithms: k-means clustering and hierarchical clustering. Clustering is a type of unsupervised learning, and it basically consists of grouping data. If your data looks like it's all over the place, the algorithm will say: okay, you've got a group here, you've got a group here, a group here, and so on.

Let's start with an application. The application is in marketing, in particular customer segmentation, and the situation is the following: we have an app and we want to market it. We've looked at our budget and we can make three marketing strategies, so our goal is to look at the potential customer base and split it into three well-defined groups. When we look at the customer base, we realize we have two types of information: their age in years, and their engagement with a certain page in days per week. One of the columns is demographic (age) and the other is behavioral (engagement with the page), and the engagement can be a number from 0 to 7, since it's in days per week. Our potential customer base is eight people, each with an age and an engagement. By looking at this list of people, what groups can you come up with? Feel free to pause the video and think about it for a minute. Just by eyeballing, I can see that two people are similar: they have similar ages and similar engagements, so maybe I could put them in the same group. Maybe two others are similar as well. We could take a while and write the groups down by hand, but there has to be something easier, or at least something mechanical that the computer can do automatically.

One of the first things to do with data is to plot it. On the horizontal axis we put the age, and on the vertical axis we put the engagement, and now it looks much clearer: there are three groups, one here, another one here, and another one over there. Those are our three marketing strategies. The first one is for people around the age of 20 who have a low engagement with the page, two to four days a week; strategy two is for people in their late 30s and early 40s with high engagement; and the last one is for people around their 50s with very low engagement. And that is pretty much what clustering is: if our data looks like a bunch of points like this, a clustering algorithm will say, "I don't know much about your data, but I can tell you that it's kind of split into these groups." What we learn in this video is how the computer identifies these groups, because for a human this small case is easy, but for a computer it is not, particularly if you have many, many points and many columns or dimensions. I'm going to show you two important methods: the first is called k-means clustering, and the second is called hierarchical clustering.

Let's start with k-means clustering. The question is: how does the computer look at points all over the place and figure out that they form groups? When I try to imagine points in the plane, I imagine places in a city. Say we own a pizza chain and we want to put three pizza parlors in this city, placed in such a way that we serve our clientele in the best possible way. We look at where our clientele lives and try to locate the three parlors in the best possible places. By eye you can come up with three places: a red parlor that serves the red points, a blue one that serves the blue points, and a yellow one that serves the yellow points. For humans this is easy, but a computer has a harder time. So the computer, as in most things in machine learning, starts at a random spot and keeps getting better. First it locates three random points and puts three pizza parlors there. Then we apply a series of slightly obvious logical statements that, put together, get us to a better place. The first logical statement: if the pizza parlors are in these places, everyone should go to the one closest to them, so we color every person by the parlor (red, blue, or yellow) that is closest. The second logical statement: if all the red people go to the red pizza parlor, it makes sense to put that parlor at the center of all those houses, and the same for the blue and the yellow; we move each parlor to the center of the houses it serves. Now we apply the first statement again: everyone goes to the closest parlor, and some assignments change. Three blue points are now closer to the yellow parlor, so those people move to the yellow parlor; two red points are now closer to the blue parlor, so they start going to the blue one. Back to the second statement: the best location for a parlor is the center of the houses it serves, so we move every parlor to the center of its houses. Apply the first statement again: some points that were red are now much closer to the blue parlor, so they move to the blue parlor, and you can see we're getting better and better. When we apply the centering statement once more and nothing changes anymore, we're done. That's pretty simple, and a computer can do it, because a computer can find the center of a bunch of points by averaging their coordinates, and it can determine whether a point is closer to one center than to another by applying the Pythagorean theorem (the distance formula) and comparing numbers. These are decisions a computer can make very easily. We managed to think like a computer and not like a human, which is basically the main idea in machine learning. That is the k-means clustering algorithm.

Now, you may have noticed one decision that seemed to be taken by a human rather than a computer: we decided that there were three clusters. A human can see that at a glance, but it's hard for a computer to decide. So here's the question: how do we know how many clusters to pick? There are a few methods, but I'm going to show you what's called the elbow method. The elbow method basically says: try a bunch of numbers, and then be a little smart about picking the best one. We can run the algorithm with only one cluster, and we'll probably get something like every house going to the same pizza parlor. We can run it with two clusters, and here you start seeing that this algorithm actually depends on where the original points start: sometimes it works, sometimes it doesn't, and sometimes it gives different answers. Say we try two clusters and get one solution, then three clusters and get the solution we had before, then four, five, and six clusters. By eyeballing these, we can see that the best solution is with three clusters, but we need to teach the computer how to find that; we can't just rationalize things, we have to do things like measuring distances, comparing numbers, and averaging coordinates. With those tools, how do we find that three is best? What we need is a measure of how good a clustering is, and the following measure makes sense: think of the diameter of a clustering, which is simply the largest possible distance between two points of the same color. That tells us, in a rough way, how big each group is. For the one-cluster solution, the longest distance between two points of the same color is between the two farthest-apart points, and that distance is a way of scoring the clustering. We do the same with two clusters, three, four, five, and six. (I just eyeballed these distances, so if you think another pair is farther apart, you may be correct; what matters is the concept, which feeds into the elbow method.) Now we take all these distances and graph them: on the horizontal axis we put the number of clusters, one through six, and on the vertical axis we graph the diameter. We join the resulting points, and this is somewhere a human can intervene: a human can look at this graph and say, "I want the elbow to be here." There are also automatic methods to do this, but at some point in a machine learning algorithm it is good to have a consistency check, because you may have an idea of how many clusters you want, or a maximum or a minimum. One way or another, we figure out that the number should be three. Another important point: the elbow method is very easy for a human, because even if our data has many, many columns and the points live in very high dimensions, the elbow graph is always two-dimensional. So that's how we decided that three clusters are best, and that is the k-means clustering algorithm in a nutshell.

Now let's go to our second method, hierarchical clustering. We'll do a similar problem, except with this data set we'll find a clustering and see how many groups we can find. Here is another way to think about it: consider the two closest points. It makes sense to ask whether the two closest points belong to the same group (maybe yes, maybe no, but it's a sensible question), so let's go with that statement. The two closest points become part of the same group. Then the next two closest points form a group, and we keep going in this direction. If the next two closest points belong to different existing groups, we simply join the two groups: a group of two joins a single point to become a group of three, and later a group of two and a group of three join into a group of five. At some point the next closest pair is just too far apart; if we have a measure of how much is too far, we stop there, and that's it, that's hierarchical clustering. Pretty simple, right? But again there seems to be a human decision here: why did we decide on that being the cutoff distance, or on two being the number of clusters?

We can make this decision in an educated way by building something called a dendrogram. We put the points in a row, one through eight, and on the vertical axis we graph the distance. Pick the two closest points, say four and five: we join them in the dendrogram, and the height of the little curved line between four and five is (not drawn to scale) the distance between them. Then we go to the next closest pair, one and two, and join them in the dendrogram the same way, with the height again equal to their distance. Next we join six and seven, and then six and eight; to join six and eight, we connect the group of six and seven with eight. The next join is three with the group of four and five, and after that two joins three, which connects the groups {1, 2} and {3, 4, 5} in the dendrogram. Notice that the dendrogram keeps going up, because these distances increase: every new join is higher than the previous one. The last join connects {1, 2, 3, 4, 5} with {6, 7, 8}, joining the two trees. The dendrogram now holds a lot of information about the set, and we can decide where to cut. Say we cut at a certain distance: that gives us two clusters, one with points one, two, three, four, and five, and one with six, seven, and eight. Notice that we made the cutting decision based on how much distance is too far away, or on how many clusters we want to obtain. Say we want four clusters: then we cut at a lower distance, which gives us four clusters, the one formed by one and two, the one formed by three by itself, the one formed by four and five, and the one formed by six, seven, and eight. Again, these decisions are taken by a human, but think about it: even if we have billions of points living in a thousand-dimensional space, the dendrogram is still a two-dimensional figure, and we can easily make decisions on it. So a combination of a computer algorithm and some smart human decisions is what gives us the best clustering. That's hierarchical clustering in a nutshell.

Clustering has some very interesting applications. Among them are genetics and evolutionary biology: the genome carries a lot of information about a species, and if you manage to cluster genomes, you get to understand a lot about species and how they evolved into what they are now. Recommender systems also use a lot of clustering; for example, the way you may have gotten this video recommended used several methods that include clustering users into groups of similar users: maybe somebody very similar to you watched this video, and that's why it was recommended to you. And that brings us to social networks, another place where clustering is used a lot, in an example very similar to the one we did: social networks use these methods to group users based on demographics and behavior, and then target information to them that they want to see, or suggest friends that are similar to them, et cetera. That's all for now; thank you very much for your attention. As usual, if you would like to see more of this content, please subscribe or hit like, feel free to share with your friends, and feel free to throw in a comment to tell me what you think of the video or suggest other videos you'd like to see. My Twitter handle is LuisLikesMath if you'd like to tweet at me. Thanks again, and see you in the next video.
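The two algorithms from the talk are easy to reproduce in code. Here is a minimal sketch, assuming scikit-learn, SciPy, and matplotlib; the elbow method is approximated with k-means inertia (within-cluster scatter) rather than the talk's cluster diameter, and the data blobs are invented.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Three invented blobs of (age, engagement) points, mimicking the video's example
points = np.vstack([
    rng.normal([20, 3], 1.0, size=(10, 2)),
    rng.normal([40, 6], 1.0, size=(10, 2)),
    rng.normal([50, 1], 1.0, size=(10, 2)),
])

# Elbow method: within-cluster scatter (inertia) for k = 1..6
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
            for k in range(1, 7)]
plt.figure()
plt.plot(range(1, 7), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.title("Elbow method")

# Hierarchical clustering: repeatedly join closest points/groups, then draw the tree
plt.figure()
dendrogram(linkage(points, method="single"))
plt.title("Dendrogram (single linkage)")
plt.show()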
Mining Association Rules
Probably, we have all felt at some point that IF statements are the easiest thing in
programming!
Imagine those IF statements doing magic in increasing the sales of supermarkets. Yes!
Association Rule Mining does exactly that.

Let us stop beating around the bush with programming paradigms. It is time to check
the next technique of data mining.

Association Rule Mining

The next technique of data mining is Association Rule Mining. It aids in identifying
the associations, correlations, and frequent patterns in data.

The derived relationships are represented in the form of Association Rules.

Types and Formats


Rule Format
IF {Set of Items} THEN {Set of Items}, written as {Antecedent} ⇒ {Consequent}
The IF part is termed the Antecedent, while the THEN part is termed the Consequent.
Antecedent and Consequent are disjoint sets.

Types of Association Rules


 Multilevel Association Rule
 Multidimensional Association Rule
 Quantitative Association Rule
An Example

Market Basket Analysis is the classic example: association rules are derived from the
sets of items that customers place together in their baskets.

Association Measures
The important measures that aid in forming Association Rules are as follows.
Support
It indicates how frequently the item appears in the dataset.
Confidence
It measures the number of times the extracted IF-THEN rule is found to be valid.
Lift
It compares the confidence with expected confidence.
Example: Walmart customers who purchase Barbie dolls have a 60% likelihood of
buying one of three types of candy bars.
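In formula form (a standard formulation, consistent with the definitions above), a rule A ⇒ B over a set of transactions has:

\mathrm{support}(A \Rightarrow B) = P(A \cup B)

\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}

\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)}

A lift greater than 1 indicates that A and B occur together more often than would be expected by chance.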

Notable Algorithms
Some key algorithms that generate Association Rules are:

 AIS
 SETM
 Apriori

ARM in Action with Apriori


Check out the video to see how Apriori works.
Hello everyone, and welcome to this interesting session on the Apriori algorithm. Many of us have visited retail shops such as Walmart or Target for our household needs. Say we plan to buy a new iPhone from Target: what we would typically do is search for the model in the mobile section of the store, select the product, and head towards the billing counter. But in today's world the goal of the organization is to increase revenue, and can this be done by just pitching one product at a time to the customer? The answer is clearly no. Hence organizations began mining data relating to frequently bought items.

Market basket analysis is one of the key techniques used by large retailers to uncover associations between items. Examples could be: customers who purchase bread have a 60% likelihood of also purchasing jam, and customers who purchase laptops are more likely to purchase laptop bags as well. Retailers try to find associations between different items and products that can be sold together, which assists in the right product placement. Typically, it figures out which products are being bought together, and organizations can place products accordingly. For example, people who buy bread also tend to buy butter, so the marketing team at a retail store could target customers who buy bread and butter and provide an offer to them on a third item, say eggs. If a customer buys bread and butter and sees a discount offer on eggs, they will be encouraged to spend more and buy the eggs. That is what market basket analysis is all about, and it is what we are going to talk about in this session: association rule mining and the Apriori algorithm.

An association rule can be thought of as an if-then relationship. To elaborate: if an item A is bought by the customer, the chances of an item B being picked by the customer too, under the same transaction ID, are found out. You need to understand here that it is not causality; rather, it is a co-occurrence pattern that comes to the fore. There are two elements to such a rule. The IF part is known as the antecedent: an item, or a group of items, typically found in the item sets. The THEN part is called the consequent: an item that comes along with the antecedent or group of antecedents. So if we look at a rule A ⇒ B, it means that if a person buys item A, then they will most probably also buy item B. Bread, butter, and eggs form only a small example; if you have thousands and thousands of items and take that data to a professional data scientist, you can imagine how much profit the right placement of items can yield, along with many other insights. Association rule mining is a very good technique for helping a business make a profit, so let's see how the algorithm works.

Association rule mining is all about building rules, and we have just seen one: if you buy A, then there is a chance that you might buy B as well. A relationship in which we find the relationship between two items like this is known as single cardinality. But what if the customer who bought A and B also wants to buy C, or the customer who bought A, B, and C also wants to buy D? In these cases the cardinality increases, and we can have a lot of combinations in the data. With 10,000 or more items, just imagine how many rules you would create for each product. That is why association rule mining has measures, so that we do not end up creating tens of thousands of rules, and this is where the Apriori algorithm comes in. But before we get into the Apriori algorithm, let's understand the math behind it. There are three metrics that help measure the association: support, confidence, and lift.

Support is the frequency of item A, or of a combination such as A and B: basically, how frequently the items have been bought. With this we can filter out the items that have been bought less frequently. Confidence tells us how often items A and B occur together, given the number of times A occurs. This also helps us solve other problems: if somebody buys A and B together and does not buy C, we can rule out C at that point; we obviously do not need to analyze items that people barely buy. What we do, accordingly, is define a minimum support and a minimum confidence, put these values into the algorithm, filter the data, and create the rules. But suppose that even after filtering you still have five thousand rules, created for every item: that is practically impossible to work with, so we need the third calculation, which is the lift. Lift is basically the strength of a rule. Have a look at the denominator of its formula: it contains the independent support values of A and B, which give us the probability of A and B occurring independently, at random. There is a lot of difference between this random co-occurrence and a genuine association, and if the denominator of the lift is large, it means the occurrence is due to randomness rather than to any association. So lift is the final verdict that tells us whether a rule is worth spending time on.

Now let's have a look at a simple example of association rule mining. Suppose we have a set of items A, B, C, D, and E, and a set of transactions t1 to t5: t1 contains {A, B, C}; t2, {A, C, D}; t3, {B, C, D}; t4, {A, D, E}; and t5, {B, C, E}. We can create association rules such as A ⇒ D, C ⇒ A, A ⇒ C, and {B, C} ⇒ A. The first means that if a person buys A, they are most likely to buy D; the second, that if a person buys C, they are most likely to buy A; and the last, that if a person buys B and C, they are most likely to buy A as well. For each rule we can calculate the support, confidence, and lift and tabulate the values.

Now let's discuss Apriori. The Apriori algorithm uses frequent item sets to generate the association rules, and it is based on the concept that a subset of a frequent item set must also be a frequent item set itself. This raises the question: what exactly is a frequent item set? A frequent item set is an item set whose support value is greater than a threshold value; as we just discussed, the marketing or sales team sets minimum threshold values for the confidence as well as the support. For example, if {A, B} is a frequent item set, then {A} and {B} should also be frequent item sets individually.

Now let's consider five transactions to make things easier: T1 has {1, 3, 4}; T2 has {2, 3, 5}; T3 has {1, 2, 3, 5}; T4 has {2, 5}; and T5 has {1, 3, 5}. One thing to note here is that the minimum support count is 2. The first step is to build a list of item sets of size 1 from this transactional data and calculate their support values. This gives table C1, with the item sets {1}, {2}, {3}, {4}, {5} and their support counts; recall that support is the frequency of occurrence. The item set {1} has a support count of 3, since item 1 appears in T1, T3, and T5. The item set {4} has a support count of 1, as it occurs only in transaction 1; since the minimum support is 2, it is eliminated. We are left with the final table F1, containing the item sets {1}, {2}, {3}, {5} with support counts 3, 3, 4, 4.

The next step is to create item sets of size 2 and calculate their support values. All combinations of the item sets in F1 (the table in which we discarded 4) are used for this iteration, giving table C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, and {3,5}. Calculating the support again, we see that the item set {1,2} has a support of 1, which is again less than the specified threshold, so we discard it. Table F2 then contains {1,3}, {1,5}, {2,3}, {2,5}, and {3,5}. Again we move forward and create the item sets of size 3, using all combinations from F2 for this iteration. Before calculating the support values, we perform pruning on the candidate set: after the combinations are made, we check each C3 candidate for any subset whose support is less than the minimum support value, because that is what being a frequent item set requires. The candidates are {1,2,3}, {1,2,5}, {1,3,5}, and {2,3,5}; we discard {1,2,3} because it contains the subset {1,2}, which was discarded in the previous step, and {1,2,5} for the same reason. That leaves only two item sets, {1,3,5} and {2,3,5}, with a support of 2 each. If we create table C4 using four elements, there is only one item set, {1,2,3,5}, and looking at the transaction table, {1,2,3,5} appears only once, so its support is 1. Since the support of the whole table C4 is less than 2, we stop here and return to the previous item sets. The frequent item sets are therefore {1,3,5} and {2,3,5}.

Now let's assume our minimum confidence value is 60%. We generate all the non-empty subsets of each frequent item set. For I = {1,3,5} we get the subsets {1,3}, {1,5}, {3,5}, {1}, {3}, {5}; similarly, for {2,3,5} we get {2,3}, {2,5}, {3,5}, {2}, {3}, {5}. The rule-generation step states that for every subset S of I, the output rule is S ⇒ (I minus S), that is, S recommends the rest of I, and this rule holds only if support(I) divided by support(S) is greater than or equal to the minimum confidence value. Applying this to the item set {1,3,5}: Rule 1 is {1,3} ⇒ {5}, meaning 1 and 3 give 5, and the confidence equals support({1,3,5}) divided by support({1,3}), which is 2/3, or about 66%; that is greater than 60%, so Rule 1 is selected. Rule 2 is {1,5} ⇒ {3}, meaning that if we have 1 and 5 we are also going to have 3; calculating the confidence, support({1,3,5}) divided by support({1,5}) gives 2/2, or 100%, so Rule 2 is selected as well. But have a look at Rule 5: {3} ⇒ {1,5}, meaning that if we have 3 we also get 1 and 5. The confidence comes out to 2/4, or 50%, which is less than the given 60% target, so we reject this rule, and the same goes for Rule 6. One thing to keep in mind is that although Rule 1 and Rule 5 look a lot alike, they are not the same: it really depends on what is on the left-hand side of the arrow and what is on the right-hand side. It is the if-then relationship that matters. I am sure you can now see what these rules are and how to proceed with them.

So let's see how we can implement the same in Python. I am going to create a new Python file, using a Jupyter notebook (you are free to use any sort of IDE), and name it apriori. We will be using the online transactional data of a retail store for generating association rules. Firstly, we need to import the pandas and mlxtend libraries and read the file: we are using the Online Retail .xlsx file, and from mlxtend we import apriori and association_rules. As you can see, the data contains the invoice number, the stock code, the description, the quantity, the invoice date, the unit price, the customer ID, and the country. Next we do data clean-up, which includes removing the spaces from some of the descriptions, dropping the rows that do not have invoice numbers, and removing the credit transactions, because those are of no use to us; the output has around five hundred and thirty-two thousand rows with eight columns. After the clean-up we need to consolidate the items into one transaction per row, with one column per product. For the sake of keeping the data set small, we are only looking at the sales for France, so we exclude all the other countries, which leaves 392 rows. There are a lot of zeros in the data, but we also need to make sure any positive value is converted to one and anything less than zero is set to zero; we encode the data this way and check again. Now that the data is structured properly, we generate frequent item sets that have a support of at least 7% (this number is chosen so that we get close enough), using a minimum support of 0.07, and generate the rules with the corresponding support, confidence, and lift. We can then add further constraints on the rules, such as lift greater than six and confidence greater than 0.8. The result shows the left-hand side and the right-hand side of each association rule (the antecedents and the consequents), along with the support, the confidence, the lift, the leverage, and the conviction.

So that's it for this session: that is how you create association rules using the Apriori algorithm, which helps a lot in the marketing business. It runs on the principle of market basket analysis, which is exactly what big companies like Walmart, Reliance, Target, and even IKEA do. I hope you got to know what exactly association rule mining is, what lift, confidence, and support are, and how to create association rules. If you have any queries, feel free to mention them in the comment section below. Till then, thank you and happy learning! I hope you have enjoyed listening to this video; please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist, and subscribe to the Edureka channel to learn more. Happy learning!
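Here is a minimal sketch of the Apriori workflow described in the video, assuming the mlxtend library; for brevity it uses a tiny hand-encoded basket table instead of the Online Retail file, so the item names and thresholds are illustrative only.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: rows are baskets, columns are items
baskets = pd.DataFrame(
    [[1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]],
    columns=["bread", "butter", "jam", "eggs"],
).astype(bool)

# Frequent item sets with support of at least 40% (illustrative threshold)
freq = apriori(baskets, min_support=0.4, use_colnames=True)

# Derive rules and keep only reasonably confident ones
rules = association_rules(freq, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])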

Backdrop of Outlier Detection

There is an evolving danger to employees, customers, and organizations in the
form of intrusions, cyberattacks, and fraudulent transactions.
It was not so long ago that the Facebook-Cambridge Analytica data scandal happened.

The Outlier Detection technique can help in minimizing such attacks. Let us delve into
this next technique of data mining.

Outlier Detection
Outlier
Jiawei Han defines Outlier as
A data object that deviates significantly from the normal objects as if it were
generated by a different mechanism.

Types of Outlier
 Global Outlier
It significantly deviates from the entire dataset.
 Contextual Outlier
It significantly deviates based on the context selected.
 Collective Outlier
A subset of data objects collectively deviates from the entire dataset.

Diverse Methods
Outlier Detection Methods:

 Statistical approach
 Clustering-based approach
 Classification approach
 Proximity-based approach
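As an illustration of the statistical approach, here is a minimal Python sketch using z-scores: observations lying more than three standard deviations from the mean are flagged as global outliers. The transaction amounts are invented.

import numpy as np

rng = np.random.default_rng(0)
# Invented transaction amounts: 50 typical values plus one obvious anomaly
amounts = np.append(rng.normal(30, 3, size=50), 950.0)

# Flag points whose z-score exceeds 3 (a common rule of thumb)
z = (amounts - amounts.mean()) / amounts.std()
print("Outliers:", amounts[np.abs(z) > 3])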

Backdrop of Data Visualization

According to the American mathematician John W. Tukey,
"The greatest value of a picture is when it forces us to notice what we never
expected to see."

A picture speaks louder than words, and data mining takes this to heart in the form
of Data Visualization.
Let us check the next technique of data mining.

Data Visualization

Data Visualization is a technique that aids in representing information in a visual
context, helping people understand the significance of data.

Patterns and trends that would stay buried in text-based information can be easily
spotted with data visualization.
Why Data Visualization?

Read the following statement.

It was predicted that the price of pizza will increase by 25% next year, while the price
of burgers will decrease by 20%.

Confusing, right?

Now picture the same facts as a pair of price charts: the trends become clear at a glance.
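To make the example concrete, here is a minimal matplotlib sketch; the current prices are hypothetical, and only the predicted +25% and -20% changes come from the statement above.

import matplotlib.pyplot as plt

items = ["pizza", "burger"]
current = [10.0, 5.0]                      # hypothetical current prices
predicted = [10.0 * 1.25, 5.0 * 0.80]      # +25% and -20% next year

x = range(len(items))
plt.bar([i - 0.2 for i in x], current, width=0.4, label="this year")
plt.bar([i + 0.2 for i in x], predicted, width=0.4, label="next year")
plt.xticks(x, items)
plt.ylabel("price")
plt.legend()
plt.show()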
Adam McCann created a data visualization of every single song recorded by Bruce
Springsteen. Using data from Spotify and other written sources, he was able to plot
many aspects of the songs.
Tools
The following tools help in visualizing your data:

 Excel
 Tableau
 QlikView
 FusionCharts

Delve into Statistics


Statistics is the art of learning from data.
Quoting Shakuntala Devi, famously known as the human computer:
Numbers have life; they are not just symbols on paper.
Statistics is a powerful tool that works on numbers to provide us with inferences.
Let us dive into this technique to explore more!

Statistics
It is time to see the next technique of Data Mining!

Statistics can be defined as the science of collecting, analyzing, and interpreting data.

Two broad categories of statistics that help in data analysis are:

 Descriptive Statistics
 Inferential Statistics

Descriptive Statistics

 It provides summary statistics of data.
 It helps in quantitatively interpreting the features of data.
 It describes the dataset at hand (for example, a sample) rather than drawing
conclusions about a wider population.

Measures
 Measures of Central Tendency
Focus on the average or central point of a dataset.
o Mean
o Median
o Mode
 Measures of Spread
Focus on the dispersion of data from the central point.

o Range
o Standard Deviation
o Variance
o Interquartile Range
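A minimal sketch of these measures in Python, using the statistics module from the standard library; the sample values are invented.

import statistics as st

sample = [4, 8, 6, 5, 3, 8, 9, 5, 8]   # invented sample data

# Measures of central tendency
print("mean:    ", st.mean(sample))
print("median:  ", st.median(sample))
print("mode:    ", st.mode(sample))

# Measures of spread
print("range:   ", max(sample) - min(sample))
print("stdev:   ", st.stdev(sample))      # sample standard deviation
print("variance:", st.variance(sample))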

Inferential Statistics

 It makes inferences about the properties of a population.
 It uses sample data to make propositions about a wider population.

End of Rainbow
We just cherished all the colors of the Data Mining rainbow!
Hiding within those piles of data is the knowledge that could change the life of a
patient, or change the world.
Mine that knowledge with these techniques and admire the colorful insights. Data isn’t
the new oil; it’s the soil in which findings grow.
Continue this hued journey to discover more colors of Data Mining!
