8 views

Uploaded by Siddharth Singhal

- Athipathy Data Min In
- 9
- Association Analysis
- Weka Tutorial
- 10.1.1.72.5582.pdf
- Data Mining Techniques
- MSc Business Analytics
- IRJET-Frequent Pattern Mining Over Data Stream Using Compact Sliding Window Tree & Sliding Window Model
- Friendbook: An Advanced Friend Recommendation System For Social Networking sites
- Big Data Sample Chapter
- Analysis of Sampling Techniques for Association Rule Mining
- 6825460
- Elixir
- A Study of Sales Prediction Analysis in a Business Organization using Data Mining Technique
- Alteryx Hadoop Whitepaper Final1
- JOB JOB
- IITH Executive MTech Brochure
- access_control.pdf
- Fullerton PPT
- Analytics Retail Banking Why How

You are on page 1of 20

Cyrus Lentin

Predictive Analytics Market Basket Analysis

Market Basket Analysis (Association Analysis) is a mathematical modeling technique based upon the

theory that if you buy a certain group of items, you are likely to buy another group of items

It is used to analyze the customer purchasing behavior and helps in increasing the sales and maintain

inventory by focusing on the point of sale transaction data

Given a dataset, the Apriori Algorithm trains and identifies product baskets and product association

rules

Predictive Analytics - Case Study - Target Predicts A Teens Pregnancy

http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-

pregnant-before-her-father-did/#1544af3a34c6

http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?_r=0

Predictive Analytics - Case Study - Target Predicts A Teens Pregnancy

Predictive Analytics - Case Study - Target Predicts A Teens Pregnancy

Apriori Algortihm

It Is Used For Mining Frequent Item-sets And Relevant Association Rules

It Is Devised To Operate On A Database Containing A Lot Of Transactions, For Instance, Items Brought

By Customers In A Store

It Is Very Important For Effective Market Basket Analysis And It Helps The Customers In Purchasing

Their Items With More Ease Which Increases The Sales Of The Markets

It Has Also Been Used In The Field Of Healthcare For The Detection Of Adverse Drug Reactions

It Produces Association Rules That Indicates What All Combinations Of Medications And Patient

Characteristics Lead To Adverse Drug Reactions

Association Rules

Association Rule Learning Is A Prominent And A Well-explored Method For Determining Relations

Among Variables In Large Databases.

Let I = { i_1, I_2, I_3, ..., I_n } Be A Set Of N Attributes Called Items

Let D = { t_1, T_2, ..., T_n } Be The Set Of Transactions. It Is Called Database.

Every Transaction, T_i In D Has A Unique Transaction ID, And It Consists Of A Subset Of Itemsets In I.

A Rule Can Be Defined As

An Implication, XY Where X And Y Are Subsets Of I (X, Y I)

And

They Have No Element In Common i.e. X Y = NULL

Heading

The example that we are considering is quite small and in practical situations, datasets contain millions

or billions of transactions.

The set of itemsets, I ={Onion, Burger, Potato, Milk, Beer}; a database consisting of six transactions.

Each transaction is a tuple of 0's and 1's where 0 represents the absence of an item and 1 presence.

An example for a rule in this scenario would be {Onion, Potato} => {Burger}, which means that if onion

and potato are bought, customers also buy a burger.

Support

The Support Of An Itemset X, Supp(x) Is The Proportion Of Transaction In The Database In Which The

Item X Appears. It Signifies The Popularity Of An Itemset.

If The Sales Of A Particular Product (Item) Above A Certain Proportion Have A Meaningful Effect On

Profits, That Proportion Can Be Considered As The Support Threshold.

Furthermore, We Can Identify Itemsets That Have Support Values Beyond This Threshold As

Significant Itemsets.

Confidence

This Implies That For 75% Of The Transactions Containing Onion And Potatoes, The Rule Is Correct

It Can Also Be Interpreted As The Conditional Probability P(y|x), i.e., The Probability Of Finding The

Itemset Y In Transactions Given The Transaction Already Contains X

Confidence Can Give Some Important Insights, But It Also Has A Major Drawback

It Only Takes Into Account The Popularity Of The Itemset X And Not The Popularity Of Y

If Y Is Equally Popular As X Then There Will Be A Higher Probability That A Transaction Containing X

Will Also Contain Y Thus Increasing The Confidence

To Overcome This Drawback There Is Another Measure Called Lift

Big Data Technology - Cyrus Lentin 9

Lift

This signifies the likelihood of the itemset Y being purchased when item X is purchased while taking

into account the popularity of Y.

In our example above,

If the value of lift is greater than 1, it means that the itemset Y is likely to be bought with itemset X,

while a value less than 1 implies that itemset Y is unlikely to be bought if the itemset X is bought.

Conviction

The conviction value of 1.32 means that the rule {onion, potato}=>{burger} would be incorrect 32%

more often if the association between X and Y was an accidental chance.

How Does Apriori Algorithm Work?

Assumption:

All subsets of a frequent itemset must be frequent

Similarly, for any infrequent itemset, all its supersets must be infrequent too

Before beginning the process, let us set the support threshold to 50%, i.e. only those items are

significant for which support is more than 50%.

How Does Apriori Algorithm Work?

Step 1

Create A Frequency Table Of All The Items That

Occur In All The Transactions.

For Our Case, Refer Figure

Step 2:

Only Those Elements Are Significant For Which

The Support Is Greater Than Or Equal To The

Threshold Support.

Here, Support Threshold Is 50%, Hence Only

Those Items Are Significant Which Occur In

More Than Three Transactions And Such Items

Are Onion(O), Potato(P), Burger(B), And

Milk(M). Therefore, We Are Left With:

How Does Apriori Algorithm Work?

Step 3:

The Next Step Is To Make All The Possible Pairs

Of The Significant Items Keeping In Mind That

The Order Doesnt Matter, I.E., AB Is Same As

BA.

So, All The Pairs In Our Example Are Op, Ob, Om,

Pb, Pm, Bm.

Step 4:

We Will Now Count The Occurrences Of Each

Pair In All The Transactions.

The Result As Given

How Does Apriori Algorithm Work?

Step 5:

Again Only Those Itemsets Are Significant Which

Cross The Support Threshold, And Those Are OP,

OB, PB, And PM.

Step 6:

Now Lets Say We Would Like To Look For A Set

Of Three Items That Are Purchased Together.

We Will Use The Itemsets Found In Step 5 And

Create A Set Of 3 Items.

To Create A Set Of 3 Items Another Rule, Called

Self-join Is Required.

It Says That From The Item Pairs Op, Ob, Pb And

Pm We Look For Two Pairs With The Identical

First Letter And So We Get

OP And OB, This Gives OPB

PB And PM, This Gives PBM

Next, We Find The Frequency For These Two

Itemsets

Big Data Technology - Cyrus Lentin 15

How Does Apriori Algorithm Work?

The Example That We Considered Was A Fairly Simple One And Mining The Frequent Itemsets

Stopped At 3 Items

In Practice, There Are Dozens Of Items And This Process Could Continue To Many Items

Suppose We Got The Significant Sets With 3 Items As Opq, Opr, Oqr, Oqs And Pqr And Now We Want

To Generate The Set Of 4 Items

For This, We Will Look At The Sets Which Have First Two Alphabets Common, i.e.,

OPQ And OPR Gives OPQR

OQR And OQS Gives OQRS

In General, We Have To Look For Sets Which Only Differ In Their Last Letter/Item

General Process Of The Apriori Algorithm

Apply minimum support to find all the frequent sets with k items in a database

Use the self-join rule to find the frequent sets with k+1 items with the help of frequent k-itemsets

Repeat this process from k=1 to the point when we are unable to apply the self-join rule

Pros & Cons

It Is An Easy-to-implement And Easy-to-understand Algorithm.

It Can Be Used On Large Itemsets.

Cons Of The Apriori Algorithm

Sometimes, It May Need To Find A Large Number Of Candidate Rules Which Can Be Computationally

Expensive.

Calculating Support Is Also Expensive Because It Has To Go Through The Entire Database.

Thank you!

Contact:

Cyrus Lentin

cyrus@lentins.co.in

+91-98200-94236

- Athipathy Data Min InUploaded byathipathy
- 9Uploaded byklchoudhary
- Association AnalysisUploaded byjithender
- Weka TutorialUploaded byabhishekbehal5012
- 10.1.1.72.5582.pdfUploaded byDragos Popescu
- Data Mining TechniquesUploaded byPuneet Gupta
- MSc Business AnalyticsUploaded byManishJha
- IRJET-Frequent Pattern Mining Over Data Stream Using Compact Sliding Window Tree & Sliding Window ModelUploaded byIRJET Journal
- Friendbook: An Advanced Friend Recommendation System For Social Networking sitesUploaded byEditor IJRITCC
- Big Data Sample ChapterUploaded byWiley_Business_Books
- Analysis of Sampling Techniques for Association Rule MiningUploaded byNicolas Badaro
- 6825460Uploaded byA.J Khan
- ElixirUploaded byAnonymous TxPyX8c
- A Study of Sales Prediction Analysis in a Business Organization using Data Mining TechniqueUploaded byEditor IJRITCC
- Alteryx Hadoop Whitepaper Final1Uploaded bynivas_mech
- JOB JOBUploaded byGLOBAL TEQ
- IITH Executive MTech BrochureUploaded bykishore13
- access_control.pdfUploaded byhayden Holfth
- Fullerton PPTUploaded bySuman Mandal
- Analytics Retail Banking Why HowUploaded byOluwatobiAdewale
- Anu Kharwal BAUploaded bySukanta Jana
- 2017%2Feur%2Fpdf%2FPSOGEN-1800.pdfUploaded bySayaOtanashi
- 1-Jigsaw Academy Foundation AnalyticsUploaded byNarakatla Srinu
- Capstone Final GA1e1Uploaded bysandeep9210357335
- ZZM12352USENUploaded byMatthew Harding
- Business Analyst CourseUploaded byAshraf Ahmed
- Business Intelligence and Analytics Research DirectionsUploaded byMounir Hallab
- access_control.pdfUploaded byChristen Castillo
- Fintelekt Insurance InterviewUploaded byamiyagupta
- BiG DataUploaded byravi kumar

- AUDIT.txtUploaded bySiddharth Singhal
- terr_911_grossi_2009Uploaded bySiddharth Singhal
- Gruber's Complete GRE Guide 2012Uploaded bynaveen_m01
- 11 12 BDT DataVisualizationUploaded bySiddharth Singhal
- changes.txtUploaded bySiddharth Singhal
- SimonBusiness_School_Full-Time_MS_Viewbook.pdfUploaded bySiddharth Singhal
- 15-16-BDT-Classification-BC-DT-RF.pdfUploaded bySiddharth Singhal
- 20-BDT-PageRank.pdfUploaded bySiddharth Singhal
- 13-14-BDT-DataMining&MachineLearning.pdfUploaded bySiddharth Singhal
- 01 BDA Hadoop OverviewUploaded bySiddharth Singhal
- 01 BDA Hadoop OverviewUploaded bySiddharth Singhal
- ISO-OSIUploaded byChandu Jagan Sekhar

- DM AssignmentsUploaded byKaustubh Khare
- Vialardi Bravo Ortigosa UM07Uploaded byRakesh Edam
- 07A70503-DATAWAREHOUSINGANDDATAMININGUploaded bySravani Sravz
- Turban Dss9e Ch05Uploaded byDaniHasan
- R7410503 Data Warehousing & Data MiningUploaded bysivabharathamurthy
- Literature Survey TutorialUploaded bykamranali
- A Simple Dynamic Mind-map Framework to Discover Associative Relationships in Transactional Data StreamsUploaded byAhmar Taufik Safaat
- Mobile MinerUploaded byKomal Rane
- Data Mining Lab ManualUploaded byrajianand2
- NSE BA Sample Paper With SolutionUploaded bySanjay Singh
- Retail Inventory Management With Purchase DependenciesUploaded byNasir Ali
- Market Basket Analysis for data mining - msthesis.pdfUploaded byRicardo Zonta Santos
- An Empirical Algorithm for High and Low Correlative Association Rule MiningUploaded byJexia
- UNIT- 3 data mining pt.pptxUploaded bynehakncbcips1990
- 13162.pdfUploaded byAnonymous gUySMcpSq
- Ppt Data MiningUploaded byEko Andi
- Paper-16_A Study & Emergence of Temporal Association Mining Similarity-ProfiledUploaded byRachel Wheeler
- ch6Uploaded byyashwanthr3
- Survey on Temporal Data MiningUploaded byKienking Khansultan
- Diabetes Research.pdfUploaded byumaj25
- Churn analysis in Wireless IndustryUploaded byafzal.sayeed7051
- CHAPTER TWO2017.docxUploaded bySulieman Khalifa Arafa
- Web Usage Mining for analysising eldery people careUploaded byMuthia Bianda
- 01. IntroductionOverview of Data MiningUploaded byFebrian Arsy
- Advance DatabasesUploaded bySaket Raut
- Test BankUploaded byRodel D Dosano
- Patterns From Crime Data Using Candidate Generation ApproachUploaded byInternational Journal of Innovative Science and Research Technology
- Mining Sequential PatternsUploaded byApoorva
- [IJCST-V4I4P28]:Mrs. M.Kavitha, Ms.S.T.Tamil SelviUploaded byEighthSenseGroup
- Data Mining - ClassificationUploaded byLeah Rosagas Belaya