Similarities measures


Can you explain the difference between the Jaccard similarity coefficient and the pointwise mutual information (PMI) measure? It would be great if you could add a few examples.

ttnphns Moeen MH


1 Answer

These two are quite different. Still, let us try to "bring them to a common denominator" to see the difference. Both Jaccard and PMI can be extended to continuous data, but here we'll look at the basic binary data case.

              Y
            1   0
          ---------
       1  | a | b |
    X     ---------
       0  | c | d |
          ---------

a = number of cases where both X and Y are 1
b = number of cases where X is 1 and Y is 0
c = number of cases where X is 0 and Y is 1
d = number of cases where both X and Y are 0
a+b+c+d = n, the number of cases.
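The four cell counts can be tallied directly from two binary vectors. The following is a minimal sketch; the function name `contingency_counts` and the example vectors are illustrative assumptions, not part of the original answer.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return the 2x2 cell counts (a, b, c, d) for two equal-length 0/1 vectors.

    Hypothetical helper for illustration; uses the a, b, c, d notation of the table.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both 1
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # X=1, Y=0
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # X=0, Y=1
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both 0
    return a, b, c, d


# Made-up example data (8 cases).
x = [1, 1, 1, 0, 0, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0, 0, 0]
a, b, c, d = contingency_counts(x, y)
print(a, b, c, d)  # prints: 2 2 1 3
```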

We know that $\text{Jaccard}[X,Y] = \dfrac{a}{a+b+c}$.

PMI, by the Wikipedia definition, is $\text{PMI}[X,Y] = \log \dfrac{P(X,Y)}{P(X)P(Y)}$.
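As a rough illustration, both measures can be computed from the cell counts. This sketch assumes the natural logarithm (the definition above does not fix a base) and estimates the probabilities as P(X,Y) = a/n, P(X) = (a+b)/n, P(Y) = (a+c)/n; the function names and the counts are made up for the example.

```python
import math


def jaccard(a: int, b: int, c: int) -> float:
    """Jaccard similarity a / (a + b + c); undefined when a + b + c == 0."""
    return a / (a + b + c)


def pmi(a: int, b: int, c: int, d: int) -> float:
    """PMI of the event (X=1, Y=1) from 2x2 counts, natural log (an assumption).

    Requires a > 0: with a == 0 the ratio is 0 and math.log raises ValueError.
    """
    n = a + b + c + d
    p_xy = a / n        # P(X=1, Y=1)
    p_x = (a + b) / n   # P(X=1)
    p_y = (a + c) / n   # P(Y=1)
    return math.log(p_xy / (p_x * p_y))


# Made-up counts: a=2, b=2, c=1, d=3 (n=8).
print(jaccard(2, 2, 1))  # 0.4
print(pmi(2, 2, 1, 3))   # log(4/3), roughly 0.2877
```

Note that Jaccard never uses d, whereas PMI depends on it through n.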

Let us first forget about "log", because Jaccard involves no logarithm. Then plug the a, b, c, d notation into the PMI formula to obtain:

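$$\frac{P(X,Y)}{P(X)P(Y)} = \frac{a/n}{\frac{a+b}{n}\cdot\frac{a+c}{n}} = \frac{an}{(a+b)(a+c)} = \frac{a}{\sqrt{(a+b)(a+c)}} \Bigg/ \sqrt{\frac{a+b}{n}\cdot\frac{a+c}{n}} = \frac{\text{Ochiai}[X,Y]}{\text{gm}[P(X),P(Y)]}$$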

where "gm" is the geometric mean of the two probabilities, and the Ochiai similarity between the X and Y vectors is just another name for cosine similarity in the case of binary data: $\sqrt{\dfrac{a}{a+b}\cdot\dfrac{a}{a+c}}$.

So, you can see that PMI (without the logarithm) is the Ochiai coefficient further "normalized" (or, I'd say, de-normalized) by the overall probability of the two-way positive (eventful) data.
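The relationship between the un-logged PMI ratio, the Ochiai coefficient, and the geometric mean of the marginal probabilities can be checked numerically. This is only a sketch on made-up counts; the function names are illustrative assumptions.

```python
import math


def ochiai(a: int, b: int, c: int) -> float:
    """Ochiai (binary cosine) similarity: a / sqrt((a + b) * (a + c))."""
    return a / math.sqrt((a + b) * (a + c))


def pmi_ratio(a: int, b: int, c: int, d: int) -> float:
    """P(X,Y) / (P(X) P(Y)), i.e. PMI before taking the logarithm."""
    n = a + b + c + d
    return (a / n) / (((a + b) / n) * ((a + c) / n))


def gm_marginals(a: int, b: int, c: int, d: int) -> float:
    """Geometric mean of the marginal probabilities P(X=1) and P(Y=1)."""
    n = a + b + c + d
    return math.sqrt(((a + b) / n) * ((a + c) / n))


# Made-up counts: a=2, b=2, c=1, d=3 (n=8).
a, b, c, d = 2, 2, 1, 3
lhs = pmi_ratio(a, b, c, d)
rhs = ochiai(a, b, c) / gm_marginals(a, b, c, d)
print(math.isclose(lhs, rhs))  # True: both sides equal a*n / ((a+b)*(a+c))
```

For these counts the Ochiai value lies in [0, 1] while the PMI ratio is 4/3, so dividing by the geometric mean takes the quantity outside the unit interval.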

But Jaccard and Ochiai are comparable. Both are association measures ranging from 0 to 1. They differ in the emphasis they put on the potential discrepancy between the frequencies b and c. I've described it in the answer that "Ochiai" above links to. To cite:

"Because product (seen in Ochiai) increases weaker than sum (seen in Jaccard) when only one of the terms grows, Ochiai will be really high only if both of the two proportions (probabilities) are high, which implies that to be considered similar by Ochiai the two vectors must share the great shares of their attributes/elements. In short, Ochiai curbs similarity if b and c are unequal. Jaccard does not."

Community ttnphns


10/1/2017 probability - Jaccard similarity coefficient vs. Point-wise mutual information coefficient - Cross Validated

https://stats.stackexchange.com/questions/256684/jaccard-similarity-coecient-vs-point-wise-mutual-information-coecient/25
