Discussion Forum Unit 3

Uploaded by rigan123

First, I’m submitting the result of the unit 2 assignment.

Documents 570

Tokens 26543

Terms 9606

Heaps’ law states (Manning, Raghavan, & Schütze, 2009):

$M = kT^{b}$, where M is the vocabulary size (number of distinct terms), T is the number of tokens in the collection, and k and b are constants.
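As a quick illustration of the formula, here is a minimal Python sketch. The function name `heaps_vocabulary` is our own (hypothetical), and k = 40, b = 0.5 are simply the illustrative constants used in the check below. Because b is below 1, doubling T multiplies M by only 2^b (about 1.41 for b = 0.5), so the vocabulary grows more slowly than the token count.

```python
def heaps_vocabulary(tokens: float, k: float, b: float) -> float:
    """Predicted vocabulary size M = k * T**b under Heaps' law."""
    return k * tokens ** b


# Illustrative constants only (k = 40, b = 0.5); real values are fitted per collection.
for t in (10_000, 20_000, 40_000):
    print(f"T = {t:>6,}: M = {heaps_vocabulary(t, k=40, b=0.5):,.0f}")
```

Each fourfold increase in T only doubles M when b = 0.5.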

From the assignment output:

Total tokens (T) = 26543

Unique terms (M) = 9606

Now let’s check whether Heaps’ law holds for our assignment result:

$M = 40 \times 26543^{0.5} \approx 6517$ (k = 40, b = 0.5)

The difference between 6517 and 9606 is significant, so with these constants our result doesn’t follow Heaps’ law.
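The same comparison can be scripted. This sketch hard-codes the Unit 2 counts reported above and measures how far the k = 40, b = 0.5 prediction falls short of the observed vocabulary; the variable and function names are hypothetical.

```python
T = 26543           # tokens in the Unit 2 collection
M_OBSERVED = 9606   # distinct terms in the Unit 2 collection


def heaps_vocabulary(tokens, k, b):
    """Predicted vocabulary size M = k * T**b."""
    return k * tokens ** b


predicted = heaps_vocabulary(T, k=40, b=0.5)
# Fraction of the observed vocabulary that the prediction misses.
shortfall = (M_OBSERVED - predicted) / M_OBSERVED
print(f"predicted {predicted:,.0f}, observed {M_OBSERVED:,}, shortfall {shortfall:.1%}")
```

With these constants the prediction is roughly a third below the observed count.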

We learned from the textbook’s Heaps’ law example that for the first 1,000,020 tokens (about 1 million), the law predicts approximately 38,323 distinct terms. In other words, out of about 1 million tokens, roughly 38,323 are terms. It was computed as (Manning, Raghavan, & Schütze, 2009):

$M = 44 \times 1000020^{0.49} \approx 38323$ (k = 44, b = 0.49)

$M = 40 \times 1000020^{0.5} \approx 40000$ (k = 40, b = 0.50)
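Both textbook-scale figures can be reproduced directly. This is only arithmetic on the constants quoted above, wrapped in a hypothetical helper.

```python
def heaps_vocabulary(tokens, k, b):
    """Predicted vocabulary size M = k * T**b."""
    return k * tokens ** b


T_BOOK = 1_000_020  # token count in the textbook example
for k, b in ((44, 0.49), (40, 0.50)):
    print(f"k = {k}, b = {b:.2f}: M = {heaps_vocabulary(T_BOOK, k, b):,.0f}")
```

This prints 38,323 and 40,000, matching the two equations.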

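The predicted value changes with the choice of k and b. For example, if we plug k = 44 and b = 0.49 into the equation for our collection:

$M = 44 \times 26543^{0.49} \approx 6474$

This is essentially the same as the k = 40, b = 0.5 estimate. Keeping k = 44 but using b = 0.5 gives $M = 44 \times 26543^{0.5} \approx 7168$, which is closer to our output.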
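To see how sensitive the estimate is to the exponent alone, this sketch holds k = 44 and changes only b. Since M is proportional to T^b, raising b by 0.01 multiplies the prediction by T^0.01, about 1.107 for T = 26,543. The helper name is hypothetical.

```python
T, M_OBSERVED = 26543, 9606  # Unit 2 token and term counts


def heaps_vocabulary(tokens, k, b):
    """Predicted vocabulary size M = k * T**b."""
    return k * tokens ** b


for b in (0.49, 0.50):
    m = heaps_vocabulary(T, k=44, b=b)
    print(f"k = 44, b = {b:.2f}: M = {m:,.0f} (observed {M_OBSERVED:,})")
```

This prints about 6,474 for b = 0.49 and about 7,168 for b = 0.50, so with k = 44 it is the move to b = 0.5 that brings the estimate closer to the observed 9,606.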

If we use k = 59 and b = 0.50, we find

$M = 59 \times 26543^{0.5} \approx 9612$

9612 is very close to our output.
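Rather than trying values of k by hand, one can solve $M = kT^{b}$ for k once b is fixed: $k = M / T^{b}$. This sketch (the helper name `implied_k` is hypothetical) computes the k implied by our counts and checks it against the typical 30 to 100 range for k given by Manning, Raghavan, & Schütze (2009).

```python
T, M_OBSERVED = 26543, 9606  # Unit 2 token and term counts


def implied_k(tokens, vocabulary, b):
    """Solve Heaps' law M = k * T**b for k, given an assumed exponent b."""
    return vocabulary / tokens ** b


for b in (0.49, 0.50):
    k = implied_k(T, M_OBSERVED, b)
    print(f"b = {b:.2f}: k = {k:.1f}, typical range 30-100: {30 <= k <= 100}")
```

For b = 0.5 the implied k is about 59.0, which is why k = 59 reproduces the observed count so closely; for b = 0.49 it is about 65.3. Both fall inside the typical range.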

From the Heaps’ law discussion, we’ve noticed that typical values of k lie between 30 and 100 (Manning, Raghavan, & Schütze, 2009), while b is roughly 0.5. Heaps’ law is empirical: it is fitted to observed data rather than derived mathematically from a function. The more tokens we use, the closer the result tends to be to the law. The textbook demonstrates it with about 1 million tokens, whereas our corpus has only 26,543 tokens. With a small collection, the result is less predictable; with a large collection, the law fits more accurately. That’s why k and b vary within a range.

It’s also worth noting that the vocabulary is not finite: new terms are added to it day by day, and many of them are not ordinary dictionary words. For example, the word “UoPeople” didn’t exist before 2009. So, testing with a large collection will give an approximately correct value, but testing with a small collection may not always give a correct result.
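One way to test the small-versus-large-collection point on real data is to record the vocabulary growth curve and fit k and b to it. In the textbook’s example, k and b come from a least-squares line on a log-log plot of M against T; the sketch below does the same using only the standard library. The function names are hypothetical, and it assumes the collection is already tokenized into a list of terms (for example, the Unit 2 tokens).

```python
import math


def vocabulary_growth(tokens, step):
    """Return (T, M) pairs every `step` tokens: tokens read so far, distinct terms seen so far."""
    seen = set()
    points = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        if i % step == 0:
            points.append((i, len(seen)))
    return points


def fit_heaps(points):
    """Least-squares fit of log10(M) = log10(k) + b * log10(T); returns (k, b)."""
    if len(points) < 2:
        raise ValueError("need at least two (T, M) points")
    xs = [math.log10(t) for t, _ in points]
    ys = [math.log10(m) for _, m in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope of the log-log line
    k = 10 ** (mean_y - b * mean_x)  # intercept, converted back from log10
    return k, b


# Sanity check: points generated exactly from k = 44, b = 0.49 should be recovered.
synthetic = [(t, 44 * t ** 0.49) for t in (1_000, 10_000, 100_000, 1_000_000)]
print(fit_heaps(synthetic))
```

On a real collection, fitting only a short prefix of the tokens can give noticeably different constants than fitting the whole collection, which is one way to observe the variability described above.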
References

Manning, C. D., Raghavan, P., & Schütze, H. (2009). Heaps’ Law: Estimating The
Number of Terms. In An Introduction to Information Retrieval (pp. 88-89).
Cambridge, England: Cambridge University Press.
