Suppose we query an IR system with the query "gold silver truck". The document collection
consists of three documents (D = 3) with the following content:
1. Columns 1-5: First, we construct an index of terms from the documents and determine the term
counts tf_i for the query and for each document D_j.
2. Columns 6-8: Second, we compute the document frequency df_i for each term, i.e., the number
of documents in which term i appears. Since IDF_i = log(D/df_i) and D = 3, this calculation is
straightforward.
3. Columns 9-12: Third, we take the tf_i*IDF_i products and compute the term weights. These
columns can be viewed as a sparse matrix in which most entries are zero.
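The three steps above can be sketched in Python. Note the document texts below are hypothetical placeholders standing in for the example's collection (the original listing is not reproduced in this excerpt), and log base 10 is assumed for the IDF, so the resulting numbers are illustrative only.

```python
import math
from collections import Counter

# Placeholder documents; the worked example's actual texts are assumed.
docs = {
    "D1": "shipment of gold damaged in a fire".split(),
    "D2": "delivery of silver arrived in a silver truck".split(),
    "D3": "shipment of gold arrived in a truck".split(),
}
query = "gold silver truck".split()

D = len(docs)  # number of documents (D = 3)
terms = sorted(set(t for d in docs.values() for t in d) | set(query))

# Step 1: term counts tf_i for the query and each document
tf = {name: Counter(words) for name, words in docs.items()}
tf["Q"] = Counter(query)

# Step 2: document frequency df_i (how many documents contain term i)
# and IDF_i = log(D / df_i), using log base 10 here
df = {t: sum(1 for words in docs.values() if t in words) for t in terms}
idf = {t: math.log10(D / df[t]) for t in terms}

# Step 3: tf*IDF term weights; most entries are zero (a sparse matrix)
w = {name: {t: tf[name][t] * idf[t] for t in terms} for name in tf}
```

A common term like "in" appears in every placeholder document, so its IDF (and hence its weight) is zero, matching the observation made later in this section.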
Now we treat weights as coordinates in the vector space, effectively representing documents
and the query as vectors.
Similarity Analysis
First, for each document and for the query, we compute the vector lengths (zero-weight terms ignored):
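The vector length is just the Euclidean norm of the weight vector; a minimal sketch, assuming weight vectors are stored as sparse dicts in which zero-weight terms are simply absent:

```python
import math

def vector_length(weights):
    """Euclidean length of a sparse weight vector (term -> tf*IDF weight).

    Zero-weight terms are not stored, so they contribute nothing to the sum.
    """
    return math.sqrt(sum(w * w for w in weights.values()))

# Hypothetical weight vector with two nonzero entries:
length = vector_length({"gold": 0.1761, "truck": 0.1761})
```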
Observations
This example illustrates several facts. First, very frequent terms such as "a", "in", and "of"
tend to receive a low weight, a value of zero in this case. Thus, the model correctly predicts
that very common terms, which occur in many documents of a collection, are not good
discriminators of relevance. Note that this reasoning is based on global information, i.e., the
IDF term; this is precisely why this model is better than the term count model discussed in
Part 2. Second, instead of calculating individual vector lengths and dot products, we can
save computational time by applying the similarity function directly:
Eq 3: Sim(Q, D_j) = (Σ_i w_{i,Q} · w_{i,j}) / (sqrt(Σ_i w_{i,Q}^2) · sqrt(Σ_i w_{i,j}^2))
Of course, we still need to know individual tf and IDF values.
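The cosine similarity of Eq 3 can be sketched as a single function over sparse weight vectors; iterating only over terms present in both vectors computes the dot product without touching the zero entries:

```python
import math

def cosine_sim(q, d):
    """Cosine similarity between two sparse weight vectors (dicts of
    term -> tf*IDF weight), i.e. dot product over product of lengths."""
    dot = sum(wq * d[t] for t, wq in q.items() if t in d)
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_q * norm_d)
```

Ranking the collection then amounts to computing cosine_sim between the query vector and each document vector and sorting the scores in descending order.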