I R Assignment 1

Uploaded by

Arghya Adhya

CS60092: Information Retrieval

Autumn 2016, Assignment 1

Motivation: This assignment gives you a hands-on feel for a simple Information Retrieval
system.
Task:
You are given a dataset consisting of the following:
- Documents
- Queries
- Documents relevant to the queries
You have to implement and compare the following systems for boolean query processing:
- Grep based
- Index based
- Lucene based (Lucene is a well-known indexing tool)
Report the performance metrics (precision and recall) and the total time taken to search all
queries for each technique.
You can issue AND queries that combine all the search terms.
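
The two metrics named above can be computed per query from the set of retrieved doc ids and the set of relevant doc ids. The following is a minimal sketch, not a prescribed implementation: the function names are hypothetical, and returning 0.0 for an empty set and averaging per-query scores (macro averaging) are assumptions the assignment does not specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of doc ids returned by the system.
    relevant:  iterable of doc ids judged relevant for the query.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Assumed convention: an empty set yields 0.0 instead of a division error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def macro_average(per_query):
    """Average a list of (precision, recall) pairs over all queries."""
    n = len(per_query)
    if n == 0:
        return 0.0, 0.0
    avg_precision = sum(p for p, _ in per_query) / n
    avg_recall = sum(r for _, r in per_query) / n
    return avg_precision, avg_recall
```

For example, if a system returns four documents of which two are among three relevant ones, precision is 2/4 and recall is 2/3.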
Using the dataset provided:
a) Use grep to find the results for the sample queries given. Time each execution of the grep
search and record it.
b) Develop an inverted index (dictionary and postings lists) using standard data structures in
Java (HashMap, ArrayList) or Python (dictionaries, lists, JSON). You can
choose to tokenize and stem / lemmatize the data. In Python, use the NLTK 3 libraries
(http://www.nltk.org/install.html) (NLTK Book: http://www.nltk.org/book); in Java, use the CoreNLP
3.6 libraries (http://stanfordnlp.github.io/CoreNLP/download.html). Develop a
solution for simple conjunctive/disjunctive queries. Run it on the query set given. Tabulate
the speedup of search over the grep approach from part a). Also calculate precision
and recall for the given query set.
c) Build an inverted index using Lucene (Java): https://lucene.apache.org/
or PyLucene (Python): https://lucene.apache.org/pylucene/install.html
or Elasticsearch (Python): https://pypi.python.org/pypi/elasticsearch.
Again tabulate the search time and precision/recall, and compare them with the previous two
approaches.
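
Parts a) and b) can be sketched in Python roughly as follows. This is one possible approach under stated assumptions, not the expected solution: all function names are hypothetical, the tokenizer is a plain lowercase split with no stemming or lemmatization (NLTK could be swapped in), the grep call assumes a grep binary on the PATH that accepts the common -r, -l, -i, -w and -F flags, and doc ids are assumed to be strings.

```python
import re
import subprocess
import time
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to a sorted postings list; docs is {doc_id: text}."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query, mode="and"):
    """Answer a conjunctive ("and") or disjunctive ("or") query."""
    lists = [index.get(term, []) for term in tokenize(query)]
    if not lists:
        return []
    if mode == "or":
        return sorted(set().union(*lists))
    lists.sort(key=len)  # start from the shortest postings list
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


def grep_and(terms, docs_dir):
    """Return files under docs_dir that contain every term (one grep per term)."""
    result = None
    for term in terms:
        proc = subprocess.run(["grep", "-rliwF", "--", term, docs_dir],
                              capture_output=True, text=True)
        if proc.returncode > 1:  # 0 = match, 1 = no match, >1 = error
            raise RuntimeError(proc.stderr)
        files = set(proc.stdout.splitlines())
        result = files if result is None else result & files
    return sorted(result or [])


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping both `grep_and` and `search` in `timed` for every query gives the per-query times from which the speedup table can be built. Note that `grep_and` returns file paths while `search` returns doc ids, so the paths need to be mapped back to ids before computing precision and recall.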

Dataset description:
All necessary data is available at:
https://www.dropbox.com/sh/1zfuw3xuhmul3d2/AAAuYd2GevTnvSgUK2YK8CvRa?dl=0
The folder Assignment1 contains query.txt, output.txt, and alldocs.rar.
1. query.txt contains 82 queries in two columns: query id and query.
2. alldocs.rar contains the document files, each named with its doc id. Each document is a set of sentences.
3. output.txt contains the top 50 relevant documents (doc ids) for each query.
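
Loading the two text files might look like the sketch below. The exact file layout is not described above, so this rests on loudly stated assumptions: columns are taken to be whitespace-separated, each line of output.txt is taken to hold one query id followed by one doc id, and both function names are hypothetical. Check the actual files and adjust the parsing before relying on it.

```python
from collections import defaultdict


def parse_queries(lines):
    """Return {query_id: query_text}.

    Assumption: the query id is the first whitespace-delimited field
    and the rest of the line is the query text.
    """
    queries = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            queries[parts[0]] = parts[1]
    return queries


def parse_relevance(lines):
    """Return {query_id: set of relevant doc ids}.

    Assumption: each line starts with a query id and a doc id;
    any further fields on the line are ignored.
    """
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            relevant[fields[0]].add(fields[1])
    return dict(relevant)
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example `parse_queries(open("query.txt"))`.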
