
INFORMATION RETRIEVAL

PROJECT REPORT

ON

TREC

SUBMITTED BY

SUBHAYAN CHATTERJEE

Roll No. : 001900803008

MLISc. (DIGITAL LIBRARY)

SESSION: 2019-2021

PAPER CODE: MLDL-05

JADAVPUR UNIVERSITY

DEPARTMENT OF LIBRARY & INFORMATION SCIENCE

KOLKATA 700 032

GUIDED BY

DR. UDAYAN BHATTACHARYYA

Acknowledgement

It is a great pleasure for me to express my deep sense of gratitude to our respected
professor, Prof. Udayan Bhattacharyya, Department of Library &
Information Science, Jadavpur University, for his guidance and important
suggestions during this project work. I am also thankful to our departmental
librarian, the other staff, the senior students, and my friends.

Date: 10.04.2020

Place: Kolkata

Subhayan Chatterjee

CONTENTS

 Introduction
 Objectives
 TREC Facts
 Origin of TREC
 Goals for the TREC experiments
 Information activities of TREC
 TREC Tracks
 TREC Collections
 TREC Topics
 Relevance Judgements in TREC
 TREC 1
 TREC 2
 TREC 3-5, 6
 TREC 7
 TREC 8-12
 TREC Yearly Conference Cycle
 TREC Publications
 Celebrating 25 years of TREC
 Other evaluation forums similar to TREC
 Benefits of TREC experiments
 Conclusion
 References

INTRODUCTION

The Text Retrieval Conference (TREC) is an ongoing series of workshops
focusing on a list of different information retrieval (IR) research areas, or
tracks. Its purpose is to support research within the information retrieval
community by providing the infrastructure necessary for large-scale
evaluation of text retrieval methodologies. TREC, co-sponsored by the
National Institute of Standards and Technology (NIST) and the U.S.
Department of Defense, was started in 1992 as part of the TIPSTER Text
program. Participants run their retrieval systems on the data, judges
evaluate the results, and the participants then share their experiences at
the conference.

OBJECTIVES

1. To encourage research in text retrieval on large data sets.

2. To increase communication among academia, business, and government by creating a "practical", open forum.

3. To increase the availability of many different text retrieval techniques.

TREC FACTS

In 2003, TREC included 93 groups from 22 countries.

It makes its test collections and the submitted retrieval runs available to the public.

TREC hosted the first large-scale evaluations of:

 Non-English retrieval

 Retrieval of speech recordings

 Retrieval across multiple languages

ORIGIN OF TREC

Researchers in information retrieval had long based their work on small test
collections such as the CACM, NPL, and INSPEC collections, each containing
only a few hundred to a few thousand documents. The major problem for
researchers was obtaining a test collection large enough to match a real-life
situation, together with an infrastructure adequate for conducting tests on it.
In 1991, the US Defense Advanced Research Projects Agency (DARPA)
decided to fund the TREC experiments, to be run by NIST, in order to
enable information retrieval researchers to scale up from small collections
of data to larger experiments.

Goals for the TREC experiments

1. To encourage research in information retrieval based on large test collections;

2. To increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;

3. To speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and

4. To increase the availability of appropriate evaluation techniques for use by industry and academia.

Information activities of TREC

• A) Main activity ('core' in TREC jargon):

• i) ad hoc (corresponding to retrospective retrieval)

• ii) routing or filtering (corresponding to the selective dissemination of information)

• B) Subsidiary activities ('tracks' in TREC jargon)

TREC TRACKS

A TREC workshop consists of a set of tracks, areas of focus in which
particular retrieval tasks are defined. The tracks make TREC attractive to
a broader community by providing tasks that match the research
interests of more groups. Each track has a mailing list whose primary
purpose is to discuss the details of the track's tasks in the current TREC.

2017 TREC TRACKS


 Common Core Track

 Complex Answer Retrieval Track

 Dynamic Domain Track

 Live QA Track

 OpenSearch Track

 Precision Medicine Track

 Real-Time Summarization Track

 Tasks Track

 Clinical Decision Support Track

 Contextual Suggestion Track

 Total Recall Track

PAST TRACKS

 Chemical Track

 Crowdsourcing Track

 Genomics Track

 Enterprise Track

 Cross-Language Track

 Federated Web Search (FedWeb) Track

 Filtering Track

 HARD Track

 Interactive Track

 Knowledge Base Acceleration Track

 Legal Track

 Natural Language Processing Track

 Novelty Track

 Question Answering Track

 Robust Retrieval Track

 Relevance Feedback Track

 Session Track

 Spam Track

 Temporal Summarization Track

 Terabyte Track

 Video Track

 Web Track

TREC COLLECTIONS

The document collection of TREC (also known as the TIPSTER collection)
reflects a diversity of subject matter, word choice, literary styles, formats,
and so on. The primary TREC collections now contain about 2 gigabytes of
data with over 800,000 documents. The primary TREC document sets
consist mostly of newspaper or newswire articles, though there are some
government documents, such as the Federal Register, patent documents,
and computer science abstracts.

TREC TOPICS

In TREC terminology an information need is termed a topic, while the data
structure that is actually submitted to a retrieval system is called a query. A
major objective in TREC is to provide topics that allow a range of query
construction methods to be tested. A topic statement generally consists of
four sections:

1. An IDENTIFIER, e.g. <num> Number: R111

2. A TITLE, e.g. <title> Telemarketing practice in US.

3. A DESCRIPTION, e.g. <desc> Description: Find documents which reflect telemarketing practices in the US which are intrusive or deceptive and any efforts to control or regulate against them.

4. A NARRATIVE, e.g. <narr> Narrative: Telemarketing practices found to be abusive, intrusive, evasive, deceptive, fraudulent, or in any way unwanted by persons contacted are relevant. Only such practices in the US are relevant.
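
The tagged topic format above is simple enough to process programmatically. The following minimal Python sketch (not an official TREC tool) splits a topic statement into its four sections; the sample topic is the illustrative one from this report, and the helper name parse_topic is hypothetical.

import re

# Sample topic text, taken from the illustrative example above.
SAMPLE_TOPIC = """
<num> Number: R111
<title> Telemarketing practice in US.
<desc> Description: Find documents which reflect telemarketing practices in the
US which are intrusive or deceptive and any efforts to control or regulate
against them.
<narr> Narrative: Telemarketing practices found to be abusive, intrusive,
evasive, deceptive, fraudulent, or in any way unwanted by persons contacted
are relevant. Only such practices in the US are relevant.
"""

def parse_topic(text):
    """Return a dict mapping each tag (num, title, desc, narr) to its text."""
    sections = {}
    # Each tag opens a section that runs until the next tag or the end of the topic.
    pattern = re.compile(
        r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|$)", re.DOTALL)
    for tag, body in pattern.findall(text):
        # Drop the redundant label ("Number:", "Description:", "Narrative:") if present.
        body = re.sub(r"^\s*(Number|Description|Narrative)\s*:", "", body.strip())
        sections[tag] = " ".join(body.split())
    return sections

if __name__ == "__main__":
    topic = parse_topic(SAMPLE_TOPIC)
    for tag in ("num", "title", "desc", "narr"):
        print(tag, "->", topic[tag])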

RELEVANCE JUDGEMENTS IN TREC

It is impossible to calculate absolute recall for each query, so TREC uses a
specific method called pooling to estimate relative recall as opposed to
absolute recall. In this method, the documents that occurred in the top 100
results returned for each query by each participating system are combined
together to produce a 'pool', which is then judged for relevance. By pooling
the results from all the participating teams, one can expect that most of the
relevant documents in the collection have been found. The ad hoc tasks in
TREC are evaluated using a package called trec_eval, which reports about 85
different numbers for a run, including recall and precision measures at
various cut-off points and a single-value summary measure derived from
recall and precision.
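
The pooling step and the trec_eval-style measures can be illustrated with a minimal Python sketch. The run data, the pool depth handling, and the function names below are invented for illustration; this is not the trec_eval package itself, only an indication of the kind of computation it performs (precision at a cut-off point and average precision as a single-value summary).

POOL_DEPTH = 100  # TREC traditionally pools the top 100 documents from each run

def build_pool(runs, depth=POOL_DEPTH):
    """Merge the top-`depth` documents of every submitted run into one pool.

    `runs` maps a system name to its ranked list of document ids for one topic.
    The pooled documents are the ones the assessors then judge for relevance.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k

def average_precision(ranked_docs, relevant):
    """Single-value summary that combines recall and precision for one topic."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

if __name__ == "__main__":
    # Hypothetical ranked results from two systems for a single topic.
    runs = {"sysA": ["d1", "d7", "d3", "d9"], "sysB": ["d7", "d2", "d1", "d5"]}
    relevant = {"d1", "d2", "d9"}  # assessor judgements made over the pool
    print(build_pool(runs, depth=3))
    print(precision_at_k(runs["sysA"], relevant, 3))
    print(average_precision(runs["sysA"], relevant))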

TREC 1

In November 1992, TREC-1 (the first Text Retrieval Conference) was held at
NIST. The conference, co-sponsored by DARPA and NIST, brought together
information retrieval researchers to compare the results of their different
systems when used on a large new test collection (called the TIPSTER
collection). The first conference attracted 28 groups from academia and
industry, and generated widespread interest from the information retrieval
community.

Results of the TREC-1 experiments

Harman reports that the draft results of the TREC-1 experiments revealed the
following:

 Automatic construction of queries from natural language query statements seems to work.

 Techniques based on natural language processing were no better and no worse than those based on vector or probabilistic approaches; the best runs of all approaches were about equal.

TREC 2

TREC-2 took place in August 1993. In addition to 22 of the TREC-1 groups, nine new
groups took part, bringing the total number of participating groups to 31.
The participants could choose from three levels of participation:
category A, full participation; category B, full participation using one-
quarter of the full document set; and category C, evaluation only. Two
types of retrieval were examined: retrieval using an 'ad hoc' query, such as
a researcher might use in a library environment, and retrieval using a
'routing' query, such as a profile to filter some incoming document stream.
The number of documents to be returned was increased from 200 per topic
to 1,000, and the total database size was increased from roughly 1 gigabyte
to 3 gigabytes.

TREC 3 TO TREC 5

TREC-3 introduced new topics with shorter descriptions, allowing for more
innovative topic expansion ideas. The first two TRECs used very long topics
(averaging about 130 terms); in TREC-3 they were made shorter by
excluding some keywords, and in TREC-4 they were made even shorter to
investigate the problems with very short user statements (containing
around ten terms). TREC-5 included both short and long versions of the
topics, with the goal of carrying out deeper investigations into which types
of techniques work well on topics of various lengths.

TREC 6

In TREC-6, three new tracks - speech, cross-language and high-precision
information retrieval - were introduced. The goal of the cross-language
information retrieval track is to facilitate research on systems that are
able to retrieve relevant documents regardless of the language of the
source documents. The speech, or spoken document retrieval, track is
meant to stimulate research on retrieval techniques for spoken documents.
The high-precision track was designed to deal with tasks in which the user of
a retrieval system is asked to retrieve ten documents that answer a
given information request within five minutes.

TREC 7

In addition to the main ad hoc task, TREC-7 contained seven tracks, of
which two - the query track and the very large corpus track - were new. The
goal of the query track was to create a large query collection: it was
designed as a means of creating a large set of different queries for an
existing TREC topic set, topics 1 to 50.

TREC-8 to TREC-12

TREC-8 contained seven tracks, of which two - the question-answering (QA)
and web tracks - were new. The objective of the QA track is to explore the
possibilities of providing answers to specific natural language queries. TREC-9
also included seven tracks. A video track was introduced in TREC-10 and a
novelty track was introduced in TREC-11. TREC-12 (held in 2003) added three
new tracks: the genomics track, the robust retrieval track, and HARD (High
Accuracy Retrieval from Documents).

TREC Yearly Conference Cycle

Each TREC follows a yearly cycle: NIST releases the document collection and
a set of topics, participating groups run their retrieval systems and submit
their ranked results by a fixed deadline, NIST pools the submissions and
assessors make relevance judgements, the runs are evaluated, and the results
are presented and discussed at the workshop held at the end of the cycle.
TEXT RETRIEVAL CONFERENCE PUBLICATIONS

Presentations

Presentations from TREC-5 to TREC-9, TREC 2001 to TREC 2007, and TREC 2013 to
TREC 2016 are available. If required, paper copies of these documents can also be
obtained through the TREC Program Manager.

Proceedings

The proceedings of each conference are published in the NIST Special Publication
series, for example NIST Special Publication 500-324: The Twenty-Sixth Text
Retrieval Conference Proceedings (TREC 2017), and so on.

TREC: Experiment and Evaluation in Information Retrieval

Important TREC publications also include books, such as 'TREC: Experiment and
Evaluation in Information Retrieval', edited by Ellen M. Voorhees and Donna K.
Harman.

CELEBRATING 25 YEARS OF TREC

On November 15, 2016, TREC celebrated the completion of its 25 years. The
celebration ceremony included the following events:

 Webcast of Celebration

 Celebration Agenda

 Keynote talk by Sue Dumais

 Web track talk by David Hawking

 Bio/Medical tracks talk by Bill Hersh

 Legal track talk by Jason R. Baron

 Presentation from Charlie Clarke

 Presentation from Arjen de Vries

 Presentation from Diane Kelly

OTHER EVALUATION FORUMS SIMILAR TO TREC

Conference and Labs of the Evaluation Forum (CLEF)

Its main mission is to promote research, innovation, and development of
information access systems, with an emphasis on multilingual and
multimodal information with various levels of structure. CLEF is structured
in two parts:

 A series of Evaluation Labs, i.e. laboratories to conduct evaluation of information access systems and workshops to discuss and pilot innovative evaluation activities;

 A peer-reviewed Conference on a broad range of issues.

NTCIR (NII Test Collection for IR Systems) Project

A series of evaluation workshops designed to enhance research in
Information Access (IA) technologies, including information retrieval,
question answering, text summarization, extraction, etc. Since 1997 it has
been co-sponsored by the Japan Society for the Promotion of Science (JSPS),
as part of the JSPS "Research for the Future" Program, and the National
Center for Science Information Systems (NACSIS).

FIRE (Forum for Information Retrieval Evaluation)

An annual evaluation meeting (fire.irsi.res.in) that has reached its 10th
edition. Since its inception in 2008, FIRE has had a strong focus on shared
tasks, similar to those offered at evaluation forums like TREC, CLEF and
NTCIR. The shared tasks focus on solving specific problems in the area of
information access and, more importantly, help in generating evaluation
datasets for the research community.

Chinese Web test collection

Its four goals (with reference to TREC-1) are:

1) To encourage retrieval research based on large test collections;

2) To increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;

3) To speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and

4) To increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.

BENEFITS OF TREC EXPERIMENTS

A wide range of information retrieval strategies has been tested through
the TREC series of experiments, such as:

1. Boolean retrieval

2. Statistical and probabilistic indexing and term weighting strategies (a tf-idf sketch is given after this list)

3. Passage or paragraph retrieval

4. Combining the results of more than one search

5. Retrieval based on prior relevance assessments

6. Natural language-based and statistically-based phrase indexing

7. Query expansion and query reduction

8. String and concept-based searching

9. Dictionary-based stemming

10. Question-answering

11. Content-based multimedia retrieval and so on.
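
As an illustration of the term weighting strategies in item 2 above, the following minimal Python sketch computes simple tf-idf weights over three invented toy documents. It is only a hedged illustration of the general idea; TREC systems work on far larger collections and use more refined weighting schemes.

import math
from collections import Counter

# Three invented toy documents (real TREC collections hold hundreds of thousands).
docs = [
    "telemarketing practices in the us",
    "regulation of deceptive telemarketing",
    "speech retrieval from broadcast news",
]

tokenised = [d.split() for d in docs]
N = len(tokenised)

# Document frequency: in how many documents each term occurs.
df = Counter(term for doc in tokenised for term in set(doc))

def tfidf(doc_terms):
    """Weight each term by its frequency in the document and its rarity in the collection."""
    tf = Counter(doc_terms)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

for i, doc_terms in enumerate(tokenised):
    print(i, tfidf(doc_terms))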

CONCLUSION

One can conclude that TREC has been a vehicle not only for improving retrieval
technology, but also for providing a better understanding of retrieval
evaluation. The TREC series of experiments has brought together researchers
from across the world to work towards a common goal: building up large text
collections and evaluating retrieval methods on them.

REFERENCES

 Chowdhury, G. G. (2004). Introduction to modern information retrieval. London: Facet Publishing.

 https://en.wikipedia.org/wiki/Text_Retrieval_Conference

 https://trec.nist.gov/evals.html
