
INFORMATION RETRIEVAL

PROJECT REPORT

ON

TREC

SUBMITTED BY

SUBHAYAN CHATTERJEE

Roll No. : 001900803008

MLISc. (DIGITAL LIBRARY)

SESSION: 2019-2021

PAPER CODE: MLDL-05

JADAVPUR UNIVERSITY

DEPARTMENT OF LIBRARY & INFORMATION SCIENCE

KOLKATA 700 032

GUIDED BY

DR. UDAYAN BHATTACHARYYA

Acknowledgement

It is a great pleasure for me to express my deep sense of gratitude to our respected
professor, Prof. Udayan Bhattacharyya, Department of Library &
Information Science, Jadavpur University, for his guidance and important
suggestions during this project work. I am also thankful to our departmental
librarian, the other staff, the senior students, and my friends.

Date: 10.04.2020

Place: Kolkata

Subhayan Chatterjee

CONTENTS

 Introduction
 Objectives
 TREC Facts
 Origin of TREC
 Goals for the TREC experiments
 Information activities of TREC
 TREC Tracks
 TREC Collections
 TREC Topics
 Relevance Judgements in TREC
 TREC 1
 TREC 2
 TREC 3-5, 6
 TREC 7
 TREC 8-12
 TREC Yearly Conference Cycle
 TREC Publications
 Celebrating 25 years of TREC
 Other evaluation forums similar to TREC
 Benefits of TREC experiments
 Conclusion
 References

INTRODUCTION

The Text Retrieval Conference (TREC) is an ongoing series of workshops
focusing on a list of different information retrieval (IR) research areas, or
tracks. Its purpose is to support research within the information retrieval
community by providing the infrastructure necessary for large-scale
evaluation of text retrieval methodologies. TREC, co-sponsored by the
National Institute of Standards and Technology (NIST) and the U.S.
Department of Defense, was started in 1992 as part of the TIPSTER Text
program. Participants run their retrieval systems on the data, judges
evaluate the results, and the participants then share their experiences at
the conference.

OBJECTIVES

1. To encourage research in text retrieval on large data sets.

2. To increase communication among academia, business, and government by creating a "practical", open forum.

3. To increase the availability of many different text retrieval techniques.

TREC FACTS

In 2003, TREC included 93 groups from 22 countries.

It makes its test collections and the submitted retrieval runs available to the public.

TREC hosted the first large-scale evaluations of:

 Non-English retrieval

 Retrieval of speech recordings

 Retrieval across multiple languages

ORIGIN OF TREC

Researchers in information retrieval had long based their work on small test
collections such as the CACM, NPL, and INSPEC collections, each containing
only a few hundred to a few thousand documents. The major problem for
researchers was obtaining a test collection large enough to match a real-life
situation, together with an infrastructure adequate for conducting tests on it.
In 1991, the US Defense Advanced Research Projects Agency (DARPA)
decided to fund the TREC experiments, to be run by NIST, in order to
enable information retrieval researchers to scale up from small collections
of data to larger experiments.

Goals for the TREC experiments

1. To encourage research in information retrieval based on large test collections;

2. To increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;

3. To speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and

4. To increase the availability of appropriate evaluation techniques for use by industry and academia.

Information activities of TREC

• A) Main activity ('core' in TREC jargon):

• i) ad hoc (corresponding to retrospective retrieval)

• ii) routing or filtering (corresponding to the selective dissemination of information)

• B) Subsidiary activities ('tracks' in TREC jargon)

TREC TRACKS

A TREC workshop consists of a set of tracks, areas of focus in which
particular retrieval tasks are defined. The tracks make TREC attractive to
a broader community by providing tasks that match the research
interests of more groups. Each track has a mailing list whose primary
purpose is to discuss the details of the track's tasks in the current TREC.

2017 TREC TRACKS


 Common Core Track

 Complex Answer Retrieval Track

 Dynamic Domain Track

 Live QA Track

 OpenSearch Track

 Precision Medicine Track

 Real-Time Summarization Track

 Tasks Track

 Clinical Decision Support Track

 Contextual Suggestion Track

 Total Recall Track

PAST TRACKS

 Chemical Track

 Crowdsourcing Track

 Genomics Track

 Enterprise Track

 Cross-Language Track

 Federated Web Search (FedWeb) Track

 Filtering Track

 HARD Track

 Interactive Track

 Knowledge Base Acceleration Track

 Legal Track

 Natural Language Processing Track

 Novelty Track

 Question Answering Track

 Robust Retrieval Track

 Relevance Feedback Track

 Session Track

 Spam Track

 Temporal Summarization Track

 Terabyte Track

 Video Track

 Web Track

TREC COLLECTIONS

The document collection of TREC (also known as the TIPSTER collection)
reflects a diversity of subject matter, word choice, literary styles, formats,
and so on. The primary TREC collections now contain about 2 gigabytes of
data with over 800,000 documents. The primary TREC document sets
consist mostly of newspaper or newswire articles, though there are some
government documents, such as the Federal Register, patent documents,
and computer science abstracts.

TREC TOPICS

In TREC terminology an information need is termed a topic, while the data
structure that is actually submitted to a retrieval system is called a query. A
major objective in TREC is to provide topics that allow a range of query
construction methods to be tested. A topic statement generally consists of
four sections:

1. An IDENTIFIER, e.g. <num> Number: R111

2. A TITLE, e.g. <title> Telemarketing practice in US.

3. A DESCRIPTION, e.g. <desc> Description: Find documents which reflect telemarketing practices in the US which are intrusive or deceptive and any efforts to control or regulate against them.

4. A NARRATIVE, e.g. <narr> Narrative: Telemarketing practices found to be abusive, intrusive, evasive, deceptive, fraudulent, or in any way unwanted by persons contacted are relevant. Only such practices in the US are relevant.
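
The tagged topic format above is simple enough to process programmatically. The following minimal Python sketch (not an official TREC tool) splits a topic statement into its four sections; the sample topic is the illustrative one from this report, and the helper name parse_topic is hypothetical.

import re

# Sample topic text, taken from the illustrative example above.
SAMPLE_TOPIC = """
<num> Number: R111
<title> Telemarketing practice in US.
<desc> Description: Find documents which reflect telemarketing practices in the
US which are intrusive or deceptive and any efforts to control or regulate
against them.
<narr> Narrative: Telemarketing practices found to be abusive, intrusive,
evasive, deceptive, fraudulent, or in any way unwanted by persons contacted
are relevant. Only such practices in the US are relevant.
"""

def parse_topic(text):
    """Return a dict mapping each tag (num, title, desc, narr) to its text."""
    sections = {}
    # Each tag opens a section that runs until the next tag or the end of the topic.
    pattern = re.compile(
        r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|$)", re.DOTALL)
    for tag, body in pattern.findall(text):
        # Drop the redundant label ("Number:", "Description:", "Narrative:") if present.
        body = re.sub(r"^\s*(Number|Description|Narrative)\s*:", "", body.strip())
        sections[tag] = " ".join(body.split())
    return sections

if __name__ == "__main__":
    topic = parse_topic(SAMPLE_TOPIC)
    for tag in ("num", "title", "desc", "narr"):
        print(tag, "->", topic[tag])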

RELEVANCE JUDGEMENTS IN TREC

It is impossible to calculate absolute recall for each query, so TREC uses a
specific method called pooling to estimate relative recall as opposed to
absolute recall. In this method, the documents that occurred in the top 100
results returned for each query by each participating system are combined
together to produce a 'pool', which is then judged for relevance. By pooling
the results from all the participating teams, one can expect that most of the
relevant documents in the collection have been found. The ad hoc tasks in
TREC are evaluated using a package called trec_eval, which reports about 85
different numbers for a run, including recall and precision measures at
various cut-off points and a single-value summary measure derived from
recall and precision.
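
The pooling step and the trec_eval-style measures can be illustrated with a minimal Python sketch. The run data, the pool depth handling, and the function names below are invented for illustration; this is not the trec_eval package itself, only an indication of the kind of computation it performs (precision at a cut-off point and average precision as a single-value summary).

POOL_DEPTH = 100  # TREC traditionally pools the top 100 documents from each run

def build_pool(runs, depth=POOL_DEPTH):
    """Merge the top-`depth` documents of every submitted run into one pool.

    `runs` maps a system name to its ranked list of document ids for one topic.
    The pooled documents are the ones the assessors then judge for relevance.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k

def average_precision(ranked_docs, relevant):
    """Single-value summary that combines recall and precision for one topic."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

if __name__ == "__main__":
    # Hypothetical ranked results from two systems for a single topic.
    runs = {"sysA": ["d1", "d7", "d3", "d9"], "sysB": ["d7", "d2", "d1", "d5"]}
    relevant = {"d1", "d2", "d9"}  # assessor judgements made over the pool
    print(build_pool(runs, depth=3))
    print(precision_at_k(runs["sysA"], relevant, 3))
    print(average_precision(runs["sysA"], relevant))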

TREC 1

In November 1992, TREC-1 (the first Text Retrieval Conference) was held at
NIST. The conference, co-sponsored by DARPA and NIST, brought together
information retrieval researchers to compare the results of their different
systems when used on a large new test collection (called the TIPSTER
collection). The first conference attracted 28 groups from academia and
industry, and generated widespread interest from the information retrieval
community.

Results of the TREC-1 experiments

Harman reports that the draft results of the TREC-1 experiments revealed the
following:

 Automatic construction of queries from natural language query statements seems to work.

 Techniques based on natural language processing were no better and no worse than those based on vector or probabilistic approaches; the best runs of all approaches were about equal.

TREC 2

TREC-2 took place in August 1993. In addition to 22 of the TREC-1 groups, nine new
groups took part, bringing the total number of participating groups to 31.
The participants could choose from three levels of participation:
category A, full participation; category B, full participation using one-
quarter of the full document set; and category C, evaluation only. Two
types of retrieval were examined: retrieval using an 'ad hoc' query, such as
a researcher might use in a library environment, and retrieval using a
'routing' query, such as a profile to filter some incoming document stream.
The number of documents to be returned was increased from 200 per topic
to 1,000, and the total database size was increased from roughly 1 gigabyte
to 3 gigabytes.

TREC 3 TO TREC 5

TREC-3 introduced new topics with shorter descriptions, allowing for more
innovative topic expansion ideas. The first two TRECs used very long topics
(averaging about 130 terms); in TREC-3 they were made shorter by
excluding some keywords, and in TREC-4 they were made even shorter to
investigate the problems with very short user statements (containing
around ten terms). TREC-5 included both short and long versions of the
topics, with the goal of carrying out deeper investigations into which types
of techniques work well on topics of various lengths.

TREC 6

In TREC-6, three new tracks - speech, cross-language and high-precision
information retrieval - were introduced. The goal of the cross-language
information retrieval track is to facilitate research on systems that are
able to retrieve relevant documents regardless of the language of the
source documents. The speech, or spoken document retrieval, track is
meant to stimulate research on retrieval techniques for spoken documents.
The high-precision track was designed to deal with tasks in which the user of
a retrieval system is asked to retrieve ten documents that answer a
given information request within five minutes.

TREC 7

In addition to the main ad hoc task, TREC-7 contained seven tracks, of
which two - the query track and the very large corpus track - were new. The
goal of the query track was to create a large query collection: it was
designed as a means of creating a large set of different queries for an
existing TREC topic set, topics 1 to 50.

TREC-8 to TREC-12

TREC-8 contained seven tracks, of which two - the question-answering (QA)
and web tracks - were new. The objective of the QA track is to explore the
possibilities of providing answers to specific natural language queries. TREC-9
also included seven tracks. A video track was introduced in TREC-10 and a
novelty track was introduced in TREC-11. TREC-12 (held in 2003) added three
new tracks: the genomics track, the robust retrieval track, and HARD (High
Accuracy Retrieval from Documents).

TREC Yearly Conference Cycle

Each TREC follows a yearly cycle: NIST releases the document collection and
a set of topics, participating groups run their retrieval systems and submit
their ranked results by a fixed deadline, NIST pools the submissions and
assessors make relevance judgements, the runs are evaluated, and the results
are presented and discussed at the workshop held at the end of the cycle.
TEXT RETRIEVAL CONFERENCE PUBLICATIONS

Presentations

Presentations from TREC-5 to TREC-9, TREC 2001 to TREC 2007, and TREC 2013 to
TREC 2016 are available. If required, paper copies of these documents can also be
obtained through the TREC Program Manager.

Proceedings

The proceedings of each conference are published in the NIST Special Publication
series, for example NIST Special Publication 500-324: The Twenty-Sixth Text
Retrieval Conference Proceedings (TREC 2017), and so on.

TREC: Experiment and Evaluation in Information Retrieval

Important TREC publications also include books, such as 'TREC: Experiment and
Evaluation in Information Retrieval', edited by Ellen M. Voorhees and Donna K.
Harman.

CELEBRATING 25 YEARS OF TREC

On November 15, 2016, TREC celebrated the completion of its 25 years. The
celebration ceremony included the following events:

 Webcast of Celebration

 Celebration Agenda

 Keynote talk by Sue Dumais

 Web track talk by David Hawking

 Bio/Medical tracks talk by Bill Hersh

 Legal track talk by Jason R. Baron

 Presentation from Charlie Clarke

 Presentation from Arjen de Vries

 Presentation from Diane Kelly

OTHER EVALUATION FORUMS SIMILAR TO TREC

Conference and Labs of the Evaluation Forum (CLEF)

Its main mission is to promote research, innovation, and development of
information access systems, with an emphasis on multilingual and
multimodal information with various levels of structure. CLEF is structured
in two parts:

 A series of Evaluation Labs, i.e. laboratories to conduct evaluation of information access systems and workshops to discuss and pilot innovative evaluation activities;

 A peer-reviewed Conference on a broad range of issues.

NTCIR (NII Test Collection for IR Systems) Project

A series of evaluation workshops designed to enhance research in
Information Access (IA) technologies, including information retrieval,
question answering, text summarization, extraction, etc. Since 1997 it has
been co-sponsored by the Japan Society for the Promotion of Science (JSPS),
as part of the JSPS "Research for the Future" Program, and the National
Center for Science Information Systems (NACSIS).

FIRE (Forum for Information Retrieval Evaluation)

An annual evaluation meeting (fire.irsi.res.in) that has reached its 10th
edition. Since its inception in 2008, FIRE has had a strong focus on shared
tasks, similar to those offered at evaluation forums like TREC, CLEF and
NTCIR. The shared tasks focus on solving specific problems in the area of
information access and, more importantly, help in generating evaluation
datasets for the research community.

Chinese Web test collection

Its four goals (with reference to TREC-1) are:

1) To encourage retrieval research based on large test collections;

2) To increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;

3) To speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and

4) To increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.

BENEFITS OF TREC EXPERIMENTS

A wide range of information retrieval strategies has been tested through
the TREC series of experiments, such as:

1. Boolean retrieval

2. Statistical and probabilistic indexing and term weighting strategies (a tf-idf sketch is given after this list)

3. Passage or paragraph retrieval

4. Combining the results of more than one search

5. Retrieval based on prior relevance assessments

6. Natural language-based and statistically-based phrase indexing

7. Query expansion and query reduction

8. String and concept-based searching

9. Dictionary-based stemming

10. Question-answering

11. Content-based multimedia retrieval and so on.
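
As an illustration of the term weighting strategies in item 2 above, the following minimal Python sketch computes simple tf-idf weights over three invented toy documents. It is only a hedged illustration of the general idea; TREC systems work on far larger collections and use more refined weighting schemes.

import math
from collections import Counter

# Three invented toy documents (real TREC collections hold hundreds of thousands).
docs = [
    "telemarketing practices in the us",
    "regulation of deceptive telemarketing",
    "speech retrieval from broadcast news",
]

tokenised = [d.split() for d in docs]
N = len(tokenised)

# Document frequency: in how many documents each term occurs.
df = Counter(term for doc in tokenised for term in set(doc))

def tfidf(doc_terms):
    """Weight each term by its frequency in the document and its rarity in the collection."""
    tf = Counter(doc_terms)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

for i, doc_terms in enumerate(tokenised):
    print(i, tfidf(doc_terms))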

CONCLUSION

One can conclude that TREC has been a vehicle not only for improving retrieval
technology, but also for providing a better understanding of retrieval
evaluation. The TREC series of experiments has brought together researchers
from across the world to work towards a common goal: building up large text
collections and evaluating retrieval methods on them.

REFERENCES

 Chowdhury, G. G. (2004). Introduction to modern information retrieval. London: Facet Publishing.

 https://en.wikipedia.org/wiki/Text_Retrieval_Conference

 https://trec.nist.gov/evals.html
