
UNIVERSITAT DE BARCELONA

Facultat de Biblioteconomia i Documentació


Metodologia de la recerca
Professor: Àngel Borrego

Scientific Information Sources

Contents
From author to reader: the scholarly communication process
Index & Abstract (I&A) databases
Assessment of I&A databases
Searching I&A databases

Scholarly communication chain


Authors (scientists)
Journal editors and referees
Journals
I&A databases
Librarians
End users

Authors (researchers)
Scientists are the first link in the scholarly communication chain. They create new knowledge and describe it in articles, books, patents, etc.

Publishing an article

Source: Weller, 2000

Editors and publishers


Scientific editor: an expert in the book's or journal's field who manages the review of manuscripts.
Referee or reviewer (usually two or three): experts in the field who (blindly) evaluate the work for the editor, noting weaknesses or problems along with suggestions for improvement, and including an explicit recommendation of what to do with the manuscript (accept or reject).
Publisher: some journals are published by non-profit scientific societies or universities; other journals are published by commercial publishers (Elsevier, Emerald, Springer, Wiley) that expect economic revenues, especially through library subscriptions.

Scholarly journals
A periodical publication reporting new research in the form of:
  Articles: complete descriptions of current original research findings.
  Review articles: accumulate the results of many articles on a topic into a coherent narrative about the state of the art in that discipline.
  Letters (not to be confused with the letters to the editor) or short communications: short descriptions of important current research findings.
In 2004, Carol Tenopir (Library Journal, 2/1/2004) estimated that there were about 43,500 active academic journals.

I&A databases
Databases are produced and/or hosted by public administrations or private companies. These organizations select the most important journals in a field and analyse them in order to create Index & Abstract (I&A) databases. These databases usually offer additional services such as user profiles, email alerts, etc. Hosts market databases from several producers and provide users with engines to search them.

Where to search for scientific information?


Bibliographic (I&A) databases, produced and distributed by public administrations or private companies:
  Instituto de Estudios Documentales sobre Ciencia y Tecnología (cindoc.csic.es)
  National Library of Medicine (www.nlm.nih.gov)
  Dialog (dialog.com)

Journal gateways:
  Elsevier ScienceDirect (sciencedirect.com)
  EmeraldInsight (emeraldinsight.com)

Internet search engines:
  Google Scholar (scholar.google.com)
  Scirus (scirus.com)

Librarians
They are intermediaries between information and end users:
  Know the best information sources in any given field.
  Have the ability to transform a user's information need into a search equation that can be addressed to an automatic system.
Tasks:
  Exploit information sources.
  Create new information sources.
  Train users in the use of these sources.

Librarians
Have the ability to transform a user's information need into a search equation that can be addressed to an automatic system. Are you sure about this?

Where do scientists search for information?

Rowlands & Nicholas, 2005

Where???

Fry et al., 2009

Schonfeld & Housewright, 2009

End users
The main users of scientific information are scientists (i.e. authors) and some professionals (doctors, for instance). Articles in scientific journals are written by scientists for scientists. There is also an education market, i.e. handbooks and manuals that explain the basics of each discipline for educational purposes. Finally, there is also a market for popular science, including books, journals, mass media, museums, etc.

In summary

Information is the main input and output of science

Contents
From author to reader: scholarly communication process
I&A databases: concept
Assessing I&A databases
Searching I&A databases

Databases
A database is an organized collection of data, usually in digital form so that its contents can easily be accessed, managed, and updated...

...but you already know what a database is!

Access to scientific information

Timeline milestones, in chronological order:
  First scientific journals
  Print indexes
  Access to databases through telephone lines
  Databases on CD-ROM
  Web online access
Timeline axis: 1665, 1840, 1960, 1980, 2000, 2010

The database market


Year   Databases   Producers   Hosts
1980         411         269      71
1985       2,247       1,316     414
1990       3,943       1,950     645
1994       5,307       2,220     812
1997      10,000       3,400   1,800
2007      20,000        n.d.    n.d.

Large et al., p. 46

Gale Directory of Databases


Volume 1: online databases
  Profiles nearly 11,000 online databases made publicly available from the producer or an online service.
Volume 2: CD-ROM, DVD, etc.
  Profiles more than 8,000 database products offered in portable form or through batch processing.
In its 34th edition (2011), Gale Directory of Databases contains contact and descriptive information on nearly 19,000 databases and over 3,300 producers, online services, and vendors/distributors of database products.

Gale Directory of Databases (2)


Product descriptions.
Database producers: contact information for database producers and a list of the products they produce.
Vendors and distributors: contact information for vendors and distributors, conditions of use, and a list of the products they offer.
Geographic index: lists producers and vendors/distributors by country.
Subject index: classifies products within 1,800 subject terms.
Master index: lists all names in a single alphabetic sequence.

2011 edition

Gale Directory of Databases (3)

Contents
From author to reader: scholarly communication process
I&A databases: concept
Assessing I&A databases
Searching I&A databases

Assessment criteria
Contents: coverage, accuracy, consistency, updating
Information retrieval: interface and search options
Management: price, hardware and software requirements, authentication, information provided by the producer, integration with other library products, support, etc.

Database contents
Coverage:
  Topics, source types, chronological, geographical, languages
  Local availability of the indexed sources
Accuracy:
  Grammar and typing mistakes
  Duplicate records
Consistency:
  Formal description: names of authors and journals
  Subject description: indexing and classification
Updating:
  Growth in the number of records
  Delay in the introduction of records since publication

Interface and search options


Search page:
  Database structure and searchable fields
  Simple / advanced / command search
  Operators (Boolean, proximity, wildcards, etc.)
  Field indexes and thesaurus
  Search in a specific database, search history, multilingual interface, etc.
Results page:
  Visualisation: format and number of records
  Ranking criteria
  Select and manage records
  Record clustering
  Similar records
  Refine search
  Information on errors (0 results)
Record visualisation:
  Record formats
  Navigation between records and linked fields
  Highlight of search terms in records
Additional pages: database description, structure, help, etc.

Database management
Price and payment options
Hardware and software requirements
Authentication (password / IP / federated authentication)
User manuals, online help, languages, etc.
Integration with other library products (metasearch engines, reference management software, other databases from the same host)
Library support
Access (CD / online)

And listen to your users: log analysis, surveys, observation!

The JISC Academic Database Assessment Tool (ADAT) aims to help libraries to make informed decisions about future subscriptions to bibliographic databases.
http://www.jisc-adat.com

Precision and recall


                Relevant      Non-relevant   Total
Retrieved       a             b (noise)      a+b
Non-retrieved   c (silence)   d              c+d
Total           a+c           b+d            a+b+c+d

\[
\text{Precision} = \frac{\text{Relevant documents retrieved } (a)}{\text{Retrieved documents } (a + b)} \times 100
\]

\[
\text{Recall} = \frac{\text{Relevant documents retrieved } (a)}{\text{Relevant documents in the database } (a + c)} \times 100
\]
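The two measures can be computed directly from the set of retrieved documents and the set of relevant documents. The sketch below is only an illustration: the function name `precision_recall` and the document identifiers are hypothetical, and it assumes that relevance judgements are already available as a set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as percentages for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)  # relevant documents retrieved
    b = len(retrieved - relevant)  # noise: retrieved but not relevant
    c = len(relevant - retrieved)  # silence: relevant but not retrieved
    precision = 100 * a / (a + b) if retrieved else 0.0
    recall = 100 * a / (a + c) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 5 relevant in the database.
retrieved = ["D1", "D2", "D3", "D4"]
relevant = ["D2", "D4", "D7", "D8", "D9"]
precision, recall = precision_recall(retrieved, relevant)
print(f"Precision: {precision:.0f}%  Recall: {recall:.0f}%")
```

Here a = 2, b = 2 and c = 3, so precision is 50% and recall is 40%.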

Drawbacks of precision and recall


What is a relevant document? We assume that relevance is binary.
Different users may require different levels of precision and recall.
There is an inverse relationship between precision and recall.
Recall is just an estimate.
If the system ranks documents by relevance, then precision and recall vary as the user examines the retrieved records.

Example
Relevant documents in the database for query Q1:
D3, D5, D9, D25, D39, D44, D56, D71, D89, D123

Retrieved documents for query Q1 ranked by relevance (relevant documents are marked with an asterisk):
 1. D123 *
 2. D84
 3. D56 *
 4. D6
 5. D8
 6. D9 *
 7. D511
 8. D129
 9. D187
10. D25 *
11. D38
12. D48
13. D250
14. D113
15. D3 *

Precision at different levels of recall

[Figure: precision (%) plotted against recall (%), both axes running from 0 to 100]
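The points of such a curve can be derived from the example for query Q1 by walking down the ranked list and recording precision each time a relevant document is found. This is a minimal sketch; the function name `precision_at_recall_levels` is an assumption, while the document identifiers are those of the example.

```python
# Relevant documents in the database for query Q1 (from the example).
relevant = {"D3", "D5", "D9", "D25", "D39", "D44", "D56", "D71", "D89", "D123"}

# Retrieved documents for query Q1, in ranked order (from the example).
ranking = ["D123", "D84", "D56", "D6", "D8", "D9", "D511", "D129",
           "D187", "D25", "D38", "D48", "D250", "D113", "D3"]


def precision_at_recall_levels(ranking, relevant):
    """Return (recall %, precision %) each time a relevant document appears."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            recall = 100 * hits / len(relevant)
            precision = 100 * hits / rank
            points.append((round(recall), round(precision, 1)))
    return points


for recall, precision in precision_at_recall_levels(ranking, relevant):
    print(f"recall {recall:3d}%  precision {precision:5.1f}%")
```

For this ranking, precision falls from 100% at 10% recall to 66.7%, 50%, 40% and 33.3% at 20%, 30%, 40% and 50% recall. The remaining five relevant documents are never retrieved, so recall does not go beyond 50%.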

Contents
From author to reader: scholarly communication process
I&A databases: concept
Assessing I&A databases
Searching I&A databases

Query process
[Diagram. System side: Contents, Representation, Organization. User side: Need, Representation, Search. The two sides meet at Match, which yields the Retrieved records.]

Search options
Ordered from the options that increase recall to the options that increase precision:
  Truncation and wildcards (+ recall)
  Natural vs. controlled vocabulary
  Boolean operators
  Proximity operators
  Search limits: date, type of source, language, etc. (+ precision)
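As a rough illustration of the two ends of this scale, the sketch below runs an exact-word search, a truncated search and a truncated search with a date limit over a tiny collection. The records, the `search` helper and its `since` parameter are all hypothetical, and Python's `fnmatch` module merely stands in for a database's truncation operator.

```python
from fnmatch import fnmatch

# Hypothetical records; a real I&A database would hold many more fields.
records = [
    {"id": 1, "title": "library services for researchers", "year": 2008},
    {"id": 2, "title": "librarians and end users", "year": 2001},
    {"id": 3, "title": "academic libraries and databases", "year": 2010},
    {"id": 4, "title": "database hosts and producers", "year": 2010},
]


def search(pattern, since=None):
    """Return ids of records whose title has a word matching the pattern.

    '*' in the pattern acts as a truncation symbol; 'since' is an
    optional publication-year limit.
    """
    hits = []
    for record in records:
        words = record["title"].split()
        if any(fnmatch(word, pattern) for word in words):
            if since is None or record["year"] >= since:
                hits.append(record["id"])
    return hits


print(search("library"))              # exact word only
print(search("librar*"))              # truncation widens the result set
print(search("librar*", since=2005))  # a date limit narrows it again
```

Truncation retrieves records 1, 2 and 3 instead of record 1 alone, and the date limit then drops record 2.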

Drawbacks of the Boolean model


It does not matter whether there is one occurrence of the search term in the document or a hundred.
It does not matter whether a document complies with one or all of the requirements of an OR search.
Partial coincidence (for instance, complying with almost all the AND conditions) is not taken into account.
It is not possible to reflect the importance of each search term.
A Boolean search just divides the database into two sets, relevant and non-relevant documents, depending on whether they fulfil the search conditions or not.
All retrieved documents are supposed to be of similar relevance, so there is no mechanism to rank documents.
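A small sketch may make these drawbacks concrete. It treats Boolean operators as set operations over a hypothetical three-document collection; the texts and the `postings` helper are assumptions made for illustration only.

```python
# Hypothetical collection: Doc1 mentions "database" three times, Doc2 once.
docs = {
    "Doc1": "database database database search",
    "Doc2": "database index",
    "Doc3": "index abstract",
}


def postings(term):
    """Return the set of documents that contain the term at least once."""
    return {name for name, text in docs.items() if term in text.split()}


and_result = postings("database") & postings("index")  # database AND index
or_result = postings("database") | postings("index")   # database OR index
not_result = postings("database") - postings("index")  # database NOT index

print(sorted(and_result))
print(sorted(or_result))
print(sorted(not_result))
```

Each result is a plain set: Doc1 and Doc2 are both simply "in" the answer to the OR search, although one contains the term three times and the other only once, and nothing in the result says which should be shown first.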

Relevance sorting
A simple method consists in assigning a weight to each term in each document. The easiest way to assign a weight to a term is to count its frequency in the document. The total weight of a document in reply to a query is the sum of weights of all search terms. Those documents with a higher weight are ranked first.
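The method can be sketched in a few lines. The term frequencies below are hypothetical values chosen for illustration, and the names `freq`, `weight` and `rank` are assumptions rather than part of any particular retrieval system.

```python
# Hypothetical term frequencies: document -> {term: occurrences}.
freq = {
    "Doc. 1": {"A": 3, "B": 1},
    "Doc. 2": {"A": 2, "C": 3},
    "Doc. 3": {"A": 1, "C": 1, "D": 2},
}


def weight(doc, terms):
    """Total weight of a document: sum of the frequencies of the search terms."""
    return sum(freq[doc].get(term, 0) for term in terms)


def rank(matching_docs, terms):
    """Sort the documents that satisfy the query by decreasing weight."""
    return sorted(matching_docs, key=lambda doc: weight(doc, terms), reverse=True)


terms = ["A", "C"]
# A OR C: documents containing at least one of the terms.
or_docs = [doc for doc in freq if any(t in freq[doc] for t in terms)]
# A AND C: documents containing both terms.
and_docs = [doc for doc in freq if all(t in freq[doc] for t in terms)]

print(rank(or_docs, terms))
print(rank(and_docs, terms))
```

With these frequencies the weights for the terms A and C are 3, 5 and 2, so Doc. 2 is ranked first in both searches.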

Relevance sorting: example


[Figure: Documents 1, 2 and 3, each showing its occurrences of Term A, Term B, Term C and Term D]

Relevance sorting: example


Retrieved documents sorted by relevance for each query:
  A AND C: Doc. 2; Doc. 3
  A OR C: Doc. 2; Doc. 1; Doc. 3
  A NOT C: Doc. 1
Improving relevance sorting:
  Weight the frequency of each term in the database: less frequent terms are more useful to discriminate documents.
  Position of the search term (title, for instance).
  Number of incoming links from other documents (in digital environments).
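The first improvement can be sketched with a tf-idf style weight, where the frequency of a term in a document is multiplied by the logarithm of the inverse of the share of documents that contain the term. The slides do not name a specific formula, so this choice is an assumption, as are the hypothetical frequencies and the names `freq`, `idf` and `weight`.

```python
import math

# Hypothetical term frequencies: document -> {term: occurrences}.
freq = {
    "Doc. 1": {"A": 3, "B": 1},
    "Doc. 2": {"A": 2, "C": 3},
    "Doc. 3": {"A": 1, "C": 1, "D": 2},
}


def idf(term):
    """Terms found in fewer documents get a higher value; 0 if found in all."""
    containing = sum(1 for counts in freq.values() if term in counts)
    return math.log(len(freq) / containing) if containing else 0.0


def weight(doc, terms):
    """Sum of term frequency times idf over the search terms."""
    return sum(freq[doc].get(term, 0) * idf(term) for term in terms)


terms = ["A", "C"]
ranking = sorted(freq, key=lambda doc: weight(doc, terms), reverse=True)
print(ranking)
```

Term A occurs in every document, so it no longer helps to discriminate between them and its weight drops to zero. Doc. 1, which matches only on A, therefore falls to the bottom of the ranking, below Doc. 3.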

Pay attention to the presentation of results


Good presentation increases the potential use of the information by the users, improves their comprehension of the information, helps them to save time, and increases user satisfaction.
  Specify the sources searched and the search strategy.
  Summarise the results.
  Organise the references (alphabetically, by relevance...) and present them in a standard format.
  Pay attention to the format (headers, fonts, margins, etc.).
  Include recommendations: full text access, relevant sources, etc.

Reading
Stone, G. 2009. Resource Discovery. In: Digital Information: Order or anarchy? London: Facet, p. 133-164. Also available at: http://eprints.hud.ac.uk/5882/
