
TBUI

LESSON 1: DATA, INFORMATION & KNOWLEDGE

The Differences Between Data, Information and Knowledge


(Source: http://www.infogineering.net/data-information-knowledge.htm)
We frequently hear the words Data, Information and Knowledge used as if they are the same thing.
You hear people talking about the Internet as a “vast network of human knowledge” or that they’ll “e-mail
through the data.”
By defining what we mean by data, information and knowledge – and how they interact with one another
– it should be much easier to tell them apart.

Has Anyone Seen My CDs?


A few years ago, the UK Government Tax office lost some CDs containing 25 million people’s records when
they were posted insecurely. The fear was that there was enough information contained on them to allow
criminals to set up bank accounts, get loans, and do their Christmas shopping… all under someone else’s
name.
In the fallout, the main argument in the press was about security, and inevitably there were many that
were using it to attack Government ministers. Anyone who’s ever worked in a bureaucracy will know that
this kind of thing goes on more often than we would like to think, as people cut corners. No procedure or
official process is water-tight. It’s just this time, they didn’t get away with it.
The media used the terms “data” and “information” interchangeably.
For example, one frequent mistake was to say that they had lost “data.” However, you can’t physically lose
data. You can’t physically pick up data, move it about, etc.
Confused?
Let me explain but, before we go any further, I should point out that I’m using the Infogineering definitions
of the three words (data, information, knowledge) here. They’ve been so muddled up over the past few years
that the various definitions don’t match up. So, let me explain how Infogineering views them all.
Knowledge
Firstly, let’s look at Knowledge. Knowledge is what we know. Think of this as the map of the World we
build inside our brains. Like a physical map, it helps us know where things are – but it contains more than
that. It also contains our beliefs and expectations. “If I do this, I will probably get that.” Crucially, the brain
links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.
It is from this “map” that we base our decisions, not the real world itself. Our brains constantly update this
map from the signals coming through our eyes, ears, nose, mouth and skin.
You can’t currently store knowledge in anything other than a brain, because a brain connects it all
together. Everything is inter-connected in the brain. Computers are not artificial brains. They don’t
understand what they are processing, and can’t make independent decisions based upon what you tell
them.
There are two sources that the brain uses to build this knowledge - information and data.

Data
Data is/are the facts of the World. For example, take yourself. You may be 5ft tall, have brown hair and blue
eyes. All of this is “data”. You have brown hair whether this is written down somewhere or not.
In many ways, data can be thought of as a description of the World. We can perceive this data with our
senses, and then the brain can process this.
Human beings have used data as long as we’ve existed to form knowledge of the world.
Until we started using information, all we could use was data directly. If you wanted to know how tall I was,
you would have to come and look at me. Our knowledge was limited by our direct experiences.
Information
Information allows us to expand our knowledge beyond the range of our senses. We can capture data in
information, then move it about so that other people can access it at different times.
Here is a simple analogy for you.
If I take a picture of you, the photograph is information. But what you look like is data.
I can move the photo of you around, send it to other people via e-mail etc. However, I’m not actually moving
you around – or what you look like. I’m simply allowing other people who can’t directly see you from where
they are to know what you look like. If I lose or destroy the photo, this doesn’t change how you look.
So, in the case of the lost tax records, the CDs were information. The information was lost, but the data
wasn’t. Mrs Jones still lives at 14 Whitewater Road, and she was still born on 15th August 1971.
The Infogineering Model (below) explains how these interact…

Why does it matter that people mix them up?


When people confuse data with information, they can make critical mistakes. Data is always correct (I can’t
be 29 years old and 62 years old at the same time) but information can be wrong (there could be two files
on me, one saying I was born in 1981, and one saying I was born in 1948).
Information captures data at a single point in time. The data changes over time. The mistake people make is
thinking that the information they are looking at is always an accurate reflection of the data.
By understanding the differences between these, you can make better decisions based on accurate facts.
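The gap between data and information can be sketched in a few lines of Python. This is only an
illustration under assumed names: the Person and Record classes, the capture and is_current helpers, and
the example years are hypothetical and not part of the Infogineering model. A record (information) is a
snapshot of a fact (data) taken at one moment, so it can go stale while the fact itself moves on.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """The real-world subject; its attributes stand in for data."""
    name: str
    birth_year: int

    def age_in(self, year: int) -> int:
        # Approximate age: ignores whether the birthday has passed yet.
        return year - self.birth_year


@dataclass(frozen=True)
class Record:
    """Information: a snapshot of the data captured at one point in time."""
    name: str
    age: int
    captured_in: int


def capture(person: Person, year: int) -> Record:
    # Hypothetical helper: writes down the person's age as of `year`.
    return Record(person.name, person.age_in(year), year)


def is_current(record: Record, person: Person, year: int) -> bool:
    # The record is accurate only while it still matches the data.
    return record.age == person.age_in(year)


me = Person("Example Person", birth_year=1981)
file_on_me = capture(me, 2010)  # the record says age 29

print(is_current(file_on_me, me, 2010))  # compared in the year it was captured
print(is_current(file_on_me, me, 2015))  # compared five years later
```

Deleting file_on_me would leave me unchanged, which mirrors the lost CDs: the information can be lost or
become outdated without the underlying data being affected.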
In Brief
Data: Facts, a description of the World
Information: Captured Data and Knowledge
Knowledge: Our personal map/model of the World

What is information literacy?


How does information literacy relate to digital literacy, media literacy, information fluency, and academic
research skills?
There are many definitions of Information Literacy, for example:
• American Library Association, 1989: “To be information literate, a person must be able to recognize
when information is needed and have the ability to locate, evaluate, and use effectively the needed
information.”
• CILIP defines it as follows: “Information literacy is knowing when and why you need
information, where to find it, and how to evaluate, use and communicate it in an ethical manner.”
• The Society of College, National and University Libraries (SCONUL) developed the Seven Pillars of
Information Literacy model in 1999. It was designed to be a practical working model that would help
develop ideas amongst practitioners and generate discussion. It was updated in 2004.
• The Joint Information Systems Committee (JISC) uses the term i-skills to describe information literacy
and IT skills. i-skills are defined as “the ability to identify, assess, retrieve, evaluate, adapt, organise
and communicate information within an iterative context of review and reflection.”
Information literate people...
• Recognize a need for information.
• Determine the extent of information needed.
• Access information efficiently.
• Critically evaluate information and its sources.
• Classify, store, manipulate and redraft information collected or generated.
• Incorporate selected information into their knowledge base.
• Use information effectively to learn, create new knowledge, solve problems & make decisions.
• Understand economic, legal, social, political & cultural issues in the use of information.
• Access & use information ethically & legally.
• Use information & knowledge for participative citizenship & social responsibility.
• Experience information literacy as part of independent learning & lifelong learning (LLL).

Why Big6™?

We all suffer from information overload. There’s just too much “stuff” out there, and it’s not easy to keep up.
At the same time, there’s an irony—yes, we are surrounded by information, but we can never seem to find
what we want, when we want it, and in a form we want it so that we can use it effectively.
One solution to the information problem—the one that seems to be most often adopted in schools (as well as
in business and society in general)—is to speed things up. We try to pack in more and more content, to work
faster to get more done. But, this is a losing proposition. Speeding things up can only work for so long.
Instead, we need to think about helping students to work smarter, not faster. There is an alternative to
speeding things up. It’s the smarter solution—one that helps students develop the skills and understandings
they need to find, process, and use information effectively. This smarter solution focuses on process as well
as content. Some people call this smarter solution information literacy or information skills instruction. We
call it the Big6.

The Big6™ Skills


The Big6 is a process model of how people of all ages solve an information problem. From practice and
study, we found that successful information problem-solving encompasses six stages with two sub-stages
under each:
1. Task Definition
1.1 Define the information problem
1.2 Identify information needed
2. Information Seeking Strategies
2.1 Determine all possible sources
2.2 Select the best sources
3. Location and Access
3.1 Locate sources (intellectually and physically)
3.2 Find information within sources
4. Use of Information
4.1 Engage (e.g., read, hear, view, touch)
4.2 Extract relevant information
5. Synthesis
5.1 Organize from multiple sources
5.2 Present the information
6. Evaluation
6.1 Judge the product (effectiveness)
6.2 Judge the process (efficiency)
People go through these Big6 stages—consciously or not—when they seek or apply information to solve a
problem or make a decision. It’s not necessary to complete these stages in a linear order, and a given stage
doesn’t have to take a lot of time. We have found that in almost all successful problem-solving situations, all
stages are addressed.
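
As a rough sketch, the six stages and their sub-stages can be written down as a small data structure.
The stage names are transcribed from the list above; the unaddressed helper and the visited example are
hypothetical additions, used only to illustrate the point that order does not matter as long as every
stage is eventually addressed.

```python
# The Big6 stages and sub-stages, transcribed from the list above.
BIG6 = {
    "Task Definition": [
        "Define the information problem",
        "Identify information needed",
    ],
    "Information Seeking Strategies": [
        "Determine all possible sources",
        "Select the best sources",
    ],
    "Location and Access": [
        "Locate sources (intellectually and physically)",
        "Find information within sources",
    ],
    "Use of Information": [
        "Engage (e.g., read, hear, view, touch)",
        "Extract relevant information",
    ],
    "Synthesis": [
        "Organize from multiple sources",
        "Present the information",
    ],
    "Evaluation": [
        "Judge the product (effectiveness)",
        "Judge the process (efficiency)",
    ],
}


def unaddressed(done: set[str]) -> list[str]:
    """Return the stages not yet addressed, in Big6 order (hypothetical helper)."""
    return [stage for stage in BIG6 if stage not in done]


# Stages may be visited in any order; what matters is that all get addressed.
visited = {"Synthesis", "Task Definition", "Use of Information"}
print(unaddressed(visited))
```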

In addition to considering the Big6 as a process, another useful way to view the Big6 is as a set of basic,
essential life skills. These skills can be applied across situations—to school, personal, and work settings. The
Big6 Skills are applicable to all subject areas across the full range of grade levels. Students use the Big6 Skills
whenever they need information to solve a problem, make a decision, or complete a task.

The Big6™
Developed by Mike Eisenberg and Bob Berkowitz, the Big6 is the most widely known and widely used
approach to teaching information and technology skills in the world. Used in thousands of K-12 schools,
higher education institutions, and corporate and adult training programs, the Big6 information problem-
solving model is applicable whenever people need and use information. The Big6 integrates information
search and use skills along with technology tools in a systematic process to find, use, apply, and evaluate
information for specific needs and tasks.

The Seven Pillars of Information Literacy


The model is conceived as a three-dimensional circular “building”, founded on an information landscape
which comprises the information world as it is perceived by an individual at that point in time. The picture is
also coloured by an individual’s personal information literacy landscape, in other words, their aptitude,
background and experiences, which will affect how they respond to any information literacy development.

The circular nature of the model demonstrates that becoming information literate is not a linear process; a
person can be developing within several pillars simultaneously and independently, although in practice they
are often closely linked.

Each pillar is further described by a series of statements relating to a set of skills/competencies and a set of
attitudes/understandings. It is expected that as a person becomes more information literate they will
demonstrate more of the attributes in each pillar and so move towards the top of the pillar.
LESSON 1: TYPES OF INFORMATION AND INFORMATION NEEDS

Information needs
Information seeking theories often refer to the concept of information needs, a presumed cognitive state
wherein an individual’s need state triggers the search behavior characteristic of information seeking in a
given context. While terms such as these have migrated from a common theory to everyday colloquial use,
their use in design research should be questioned and evaluated as in any research. There are other lenses to
view behavior that focus on motive, goals, activity contexts, but not necessarily “need,” whether information
or other personal need.
Information need goes back to a definition from Taylor’s (1962!) article “The Process of Asking Questions”
which describes four types:
• The actual, but unexpressed, need for information (the visceral need)
• The conscious, within-brain description of the need (the conscious need)
• The formal statement of the question (the formalized need)
• The question as presented to the information system (the compromised need).

Types of information needs

Determine your information needs


The type of information you need will depend upon the specific problem or assignment you've been given.
When determining your information requirements, look at the assignment carefully and consider the
following:
1. What Type of Assignment Is It?
Assignments can vary from a short 5-minute oral presentation, to a senior project or master's thesis, with
many other possibilities in between such as critiques, summaries, short essays, or term papers.
2. How much information do you need?
Some assignments can be completed by consulting brief summaries or overviews, while other assignments
require more detailed and comprehensive information.
3. Is currency an issue?
Some assignments require that you use the most current information, while others require historical
information or information over a period of time.
4. Do you need information from a particular type of publication?
For some assignments you may need information from scholarly peer-reviewed or professional journals,
while others may require information from trade journals, government publications, popular magazines, or
even tabloids.
5. Do you need to use primary sources?
In most cases you will use secondary sources such as books and articles. However, sometimes your
assignment may require you to use primary sources such as diaries, interviews, letters or raw data.
6. Do you need information in a particular format?
In addition to using print materials, your assignment may require you to use other kinds of sources such as
visual/graphic sources (art prints, slides, maps), numeric sources (statistics), audio sources (audio tape), or
electronic sources (listservers, computer files, the Web).
7. Is point of view an issue?
For assignments such as debates or argumentative essays, you may need to find information that presents a
particular point of view, opposing points of view, or a range of viewpoints.

Definition of Research
re·search: NOUN: 1. a detailed study of a subject, especially in order to discover (new) information or reach a
(new) understanding.

Cambridge Dictionaries Online, © Cambridge University Press 2003.
The word "research" is used to describe a number of similar and often overlapping activities involving a
search for information. For example, each of the following activities involves such a search; but the
differences are significant and worth examining.
Research type | Essential characteristics
1. Find the population of each country in Africa or the total (in dollars) of Japanese investment in the U.S. in 2002. | A search for individual facts or data. May be part of the search for a solution to a larger problem or simply the answer to a friendly, or not so friendly, bar bet! Concerned with facts rather than knowledge or analysis and answers can normally be found in a single source.
2. Find out what is known generally about a fairly specific topic. "What is the history of the Internet?" | A report or review, not designed to create new information or insight but to collate and synthesize existing information. A summary of the past. Answers can typically be found in a selection of books, articles, and Web sites. [Note: gathering this information may often include activities like #1 above.]
3. Gather evidence to determine whether gang violence is directly related to playing violent video games. | Gathering and analyzing a body of information or data and extracting new meaning from it or developing unique solutions to problems or cases. This is "real" research and requires an open-ended question for which there is no ready answer. [Note: this will always include #2 above and usually #1. It may also involve gathering new data through experiments, surveys, or other techniques.]

Information needs
Using a Topic to Generate Questions
Research requires a question for which no ready answer is available. What do you want to know about a
topic? Framing a topic as a question (or series of related questions) has several advantages:
1. Questions require answers.
A topic is hard to cover completely because it typically encompasses too many related issues;
but a question has an answer, even if it is ambiguous or controversial.

TOPIC: Drugs and Crime
QUESTION: Could liberalization of drug laws reduce crime in the U.S.?
2. A clear open-ended question calls for real research and thinking.
Asking a question with no direct answer makes research and writing more meaningful. Assuming
that your research may solve significant problems or expand the knowledge base of a discipline
involves you in the more meaningful activity of community and scholarship.
Developing a Question
Developing a question from a broad topic can be done in many ways. Two such effective ways are
brainstorming and concept mapping.

brain·storm·ing noun: 1. A method of shared problem solving in which all members of a group
spontaneously contribute ideas. 2. A similar process undertaken by a person to solve a problem by rapidly
generating a variety of possible solutions.

The American Heritage® Dictionary of the English Language: Fourth Edition. 2000

Brainstorming is a free-association technique of spontaneously listing all words, concepts, ideas, questions,
and knowledge about a topic. After making a lengthy list, sort the ideas into categories. This allows you to
inventory your current awareness of a topic, decide what perspectives are most interesting and/or relevant,
and decide in which direction to steer your research.

con·cept map·ping noun phrase: 1. A process, focused on a topic, in which group or individual brainstorming
produces a visual graphic that represents how the creator(s) thinks about a subject, topic, etc. It illustrates
how knowledge is organized for the group or individual.

You may create a concept map as a means of brainstorming; or, following your brainstorm, you may take the
content you have generated and create your map from it. Concept maps may be elaborate or simple and are
designed to help you organize your thinking about a topic, recognize where you have gaps in your
knowledge, and help to generate specific questions that may guide your research.
Combining brainstorming and concept mapping (brainmapping, if you will) can be a productive way to begin
your thinking about a topic area. Try to establish as your goal the drafting of a topic definition statement
which outlines the area you will be researching and about which you will present your findings.

Broadening / Narrowing your topic


Broadening Your Research Question
A question that is too narrow or specific may not retrieve enough information. If this happens, broaden the
question. Most questions have multiple contexts and varying levels of specificity.
The underlined terms below represent broader ways of asking without changing the basic meaning. If you
find sources that treat a subject broadly, use the index or table of contents to locate useful sections or
chapters. Or ask yourself, "How might the arguments made here support my argument?"

Narrowing the Topic


A question that is too broad may retrieve too much information. Here are some strategies for narrowing the
scope of a question. They may be used individually or in combinations.
Some types of information sources
Information can come from virtually anywhere — media, blogs, personal experiences, books, journal and
magazine articles, expert opinions, encyclopaedias, and web pages — and the type of information you need
will change depending on the question you are trying to answer. Look at the following sources of
information. Notice the similarities between them.

Reference/background information sources


Background information resources give general information about a variety of topics. These are often
considered to be general reference sources, meaning that they provide basic facts and knowledge that can be
used as a foundation for one's research. A little time spent in background information resources can save a
tremendous amount of time when searching in databases and more subject-specific resources.
Almanacs
Almanacs are publications containing useful facts and statistical information; usually published annually.
Some almanacs are general, while others are subject-specific, such as the Astronomical Almanac Online.
Search Addison for Almanacs to see a listing.
• Infoplease: http://www.infoplease.com/ (gives access to several resources: dictionaries, an atlas,
etc.); almanacs at http://www.infoplease.com/almanacs.html
• Old Farmer’s Almanac: http://www.almanac.com
Bibliographies
Bibliographies are lists of books, articles, and other materials about a particular subject or by a particular
author. Entries in this list usually follow a specified format such as the APA or CBE style guides and are
sometimes accompanied by an annotation. A bibliography is generally found at the end of a book or article,
but may comprise the entire article or book in and of itself. Search Addison by subject for your topic and
include the term bibliography to find examples. You can search entries from bibliographies in Summon to see
if we have access to the source.
Biographical resources
Biographical resources include encyclopedic entries, articles, books, and videos about a person, group, or
organization. They provide historical information about a person, lists of authored works, relationships to
other people and groups, and analysis of impact on a field. Search Addison by subject for your topic and
include the term biography to find sources. Many subject-specific databases provide biographies; check their
advanced search screen for limiting options.
• Biography.com: http://www.biography.com/
• Biographical Dictionary: http://www.s9.com/biography/
• BuscaBiografias (Spanish): http://www.buscabiografias.com
Dictionaries
Dictionaries can be both lists of words and definitions and also alphabetical lists of entries on a topic. Similar
to encyclopedias, these subject-specific dictionaries provide overview articles in a field, though not
necessarily in as much depth, or with a bibliographic list of references. Search Addison by subject for your
topic with the term dictionaries for sources.
• YourDictionary: http://www.yourdictionary.com
• Onelook Dictionaries: http://www.onelook.com
• Diccionarios.com: http://www.diccionarios.com
• Diccionario de la Real Academia de la Lengua Española: http://buscon.rae.es/draeI/
• Diccionario Panhispánico de Dudas: http://buscon.rae.es/dpdI/
• Vademecum Fundeu - Uso del español actual: http://www.fundeu.es/esurgente/lenguaes/
Multilingual/interlinguistic dictionaries:
• Logos: http://www.logos.it/query/query.html
• IATE (Interactive Terminology for Europe): http://iate.europa.eu/iatediff
• Linguee (Dictionary and Translation Search Engine): http://www.linguee.com/
• IDP (Internet Dictionary Project): http://www.ilovelanguages.com/IDP (a downloadable version is
available at http://www.ilovelanguages.com/IDP/IDPC.html)
• Wordreference: http://www.wordreference.com
Specialized lexicographic resources:
• Acronym Finder: http://www.acronymfinder.com
• SilMaril (acronym and abbreviation server): http://acronyms.silmaril.ie
• Glossarist (thematic dictionary, glossary/lexicon): http://www.glossarist.com
Directories
Directories are lists of persons or organizations that are systematically arranged. They typically provide
addresses and affiliations for individuals and addresses, officers, functions, and similar data for
organizations. Use these to compare organizations or to locate contact information to ask for information
directly from the source.
Encyclopedias
Encyclopedias provide short entries or essays on topics and typically include a short bibliography of
references for further research. Examples include both general-purpose encyclopedias like World Book
Advanced Encyclopedia and subject-specific ones.
• Wikipedia: http://www.wikipedia.org
• Britannica: http://www.britannica.com
• Encyclopedia.com (metasearch): http://www.encyclopedia.com
Handbooks
Handbooks provide short entries or chapters on a topic, offering practical guidance or "how-to" instructions.
Examples include the CRC Handbook of Chemistry and Physics and the ADA Nutrition Care Manual.
Statistical sources
Statistics can be used to verify your position or support an assertion in your research. Almanacs may offer
some statistical information, but statistical sources will provide more in-depth coverage. Examples include
the Statistical Abstract of the United States and the International Monetary Fund eLibrary.
• Statistics Agencies and Offices: http://www.census.gov/aboutus/stat_int.html
• U.S. Census Bureau, with comparable information worldwide: http://www.census.gov
• Eurostat: http://epp.eurostat.ec.europa.eu
• INE: http://www.ine.es/
• Infoplease: http://www.infoplease.com/ipa/A0004372.html
Thesauruses
Thesauruses provide lists of terms and synonyms. Examples include both basic English language thesauruses,
like Roget's Thesaurus, that provide synonyms for common English words, and subject-specific thesauruses
that provide official lists of terms (or controlled vocabulary) used in a field, such as the Thesaurus of
Psychological Index Terms. Many databases provide thesaurus lookup capabilities for searching their subject
or descriptor indexes. Use these to determine the correct/official term used to describe a topic in that
database or field.
Virtual reference desk/shelf
• MIT: http://libraries.mit.edu/help/virtualref
• University of Queensland: http://libraries.mit.edu/help/virtualref
• Library of Congress: http://www.loc.gov/rr/askalib/virtualref.html
• Refdesk (quick reference): http://www.refdesk.com
• Virtualref (Official/gov US information): http://www.virtualref.com
• Internet Public Library (IPL) Reference Collection: http://www.ipl.org/div/subject/browse/ref00.00.00

Information Sources
Primary, secondary, and tertiary sources
When searching for information on a topic, it is important to understand the value of primary, secondary,
and tertiary sources.
Primary sources allow researchers to get as close as possible to original ideas, events, and empirical
research. Such sources may include creative works, first-hand or contemporary accounts of
events, and the publication of the results of empirical observations or research.
Secondary sources analyze, review, or summarize information in primary resources or other secondary
resources. Even sources presenting facts or descriptions about events are secondary unless they are based
on direct participation or observation. Moreover, secondary sources often rely on other secondary sources
and standard disciplinary methods to reach results, and they provide the principal sources of analysis about
primary sources.
Tertiary sources provide overviews of topics by synthesizing information gathered from other resources.
Tertiary resources often provide data in a convenient form or provide information with context by which to
interpret it.
The distinctions between primary, secondary, and tertiary sources can be ambiguous. An individual
document may be a primary source in one context and a secondary source in another. Encyclopedias are
typically considered tertiary sources, but a study of how encyclopedias have changed on the Internet would
use them as primary sources. Time is a defining element.
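
The context-dependence described above can be sketched as a small lookup. This is a hedged illustration,
not a cataloguing rule: the DEFAULT_CATEGORY table, the classify function, and the is_object_of_study
flag are hypothetical names, and the three source types are simply examples drawn from the definitions
above.

```python
# Default category for a few example source types, following the definitions above.
DEFAULT_CATEGORY = {
    "first-hand account": "primary",
    "review": "secondary",
    "encyclopedia": "tertiary",
}


def classify(source_type: str, is_object_of_study: bool = False) -> str:
    """Hypothetical helper: category of a source within one research context."""
    # Whatever the study examines directly serves as its primary evidence.
    if is_object_of_study:
        return "primary"
    return DEFAULT_CATEGORY[source_type]


# An encyclopedia consulted for an overview of a topic.
print(classify("encyclopedia"))
# The same encyclopedia in a study of how encyclopedias have changed.
print(classify("encyclopedia", is_object_of_study=True))
```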

While these definitions are clear, the lines begin to blur in the different discipline areas.
• In the humanities and social sciences
• In the sciences
Humanities and Social Sciences 1-2-3 sources
In the humanities and social sciences, primary sources are the direct evidence or first-hand accounts of
events without secondary analysis or interpretation. A primary source is a work that was created or written
contemporary with the period or subject being studied. Secondary sources analyze or interpret historical
events or creative works.
Primary sources
• Diaries
• Interviews
• Letters
• Original works of art
• Photographs
• Speeches
• Works of literature
A primary source is an original document containing firsthand information about a topic. Different fields of
study may use different types of primary sources.
Secondary sources
• Biographies
• Dissertations
• Indexes, abstracts, bibliographies (used to locate a secondary source)
• Journal articles
• Monographs
A secondary source contains commentary on or discussion about a primary source. The most important
feature of secondary sources is that they offer an interpretation of information gathered from primary
sources.
Tertiary sources
• Dictionaries
• Encyclopedias
• Handbooks
A tertiary source presents summaries or condensed versions of materials, usually with references back to
the primary and/or secondary sources. They can be a good place to look up facts or get a general overview of
a subject, but they rarely contain original material.
Examples
Subject | Primary | Secondary | Tertiary
Art | Painting | Critical review of the painting | Encyclopedia article on the artist
History | Civil War diary | Book on a Civil War battle | List of battle sites
Literature | Novel or poem | Essay about themes in the work | Biography of the author
Political science | Geneva Convention | Article about prisoners of war | Chronology of treaties

Sciences 1-2-3 sources


In the sciences, primary sources are documents that provide a full description of the original research. For
example, a primary source would be a journal article where scientists describe their research on the genetics
of tobacco plants. A secondary source would be an article commenting on or analyzing the scientists' research
on tobacco.
Primary sources
• Conference proceedings
• Interviews
• Journals
• Lab notebooks
• Patents
• Preprints
• Technical reports
• Theses and dissertations
These are where the results of original research are usually first published in the sciences. This makes them
the best source of information on cutting-edge topics. However, the new ideas presented may not be fully
refined or validated yet.
Secondary sources
• Monographs
• Reviews
• Textbooks
• Treatises
These tend to summarize the existing state of knowledge in a field at the time of publication. Secondary
sources are good to find comparisons of different ideas and theories and to see how they may have changed
over time.
Tertiary sources
• Compilations
• Dictionaries
• Encyclopedias
• Handbooks
• Tables
These types of sources present condensed material, generally with references back to the primary and/or
secondary literature. They can be a good place to look up data or to get an overview of a subject, but they
rarely contain original material.
Examples
Subject | Primary | Secondary | Tertiary
Agriculture | Conference paper on tobacco genetics | Review article on the current state of tobacco research | Encyclopedia article on tobacco
Chemistry | Chemical patent | Book on chemical reactions | Table of related reactions
Physics | Einstein's diary | Biography on Einstein | Dictionary of relativity

Magazines, trade journals, and scholarly journals


Periodicals are usually separated into three major groups: popular, trade, and scholarly. If you are able to
recognize the differences between these sources, you can focus your research to retrieve only the type of
information you need.
Popular magazines like People, Sports Illustrated, and Rolling Stone can be good sources for articles on
recent events or pop-culture topics, while Harper's, Scientific American, and The New Republic offer more
in-depth articles on a wider range of subjects. These articles are geared towards readers who, although not
experts, are knowledgeable about the issues presented.
Trade journals are geared towards professionals in a discipline. They report news and trends in a field, but
not original research. They may provide product or service reviews, job listings, and advertisements.
Scholarly journals provide articles of interest to experts or researchers in a discipline. An editorial board
of respected scholars (peers) reviews all articles submitted to a journal. They decide if the article provides a
noteworthy contribution to the field and should be published. There is typically little or no advertising.
Articles published in scholarly journals include a list of references.
Peer review is a widely accepted indicator of quality scholarship in a discipline or field. Peer-reviewed (or
refereed) journals are scholarly journals that only publish articles that have passed through this review
process.

Formats
Data, facts, information, intelligence, and knowledge can be organized, presented and retrieved in many
physical formats:

Format | Description
Printed | Materials referenced and collected from print resources (hardback and paperback books, periodicals, print-on-demand (POD) documents, manuscripts, correspondence, loose leaf materials, notes, brochures, etc.)
Digital | Digital materials are information materials that are stored in an electronic format on a hard drive, CD-ROM, or remote server. Examples of digital materials are: e-books, e-journals, e-course materials, e-databases, Web sites, e-print archives, or e-classes. These materials are accessed with a computer via the Internet. While not all materials listed in the library’s catalog are digital, many are, and the OPAC (Online Public Access Catalog) provides the access to those materials.
Audio/Video | Materials collected using video (television, video recordings) or audio (radio, audio recordings) tools, presented in recorded tapes, CDs, audio-cassettes, reel to reel tapes, record albums, DVDs, videocassettes, audio books, etc.
Multimedia | Materials created by the use of several different media to convey information (text, audio, graphics, animation, video, and interactivity). Multimedia also refers to computer media. A PowerPoint presentation using slides, video, and interactive links is an example of a multimedia format.
Microform | Materials that have been photographed and their images developed in reduced size onto 35mm or 16mm film rolls or 4” x 6” fiche cards, which are viewed on machines equipped with magnifying lenses. In the UI Library this includes back issues of state, national, and international newspapers; non-current issues of magazines; older ERIC documents; and Agricultural Experiment Station documents.
Human | Information collected from face-to-face or telephone communication and conversation or other personal communication (such as letters and e-mails).
Information timeline
When searching for information, it is always wise to keep in mind the information timeline. It is very easy to
find plenty of books and encyclopedia articles on ancient Egypt, the Great Depression, and World War
II. Finding information and scholarly articles about the Health Care bill of 2010 controversy and how
Southern Sudan came to be a nation is going to be somewhat more difficult.
Time | Day of event | Days later | Weeks later | Months later | Years later
Sources | Television, radio, web | Newspapers, TV, radio, web | Popular and mass market magazines | Trade magazines and scholarly journals | Scholarly journals, books, conference proceedings; reference sources such as encyclopedias
Type of information | General: who, what, where (usually not why) | Varies; some articles include analysis, statistics, photographs, editorials, opinions | Still in reporting stage; general, editorial, opinions, statistics, photographs; usually no bibliography at this stage | Research results; detailed and theoretical discussion; bibliography available at this stage | In-depth coverage of a topic; edited compilations of scholarly articles relating to a topic; general overview giving factual information; bibliography available
Author | Journalists | Journalists | Journalists: usually not specialists in field | Specialists and scholars in the field | Specialists and scholars in the field
Audience | General public | General public | General public to knowledgeable layperson | Scholars, specialists, students | General public to specialists; scholars, students, laypersons
Locating tools | Web search tools, social networks | Web search tools, newspaper and periodical databases | Web search tools, newspaper and periodical databases | General and subject-specific databases | Library catalog, general and subject-specific databases; library reference collection
Information and the Internet
What is the Internet?
The Internet is a computer network, in fact a network of computer networks, upon which anyone who has
access to a host computer can publish their own documents. One of the services that runs on it is the World
Wide Web (or just the Web), which allows Internet publishers to link to other documents on the network. The
Internet allows transmission of a variety of file types, including non-written multimedia.
Who puts info on the Internet?
There are many kinds of Internet sites that you might find during the course of a search, sites created by
different people or organizations with different objectives.
URL
Every Web page has its own address called a Uniform Resource Locator (URL). Much like the address on an
envelope with a name, street address, city, state, and zip code, each part of a URL provides information about
the Web page.
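As a minimal sketch of how an address breaks into parts, the following uses `urlsplit` from Python's standard library. The URL is a made-up example (an assumption for illustration, not an address from this lesson).

```python
from urllib.parse import urlsplit

# Hypothetical address used only for illustration.
url = "https://www.lib.example.edu/guides/search.html?topic=tobacco#primary"

parts = urlsplit(url)

print(parts.scheme)    # protocol used to fetch the page
print(parts.hostname)  # host computer that publishes the page
print(parts.path)      # folder and file name on that host
print(parts.query)     # extra parameters sent to the page
print(parts.fragment)  # a named position inside the page

# The last dot-separated label of the host name is the domain code.
top_level_domain = parts.hostname.rsplit(".", 1)[-1]
print(top_level_domain)
```

Reading a URL this way, from the protocol through the host to the file path, is similar to reading the lines of an address on an envelope.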

Domain Names
The domain name gives you a hint about the type of organization sponsoring a page. Its last part, the
top-level domain, is a short code at the end of the host name in the URL, preceded by a "dot." Here are the
most common domains.
Domain | Description
.edu | Educational institution. Even though a page comes from an educational institution, it does not mean the institution endorses the views published by students or faculty members.
.com | Commercial entity. Companies advertise, sell products, and publish annual reports and other company information on the Web. Many online newspapers or journals also have .com names.
.gov | Government. Federal and state government agencies use the Web to publish legislation, census information, weather data, tax forms and many other documents.
.org | Non-profit organization. Nonprofit organizations use the Web to promote their causes. These pages are good sources to use when comparing different sides of an issue.
.net | Internet service providers
.mil | U.S. military
In addition, more top level domain names were added in 2001.
Domain | Description
.aero | for the air transportation industry
.biz | general use by businesses
.coop | restricted use by cooperatives
.info | for both commercial and non-commercial sites
.museum | for museums
.name | for use by individuals
.pro | restricted to professionals and professional entities
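The domain tables above can be treated as a simple lookup. The sketch below is only an illustration: the function name `sponsor_type` and the example addresses are assumptions, and the result is a hint about the sponsor, not proof of who runs a site.

```python
from urllib.parse import urlsplit

# Descriptions taken from the domain tables above.
DOMAIN_TYPES = {
    "edu": "educational institution",
    "com": "commercial entity",
    "gov": "government",
    "org": "non-profit organization",
    "net": "internet service providers",
    "mil": "U.S. military",
    "aero": "air transportation industry",
    "biz": "businesses",
    "coop": "cooperatives",
    "info": "commercial and non-commercial sites",
    "museum": "museums",
    "name": "individuals",
    "pro": "professionals and professional entities",
}

def sponsor_type(url: str) -> str:
    """Return the likely sponsor type for a URL, or 'unknown'."""
    host = urlsplit(url).hostname or ""
    code = host.rsplit(".", 1)[-1]  # text after the last dot
    return DOMAIN_TYPES.get(code, "unknown")

print(sponsor_type("http://www.example.edu/students/essay.html"))
print(sponsor_type("https://www.example.museum/"))
print(sponsor_type("https://www.example.xyz/"))
```

Any domain code that is not in the tables, such as a country code, falls through to "unknown" in this sketch.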

Libraries and the Web


Although we've been making some distinctions between the Web and the library, the two aren't distinctly
different things. It's important to understand that there is a middle-ground—the idea of the library on the
Web. That is to say, many libraries have Web sites which organize information and provide access to
collections of quality resources.
One great thing about using the library on the Web is that the information has been evaluated and organized.
Sometimes the library has digitized part of its own collections for people around the world to use. Keep in
mind that although there is an increasing amount of information in this digital library, some information can
only be found in print resources.
Another aspect of this library is how easy it is for you to access. Library Web sites often have information
about library hours, policies, and contact information if you need assistance. If you are a student at a
university, you can use the library online 24 hours a day, seven days a week from any Internet-connected
computer.
Libraries vs the Web
 Starting with the library
 Starting with the Web
Library resources go through a review process
Librarians select books, magazines, journals, databases and other media sources. This selection process
allows the library to collect sources considered reliable, historically relevant, and valuable.
Library resources are free for your use
Libraries purchase subscriptions to journals, databases and other resources so they are available for your
research. These subscriptions are not cheap but the information is valuable, relevant and reliable.
Library resources are organized
Items in libraries are organized so you can easily find all the sources on a topic. For example, when you
search for a book in the library catalog you will get a call number. The call number will direct you to a
specific shelf in the library. The other books and bound journals near the same call number should cover a
similar topic.
Library resources are meant to be kept permanently
One of the primary functions of a library is to be an organized storehouse of in-depth information published
throughout time. Current and historical information can be found in the library, giving the student a picture
of how information on a topic developed.
Library resources come with personal assistance
Unlike the Internet, which is primarily do-it-yourself, libraries have staff who are trained to assist you in
sorting through all these information sources. They can help you learn to use new tools and can answer any
questions you have. Some libraries even provide help through their Web sites. The Virginia Tech library has
2 reference or help desks located on the second and fourth floors. We also have an IM chat service and a
texting service for help. When all else fails, you can pick up a phone and call us or knock on a librarian's door
for help.
Quality over quantity
Libraries have large collections of information on a variety of topics which have been carefully selected and
organized. The key idea when using the library is that you are getting QUALITY over QUANTITY. Print or
electronic library resources are the best sources to use when starting your research. You can efficiently find
quality information from a variety of credible resources in the library.
Choose the Best Search for Your Information Need (by noodletools)
Need additional search ideas? Try NoodleQuest (interactive version)
I need to define my topic...

Information need | Tool | How to use it
I need to understand the scope of my topic | Guías Temáticas (UC3M library) | Browse subject guides with descriptions of relevant sites
I need to see related topics | Google | Uncover buried sites using "related searches"
 | Bing | Search for your topic, then drill down "related searches"
I need to refine and narrow my topic | SurfWax | Search for your topic, then click "Focus" (top) to show similar, broader, and narrower topics
 | iSeek Education | Ask a question or search a topic in this database of "trusted resources" - use "targets" to refine search
 | Wikipedia | Drill down “Contents” to explore subtopics
I need to choose a controversial issue | Hot Topics (Google Custom Search) | Begin your search on selective hot topic sites
 | IDEA Portal | Browse or search a debate topic database with pro/con arguments
 | Glean Comparison Search | Build your pro/con search using comparative terms (K-12)
I need background on possible topics | SweetSearch | A selective search of Web information for students
 | Wikipedia | Search this wiki (quality content is starred), then follow article links to more information
 | Columbia Encyclopedia | Search basic factual information in this encyclopedia (c 2000)
I need to find quality results...

I need authoritative sites chosen by an expert researcher | LibGuides | Librarian-created topic pathfinders
 | Infotopia | Customized safe-search of educator-selected sites
I need personal help from experts | Ask an ipl2 Librarian | Get answers from volunteers and grad students in a week (K-12)
 | AllExperts | Ask your question of a self-identified volunteer subject-expert
 | Ask a Librarian | Library research questions answered in five days (no homework questions)
I need sites ranked or tagged as valuable or relevant | Google | High PageRank means popular, relevant sites link to the page
 | Technorati | Browse or search user-identified subjects ("tags") for blog advice or opinions
 | Ask.com | High ExpertRank indicates subject-specific popularity
I need primary sources | American Memory | Locate documents, sound recordings, images, maps, and other American primary sources
 | Ready, 'Net, Go! | Browse worldwide archival index
 | Library and Archival Exhibitions on the Web | Search for online exhibitions selected by Smithsonian librarians
I need peer-reviewed journal articles | Open J-Gate | Search narrow term in open-access global journal literature (advanced search)
 | Directory of Open Access Journals | Search by narrow term to find open access journal articles or browse a subject directory
I need to do research in a specific discipline...

I need official government information | USA.gov | Search official U.S. government information and services
 | Foreign Governments (Intl. Documents Coll., Northwestern U) | Alphabetical links to the official websites of national governments
 | Directgov | Search or browse official UK information and services
I need in-depth information about a country or unrepresented territory | CIA World Factbook | Search on keyword or select country or location name
 | Unrepresented Nations and Peoples Organization (UNPO) | Browse Web sites from indigenous peoples, occupied nations, minorities and independent states or territories
I need science, technology, engineering and mathematics (STEM) research or sources | National Science Digital Library (NSDL) | Search by grade level, subject (STEM) and resource format
I need reputable health information | MedlinePlus | Search diseases, conditions and health topics
 | Mayo Clinic | Browse by disease or search conditions, symptoms, tests and health topics
 | KidsHealth | Health information directed to kids, teens or parents
I need legal documents, agencies or news | FindLaw | Search cases and codes (U.S. federal and state), news and commentary
 | Legal Information Institute (Cornell U Law) | Search or browse wiki of worldwide legal information
 | State Legislative Websites Directory | Search database of U.S. state legislatures, DC and territories for home page, bills, press rooms, statutes, etc.
I need creative and performing arts sources | Art History Resources on the Web | Global art history resources chosen by a scholar
 | JURN | Google custom search of 3,500+ ejournals in the arts and humanities
The timeliness of information that I need is...

within the last hour | Google Real-Time | Choose "past hour" from the left column for the most recent news
 | 10x10 | Explore 100 words and pictures that "matter most" globally in the last hour
today | Google News | View top news stories and refine by category or topic
 | NewsNow | Search breaking global news headlines or newsfeeds (UK service)
 | Newseum | Browse today's front page treatment of news from nearly 100 countries
recent | Google | Search and limit by time period, or choose "timeline" (from "more search tools" on left) to see a topic's evolution
 | Yahoo News | Search recent news and filter by time period ("past week")
recent (with analysis) | BBC Special Reports | In-depth topic coverage including news features, analysis, photos, audio and video
 | Times Topics | Collected news, reference and archival information, photos, graphics, audio and video files about topics
 | PolitiFact | Search news keyword or browse fact checking of statements made by members of Congress, the White House, lobbyists and interest groups
long-term investigative reports | ProPublica | News investigations of significant government, business, and institutional wrongdoing
 | Center for Public Integrity | Original reporting of public issues designed to make "institutional power more transparent and accountable"
 | Center for Investigative Reporting | Critical investigations of injustice or abuse of power with actionable information to assist citizens
a particular time period (decade, century, era) | HistoryWorld | Enter year event to retrieve timeline, then click on icons for information or images
 | Wikipedia: List of Timelines | Browse Wikipedia’s list of timelines (civilizations, people, events, etc.)
historical | American Memory | Search America's primary source documents and images, browse by topic, time period, medium or place
 | ipl2 | Drill down into the history section or search fewer, broader terms (e.g., "terrorists" not "Righteous Path")
 | Digital History | High-quality historical resources, primary sources, multimedia, subject guides
ancient | Ancient History Sourcebook | Search online ancient-history texts, images, audio, or browse by region, period (e.g., Persia, Late Antiquity)
 | ipl2 | Drill down into ancient history section or search fewer, broader terms (e.g., "Greece" not "Spartan women")
I need facts...

a person | Biography.com | Search 25,000+ biographies by name, keyword and profession
 | Who2 | Search for famous people with "four good links" to more information
 | Dictionary of Canadian Biography Online | Search biographies from Canada's history (16th-20th century)
a place | CIA World Factbook | Select country for basic profile and transnational issues
 | Country Studies | Cross-search or view a country (last update 1998) - historical, social, economic, political and national security data
 | Stately Knowledge | Basic U.S. state facts, links to state government site and encyclopedia (IPL KidSpace)
a company | NYPL: Searching for Company Information | Find company information based on your need
general reference answers | Ask.com | Find a fact, biography, statistic or conversion, or reference answer
 | Columbia Encyclopedia | Find basic information
 | Yahoo | Learn shortcut words to get quick answers
news background (facts, people, and documents) | |
I need opinions and perspectives...

I want opinions on current issues | HeadlineSpot Opinion/Editorials | Browse opinion/editorial in U.S. and some international newspapers
 | PollingReport | Browse results of U.S. public opinion surveys
 | Issues & Press (U.S. Dept. of State) | Investigate U.S. position on international issues
I need news from other countries' perspectives | World Press Review | Get nonpartisan summaries of views outside U.S.
 | Newspaperindex | Browse selected world newspapers
 | ABYZ News Links | Browse international broadcast and Internet news, newspapers, magazines, and press agencies
I want multiple perspectives on hot social and political topics | Social Issues | Links to pro/con on current social issues
 | Public Agenda | Analysis of public attitudes on social issues with overview, pro/con, organization links
 | UN News Centre: News Focus | Browse for UN-related issues by region, country or topic
I want to compare news treatment | Newseum | Compare news reporting on U.S. front pages
 | PressDisplay | Compare news reporting from 55 countries
I need a specific type of media...

maps | Google Maps | Search and view satellite and street-level maps
 | MapMachine | Search, browse and print country, physical and political maps
 | American FactFinder Maps | See trends/patterns when you superimpose chosen geographical features and census data on U.S. maps
photographs and visual images | Google Image Search | Use advanced search to limit by size, coloration, file type
 | Yahoo Image Search | Use advanced search to limit by size and coloration
 | Flickr | Search users' photos by their subjects ("tags"), then choose a subtopic ("tag cluster")
fine art | Artcyclopedia | Search (e.g., artist, medium, movement, subject) or browse for digitized art and online exhibits, see actual size
 | Intute: Arts and Humanities | Search or browse selected, evaluated resources
videos | Yahoo Video Search | Search by keyword or phrase, use advanced search to limit by file format, size, duration and domain
 | YouTube | Search for, watch or buy an ever-growing collection of TV shows, movies, music videos, documentaries, and personal productions
 | Internet Archive: Moving Image Archive | Search public domain films, newsreels, ads, documentaries, television series and other “cultural artifacts”
radio | PublicRadioFan | Find public radio programs and podcasts worldwide
music | Google | To find song lyrics, search (in quotes) title or performer (plus) "song lyrics," to find information about the song's history search title (in quotes) and "origin"
sounds | FindSounds | Get sound effects and music samples: select a keyword or search a term limited by file format, quality and size
speeches | American Rhetoric | Search site or browse categories of full-text, audio and video of American public speeches, lectures, debates, interviews, events
 | History and Politics Out Loud | Search or browse full-text public-domain audio with transcripts of 20th century political and historical events, personalities, and protest songs
 | American Memory | Find sound recordings of American primary source speeches
quotations | Bartleby.com Quotations | Search classic passages, phrases and proverbs (1901 edn.)
 | Quotations Page | Search word, phrase, author within or across quotation collections
 | Quoteland | Search by keyword or browse topics, authors
statistical data | Statistical Information (NoodleTools) | Links to statistical databases and advice on locating and understanding data
dictionary or thesaurus (for definitions, etymology, pronunciation, synonyms) | Yahoo Reference | Definitions, etymology, synonyms (English and Spanish) and audio-pronunciation
 | Merriam-Webster Dictionary and Thesaurus | Definitions, etymology, synonyms and audio-pronunciation
 | WordCentral | Definitions with pictures (student dictionary)
encyclopedia | Columbia Encyclopedia | Factual information (updated 6/05)
 | Wikipedia | Volunteer-created and collaboratively edited: valuable for current topics (e.g., people in the news), technology (e.g., podcasting) -- information quality uneven
almanac data | Country at a Glance (U.N.) | Basic country profile, use InfoNation to compare data from 6 countries
 | Information Please Almanac | Statistical and factual data -- beware popup minefield
books and other printed works | WorldCat | Search for books and reviews, options to refine results, check your local library's holdings
 | Google Scholar | Search (free and fee) scholarly works, locate related information using "cited by" links, use advanced search to limit (author, date, phrase, in title, subject area)

I have special search requirements...

run my search periodically and be notified of new results | Giga Alert | Run Google search of 3 interests and get e-mail or RSS notification of new results
 | Google Alerts | Run periodic Google searches and get e-mail (no RSS) notification of new results
specify a country where my search results are located | Google | Limit by domain (advanced search) or by country (Language Tools)
use a search engine outside the U.S. | Search Engine Colossus | Browse search engines and directories from countries and territories
find sites organized by the Dewey Decimal System or Library of Congress Classification | Virtual LRC | Select Dewey Decimal number before searching
locate resources by file type | Google | Limit search by file type (.pdf, .ps, .doc, .xls, .ppt, .rtf)
 | Yahoo | Limit search by file type (.html, .pdf, .doc, .xls, .ppt, .xml, .txt)
I am...
a kid | KidsClick! | Find kid-friendly sites with educational content (grades K-7) selected by librarians
 | Ask Kids | Search by keyword, then select the best-match question for answers and links
 | Yahoo Kids | Select from kid-safe sites (grades 2-7) organized by topic
pretty new to the Internet | Google | Largest general purpose search engine
 | Yahoo | Large, general purpose search engine
 | ipl2 | Easy-to-navigate, well-
an Internet wizard | Exalead | Configure highly-specific searches including proximity ("folk tales NEAR sun"), but smaller index than Google or Yahoo

LESSON 2: MANAGE YOUR REFERENCES


Activity objectives
• Get familiar with the functionalities and benefits of Reference Managers (RM) or Reference
Management Systems.
• Get to know two of the most widely used RMs today, RefWorks and Mendeley, and their different
approaches.
• Learn how to create and manage databases/collections of references, and bibliographies.
• Learn how to create and obtain new references through various methods.
• Learn how to share and distribute your references, saving time and helping others too.
Rationale
There are many reference managers (some are free and some are licensed software).
RefWorks is licensed software paid for by the University.
To manage your references, keep in mind that you will use many resources during your degree and
professional career. It is useful to keep track of them, and even to create your own collection of selected
resources.
Some options are:
• You can manage your electronic references using only a bookmarks manager (within your browser
or external), or share your bookmarks through social bookmarking tools like Delicious, Diigo,
or several others.
• You can use a social bibliographic system, where you share not only URLs but also
bibliographic references, using social citation services such as Bibsonomy.
• You can manage your complete bibliography using a local tool added to your Web browser (for
example, the Zotero plug-in with Firefox 3.0 or later).
• You can manage your complete bibliography using specific software like RefWorks, EndNote, or
Mendeley, which all have a desktop version, a web access tool, and a mobile app.
Take a look at this very interesting Comparison of Reference Management Software.
Reference management tools offer us the following functionalities and benefits:
• import references from a number of different sources, including library catalogues and e-journal
databases;
• manually add any references that you cannot find online;
• manage and edit references that you have added to your “library”;
• easily add references into your documents;
• format the bibliography you have created in your document using a specified reference style;
• share references with colleagues, students and your wider network;
• attach the full text of articles to the reference in your library;
• search the full text of your library, not just the information in the references.

Option A. Using Refworks


1. Get a personal account:
• Go to the RefWorks page: https://refworks.proquest.com
• Register in the system by choosing a login and a password. The first time you do this you should be
inside the university; afterwards you will receive a code (the “group code”) that lets you connect from
anywhere.
• Look at the video tutorials for the "new RefWorks" (in English).
• As you are working in groups, you can use just one account for the project, but it is advisable that each
of you has your own user account.

2. Create a database with references:


RefWorks allows you to import existing databases (even ones made with other software such as EndNote) or
a previously saved file. As this will probably not be your case, there are several other ways to add new
references, including searching for the works you cited in your papers and then importing them.
Some options will be:
• Entering your references manually.
• Searching and importing your references from library catalogues (OPAC) and databases.
• Importing from databases and academic search engines.
o Many databases allow you to import references using filters (see a
list: http://www.refworks.com/content/products/import_filter.asp).
o The academic search engine Google Scholar (http://scholar.google.es/). You first have to modify
the settings: "Settings >> Bibliography manager >> Show links to import citations into
RefWorks".
You can practise with any kind of document and experiment with the possibilities offered. But for your
Group project, you should have a database (collection, folder) with the documents cited in your papers and
those obtained later in order to complete the compulsory list of references.

3. Export the bibliography and insert citations in the paper/essay


Once you have finished the database for your paper, you will have to export it as a bibliography and
include it in Word or Google Docs.

Option B. Using Mendeley


Mendeley is a social reference management tool that is free to use, at least for its basic services.
1. Get a personal account:
You can create an account with any email address or using your Facebook profile:
https://www.mendeley.com/join/?_section=header&_specific=
If you have any doubts, you can watch the videos on the Mendeley YouTube channel:
https://www.youtube.com/user/MendeleyResearch
Additional, more advanced tips can be found in Mendeley’s own getting started guide.

2. Create a database with references:


The term used in Mendeley for your whole database or collection of references is Library. Like RefWorks,
Mendeley uses a folder system. Moreover, you can create groups in order to share your references with a
public or private group of users. You can start by creating a Practising folder and, later, a folder or a group
for your Group Project.
Mendeley also allows you to add new references through various methods, depending on the tool version
(desktop or web). Some options are:
• Entering your references manually (D/W).
• Searching and importing your references from the Mendeley library (W).
• Importing from previous libraries in EndNote™, BibTeX and RIS (D).
• Importing from databases and websites directly through the One-Click Web Importer (W).
◦ You can import from any website, but the imported information will be somewhat messy if no
filter is available. There is a list (http://www.mendeley.com/import/) of the databases,
sites and services Mendeley works with.
◦ Adding PDFs and extracting data from them (D).
▪ In this case, Mendeley allows you to check that information and complete the details
using CrossRef, PubMed, ArXiv, and Google Scholar.
The desktop version of this tool allows you to manage files, mainly PDFs, to watch folders so that new PDFs
are indexed automatically, and to work with the documents by annotating them and taking notes.
3. Export the bibliography and insert citations in your paper (see step 7)
With the Mendeley desktop version, there is also the option of inserting your references automatically in the
desired output style. A Word and OpenOffice plug-in is offered for this. It also works with other text editors,
such as the Google Drive one.

Search for DOIs (Crossref)


For scientific publications, it is very useful to look up the DOI (Digital Object Identifier) and include it in
your database.
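As a small sketch of how a DOI stored in your database can be turned into a clickable link, the function below normalizes a DOI and builds an address on the doi.org resolver. The function name `doi_to_url` and the shape check are assumptions made for illustration (the pattern is a simplification, not the full DOI syntax), and the sample DOI is only a placeholder.

```python
import re

# Simplified shape check (an assumption, not the full DOI syntax):
# "10." + a numeric registrant prefix + "/" + a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_to_url(raw: str) -> str:
    """Return a doi.org link for a DOI typed with or without a prefix."""
    doi = raw.strip()
    # Drop the common ways a DOI is written in reference lists.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return "https://doi.org/" + doi

# Placeholder DOI used only to show the normalization.
print(doi_to_url("doi:10.1000/182"))
```

Storing the bare DOI (without any prefix) in the reference record keeps entries comparable, and the link can always be rebuilt when needed.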

LESSON 3: INFORMATION RETRIEVAL (GENERAL): SEARCH ENGINES AND DATABASES
1. Information Retrieval
The problem of Information Retrieval

Goal = find documents relevant to an information need from a large document set.

Main problems in Information Retrieval


• Document and query indexing
◦ How to best represent their contents?
• Query evaluation (or retrieval process)
◦ How to express an information need?
◦ To what extent does a document correspond to a query?
• System evaluation
◦ How good is a system?
◦ Are the retrieved documents relevant? (precision)
◦ Are all the relevant documents retrieved? (recall)
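The indexing and query evaluation questions above can be illustrated with a very small inverted index. This is a toy sketch under stated assumptions: the three documents are invented, words are matched exactly after lowercasing, and a query matches a document only if every query word appears in it. Real search systems are far more elaborate.

```python
from collections import defaultdict

# Invented toy collection: document id -> text.
documents = {
    1: "stem cells in tobacco plants",
    2: "tobacco genetics conference paper",
    3: "stem cell therapy review",
}

def build_index(docs):
    """Indexing: map each lowercase word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Query evaluation: return ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(documents)
print(search(index, "tobacco"))       # documents 1 and 2
print(search(index, "stem tobacco"))  # document 1 only
print(search(index, "cell"))          # document 3 only: "cells" is a different word
```

The last query shows why representation matters: because "cell" and "cells" are indexed as different words, a relevant document is missed.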

1.1 IR main concepts


Basic concepts, practically defined...
• Relevance: results fulfill your query.
• Pertinence: results fulfill your information need.
• Recall: you retrieved a good % of what exists.
• Precision: you get only what you want; not much is irrelevant.
• Noise: you get a lot of irrelevant hits.
• Silence: you miss relevant hits, or you don’t get anything at all.
• Bias: you get only partial aspects of what’s available.
Relevance and Pertinence: some nuances
Relevance: the retrieved documents actually contain the searched words (objective relevance).
Pertinence: a retrieved document is useful for a particular information need (subjective relevance).
Recall
In information retrieval, a measure of the effectiveness of a search.
Expressed as the ratio of the number of relevant records or documents retrieved in response to the query to
the total number of relevant records or documents in the database.
Example:
In a database containing 100 records relevant to the topic “stem cells”, a search retrieving 50 records, 20 of
which are relevant to the topic, would have 20% recall (20/100 = 0.2).
Precision
In information retrieval, a measure of effectiveness of a search.
Expressed as the ratio of relevant records or documents retrieved from a database to the total number
retrieved in response to the query
Example: in a database containing 100 records relevant to the topic “stem cells”, a search retrieving 50
records, 20 of which are relevant to the topic, would have 40 percent precision (20/50=0,4).
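The two ratios above can be checked with a few lines of Python. This is only a minimal sketch: the function and variable names are illustrative assumptions, and the numbers are the ones from the “stem cells” example.

```python
def recall(relevant_retrieved: int, relevant_in_database: int) -> float:
    """Share of all relevant records in the database that the search found."""
    return relevant_retrieved / relevant_in_database


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of the retrieved records that are actually relevant."""
    return relevant_retrieved / total_retrieved


# Figures from the "stem cells" example: 100 relevant records exist,
# the search returns 50 records, and 20 of those are relevant.
stem_cells_recall = recall(20, 100)
stem_cells_precision = precision(20, 50)

print(f"recall = {stem_cells_recall:.0%}")        # recall = 20%
print(f"precision = {stem_cells_precision:.0%}")  # precision = 40%
```

Note that the same 20 relevant hits give two very different scores, because each ratio uses a different denominator.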
Noise
It is the inverse concept of precision.
In information retrieval, a measure of the (poor) effectiveness of a search.
Expressed as the ratio of non-relevant records or documents retrieved from a database to the total number
retrieved in response to the query.
To avoid noise, we could:
• Use specific terms
• Use operators (AND & NOT)
• Use search by phrase
• Avoid confusing words (polysemy)
• Make a good querying strategy
Silence
• It is the inverse concept of recall.
• In information retrieval, a measure of the effectiveness of a search.
• Expressed as the ratio of the number of relevant records or documents not retrieved in response to
the query to the total number of relevant records or documents in the database.
• To avoid silence, we could:
◦ Use the operator OR
◦ Use different varieties of a word (different languages)
◦ Use query expansion (synonyms, etc.)
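Noise and silence, as defined above, can be computed from two sets of record identifiers. The sketch below is illustrative only: the record IDs are hypothetical and were chosen to reproduce the “stem cells” figures (100 relevant records, 50 retrieved, 20 of them relevant).

```python
def noise(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved records that are NOT relevant (1 - precision)."""
    return len(retrieved - relevant) / len(retrieved)


def silence(retrieved: set, relevant: set) -> float:
    """Fraction of relevant records that were NOT retrieved (1 - recall)."""
    return len(relevant - retrieved) / len(relevant)


# Hypothetical record IDs: records 0..99 are relevant to the topic.
relevant = set(range(100))
# The search returns 50 records: 80..99 (relevant) and 100..129 (irrelevant).
retrieved = set(range(80, 130))

search_noise = noise(retrieved, relevant)      # 30 / 50
search_silence = silence(retrieved, relevant)  # 80 / 100

print(f"noise = {search_noise:.0%}, silence = {search_silence:.0%}")
```

With these figures, noise is 60% (the complement of the 40% precision) and silence is 80% (the complement of the 20% recall).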

1.2 Search design: success


What we dream of: the perfect strategy.

1.3 Search design: failure


What you obtain sometimes: the worst possible case
2. Search engines
What is a search engine?
• Several names: spiders, robots, bots, search engines, agents, web wanderers, wanders, web crawlers,
engines, web ants, indexes, directories, etc.
• The most common/accepted name at international level is search engine.
• A search engine is a program or set of programs used for locating documents and information
on the WWW.
• It automatically indexes the WWW and records the web pages in
a database to retrieve them later.
Search engines features
• Search systems based upon a program or robot that automatically indexes the Web.
• A web search engine is a tool designed to search for information on the World Wide Web.
• Search results are usually presented in a list and are commonly called hits.
• The information may consist of web pages, images and other types of files.
• Some search engines also mine data available in news, books, databases, or open directories.
• Unlike Web directories, which are maintained by human editors, search engines operate
algorithmically or are a mixture of algorithmic and human input.
2.1 Search engines parts
Search engines’ components:
• a robot
• automatic systems of analysis and indexing
• a database
• a query system and query language
• a Web interface

How to search:
• Search by keywords typed in a box.
• Sometimes we can search also by some fields (advanced search).
• Always: a querying language.
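To make the components above more concrete, here is a minimal sketch of the indexing and query parts of a search engine. It is not how any real engine is implemented: the robot is replaced by a hard-coded dictionary of hypothetical pages, the "database" is an in-memory inverted index, and the query language is just keywords combined with an implicit AND.

```python
from collections import defaultdict

# Hypothetical pages a robot might have fetched (URL -> page text).
pages = {
    "http://example.org/a": "stem cells and tissue engineering",
    "http://example.org/b": "search engines index web pages",
    "http://example.org/c": "stem cell research news",
}


def build_index(pages: dict) -> dict:
    """Automatic indexing: map every word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Query system: return the pages that contain every keyword typed."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(pages)
hits = search(index, "stem cells")
print(hits)  # {'http://example.org/a'}
```

Page c is not returned because it contains "cell", not "cells": without stemming or truncation, the index only matches exact words.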

2.2 Search engines landscape


Every day new search engines appear…
Every day search engines disappear (e.g. the Wisenut case, MSDewey)
Every day some search engines are transformed
The best resources for following what is going on in the search engine world and the search business are:
• Search Engine Watch: http://searchenginewatch.com/
• Alexa: http://www.alexa.com/
Directories of search engines:
• http://www.searchenginecolossus.com/
Some examples
• Google: http://www.google.com - http://www.google.es
• Yahoo!: http://search.yahoo.com/ - http://es.search.yahoo.com
• Bing: http://www.bing.com
• Altavista: http://www.altavista.com - http://es.altavista.com
• Ask.com: http://www.ask.com - http://es.ask.com
• Baidu (Chinese): http://www.baidu.com/
• Yandex (Russian): http://www.yandex.ru/ - http://www.yandex.com/
Some general tips before you start
• Read help screens, instructions, advice (tips, hints), tutorials, descriptions, of each database or
search engine.
◦ Underlying principles are the same, but applied differently in each of the resources.
• Experiment with all buttons, links, menus, etc...
• Read the periphery of the screen and scroll a lot.
• Write search terms in the language/s of the documents you search for in the database!!
• Use the advanced search menu!!
◦ It is more effective than basic searching (...and easier...it has guided functions).
• Try different terms, use those seen in documents already retrieved.

Preparing the search


• Objective: match the search query /equation with records of stored materials.
• First, self-diagnose information needs, focus on and specify the problem, the “unknown”.
• Identify & verbalize the question in several ways.
• Analyze the question, select clues to be used to formulate the strategy.
• Translate those clues into a language and strategy compatible with the system (machine or human,
or other).
• Formalize language and strategy in a mode compatible with the device or agent.

Selection of clues and expression of the query


• Predict:
◦ how authors have written
◦ how indexers have analyzed what authors have written
◦ how analytics (clues) were recorded.
• Use variations of expression.
• If you don’t know the subject coverage of the database well, begin with general terms.
• Specify more than one aspect or point of view of the subject.
3 Search techniques
3.1 Boolean operators
Logical operations applied to different search terms in a searching system.

Using these operators, we retrieve only the documents that satisfy the stated conditions.

Boolean logic consists of logical operators:


• OR
• AND
• NOT
• and XOR

Boolean operators: AND


Default one in many search engines (Google)
We will get all the documents that have the first AND the second keywords.

Boolean operators: OR
We will get all the documents having the first keyword OR the second one → documents having either one.
Boolean operators: NOT (-)
We will get the documents that do NOT have the term
We use this operator to filter documents from a previous search. Ex.:

Boolean operators: XOR


We will get all the documents having the first keyword OR the second one, but not those documents having
both.
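The four operators map directly onto set operations. The following sketch assumes two hypothetical result sets of document IDs, one per keyword, and shows what each operator would return.

```python
# Hypothetical result sets: IDs of the documents that contain each keyword.
has_stem_cells = {1, 2, 3, 4}
has_engineering = {3, 4, 5}

# AND: documents that contain both keywords.
both = has_stem_cells & has_engineering                 # {3, 4}

# OR: documents that contain either keyword (or both).
either = has_stem_cells | has_engineering               # {1, 2, 3, 4, 5}

# NOT: documents with the first keyword, filtered to drop the second.
without_engineering = has_stem_cells - has_engineering  # {1, 2}

# XOR: documents that contain exactly one of the two keywords.
exactly_one = has_stem_cells ^ has_engineering          # {1, 2, 5}

print(both, either, without_engineering, exactly_one)
```

AND and NOT shrink the result set (less noise), while OR enlarges it (less silence).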

3.2 Other search operators


There are other operators to improve search results or make our searches more precise:
• Proximity operators
• Shorteners, wildcards
• Search by fields

4 Databases
How is information processed and stored in a Database?
• DBs have a structure (fields) & language
• Uniform criteria for selecting, processing and recording
• Formal analysis & Content analysis
◦ Tries to infer at the same time the intentions of the author and of the searcher
◦ Multidimensional
• Selection of resulting clues
• Translation into the system’s language
◦ Words, phrases, codes, numbers, etc.
◦ Control of the vocabulary and the subjects expressed
◦ Rules, syntaxes, indexing systems, classification schemes
• May include, in addition, full text / raw data
Translating search clues
• Clues can be words, terms, expressions, formulas, phrases, dates, numbers, codes, etc. and the
relationships between them.
• Translation is done in different ways depending on system characteristics:
◦ search equations / queries (a combination of search parameters (translated clues) and
search operators (boolean, others, truncation)). E.g.: ((“stem cells” AND biomechanic) NOT
engineering) PY=2012
◦ fill-in forms or query menus
◦ indexes or automated thesauri
◦ use of codes and classification schemes or taxonomies, etc.
◦ folksonomies
• In “friendly” systems: auxiliary functions (interface guides the translation).
• Command languages: more powerful, efficient and precise, but need training.
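As an illustration, the example search equation ((“stem cells” AND biomechanic) NOT engineering) PY=2012 can be translated by hand into a small Python filter. This is only a sketch under assumptions: the records are hypothetical, the terms are matched against the title only, and PY is read as the publication year field.

```python
# Hypothetical bibliographic records (title and publication year only).
records = [
    {"title": "Biomechanic cues in stem cells", "py": 2012},
    {"title": "Stem cells for tissue engineering: a biomechanic view", "py": 2012},
    {"title": "Biomechanic cues in stem cells", "py": 2010},
    {"title": "Biomechanic models of gait", "py": 2012},
]


def matches(record: dict) -> bool:
    """Apply (("stem cells" AND biomechanic) NOT engineering) PY=2012."""
    text = record["title"].lower()
    return (
        ("stem cells" in text and "biomechanic" in text)  # phrase AND term
        and "engineering" not in text                     # NOT term
        and record["py"] == 2012                          # field restriction
    )


hits = [record for record in records if matches(record)]
print(hits)  # only the first record satisfies every condition
```

Each of the other three records fails exactly one condition: the second contains "engineering", the third has the wrong year, and the fourth lacks the phrase "stem cells".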
4.1 Information structure: databases
Terminology
The first thing you have to learn is a little bit of database terminology and concepts. Don't worry, it isn't hard
or even very confusing (hopefully).
OK, from the big to the small. A "database" is a collection of related "tables". A "table" is a collection of related
"records". A "record" is a collection of related "fields". And a "field" is a collection of related pieces of
information (the stuff we are after when we work with databases). So: a database contains tables, a table
contains records, and a record contains fields.

Are you with me so far?


But what about cells, columns and rows? That is what everyone talks about. A cell is a specific piece of
information (again the stuff we are after when we use databases). Examples: John, 96915, programmer. A
column is a collection of specific types of information. Examples: A first name, a zip code, a job title. A row is
a collection of related columns. Hmmm - this is sounding familiar.

Database terminology for whatever reason uses multiple names for the same things at times. A record is a
row, a column is a field (it is also sometimes called an attribute), and a cell is a piece of information
(occasionally called data though technically it isn't).
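The vocabulary above can be seen in a tiny in-memory database. The sketch below uses Python's built-in sqlite3 module; the table name "people" and its column names are assumptions made for the example, and the values are the ones mentioned in the text (John, 96915, programmer).

```python
import sqlite3

# An in-memory database holding a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, zip_code TEXT, job_title TEXT)")

# Each tuple is one record (row); each position is one field (column).
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("John", "96915", "programmer"), ("Mary", "10001", "librarian")],
)

cursor = conn.execute(
    "SELECT first_name, zip_code, job_title FROM people WHERE first_name = 'John'"
)
columns = [description[0] for description in cursor.description]  # field names
row = cursor.fetchone()                                           # one record
cell = row[columns.index("job_title")]                            # one cell

print(columns)  # ['first_name', 'zip_code', 'job_title']
print(row)      # ('John', '96915', 'programmer')
print(cell)     # programmer
```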

Concepts
OK - now for a little more information and concepts about tables. Each table is required to have a way to
uniquely identify each record in it. This allows one record's information to be accessed amidst the thousands
of other records. The field (or combination of fields) that holds the unique identifier is called the primary key.
When considering what field to declare as a table's primary key, you must keep in mind whether the field could EVER
have 2 records with the same entry or whether the field will ever change. A person's name typically does
not work since there are many John Browns in the world. An address may eventually be changed. A social
security number or generated account number may be a better option as a table's primary key.
Next major concept - indexes. An index allows a user to quickly find the information they are looking for.
Think of it like a book index - look up something and find exactly where it is in the book. A primary key is an
index (usually made automatically by the database). A table can have as many indexes as you want - to help
you find the information you seek. Just remember that the more indexes a table has the more space that is
being used by those indexes.
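A primary key and an extra index might be declared as follows. This is a sketch with Python's sqlite3 module; the table, column and index names are hypothetical. It uses a generated account number as the primary key, so two people called Brown can coexist while a duplicate key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# account_id is the primary key: SQLite generates it and keeps it unique.
conn.execute(
    """
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        last_name  TEXT,
        address    TEXT
    )
    """
)
# An additional index to speed up look-ups by last name (it costs extra space).
conn.execute("CREATE INDEX idx_accounts_last_name ON accounts (last_name)")

# Two different people with the same last name are fine: their keys differ.
conn.executemany(
    "INSERT INTO accounts (last_name, address) VALUES (?, ?)",
    [("Brown", "1 Main St"), ("Brown", "2 Oak Ave")],
)

# Reusing an existing primary key value is refused by the database.
try:
    conn.execute(
        "INSERT INTO accounts (account_id, last_name, address) "
        "VALUES (1, 'Smith', '3 Elm Rd')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```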
One more concept - the information contained in a cell usually is as granular as possible. What does that
mean? Basically it means that you break down the data into its smallest pieces. It is easiest to show an
example rather than explain.
You have a person's name. You can save it in the database attribute NAME as "Lennon, Luke M." or even
"Luke M. Lennon". But what if for some reason you want to know something about the Lennon Family? Now
you will have to manipulate the strings to isolate the last name. A better way to do it... Instead of having 1
field called NAME we should have 3 fields named FIRST_NAME, MIDDLE_NAME, and LAST_NAME. This will
allow a person's name to be broken down into its smallest parts. Making sense?
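With the name split into three fields, the question about the Lennon family becomes a simple comparison instead of string manipulation. A minimal sketch with Python's sqlite3 module follows; the table name and the extra sample people are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (first_name TEXT, middle_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Luke", "M.", "Lennon"),
        ("Ana", "R.", "Lennon"),   # hypothetical relative
        ("John", "", "Brown"),     # hypothetical unrelated person
    ],
)

# No string splitting needed: the last name is already its own field.
lennons = [
    row[0]
    for row in conn.execute(
        "SELECT first_name FROM people WHERE last_name = ? ORDER BY first_name",
        ("Lennon",),
    )
]
print(lennons)  # ['Ana', 'Luke']
```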
The last basic concept that I believe is important to know is that data held in the database is kept in
no guaranteed order. This includes the ordering of the fields (AKA columns) and the order of the information
inserted or returned (rows). In a database there is no difference if the columns are output as "Name Zip Job"
or "Job Name Zip". To the database they are the same thing. The database also does not care if the results
(generated by a query) position the record containing "John" as first or 15th or last. We will later discuss a
way to guarantee an output's ordering by using the database's query language.
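As a preview, and assuming the database speaks SQL, the ordering of both rows and columns is requested in the query itself. The sketch below uses Python's sqlite3 module with a hypothetical "people" table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, zip TEXT, job TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("John", "96915", "programmer"),
        ("Alice", "10001", "editor"),
        ("Zoe", "30301", "analyst"),
    ],
)

# Without ORDER BY, the database is free to return rows in any order.
unordered = conn.execute("SELECT name FROM people").fetchall()

# ORDER BY makes the row order explicit and repeatable.
ordered_names = [
    row[0] for row in conn.execute("SELECT name FROM people ORDER BY name")
]

# The column order is whatever the SELECT list asks for ("Job Name Zip" here).
job_first = conn.execute(
    "SELECT job, name, zip FROM people ORDER BY name"
).fetchone()

print(ordered_names)  # ['Alice', 'John', 'Zoe']
print(job_first)      # ('editor', 'Alice', '10001')
```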

4.2 Types of databases


Relational Databases Systems
• Utility: data storage.
• Some already allow indexing of documents.
• Very structured, with a standard structure.

Documentary databases systems


They are used to store, manage and retrieve, quickly and easily, very large and loosely structured documents:
books, articles, photographs ...
They tend to be unstructured, and their structures are not typically standardized.
These databases can store:
• Bibliographic references to documents. (Reference)
• Complete documents in text or image. (Sources)
• Documents and references.
Documentary/Bibliographic databases
Their records describe briefly all kinds of documents (journal articles, books, journals, reports, conference
communications, patents, etc.)
Data provided:
• Bibliographic (author, title, source, publisher, date, language, publication type, etc.)
• Content (subject, keywords, classification, summary)
• Location and access (call numbers, libraries, websites ...)
• Full text in digital format (not all).

Information retrieval functionalities:


• Inverted indexes.
• Fields with accepted values.
• Stemming (variants of terms).
• Dictionaries of synonyms and acronyms.
• Ability to integrate controlled vocabularies (thesaurus).
• Sorting by several criteria (date, relevance…).
• Clustering results.
• Working with subsets of results.
• Using a wide range of operators in queries.
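Two of the functionalities listed above, stemming and synonym dictionaries, can be combined to expand a query before it is run. The sketch below is deliberately crude: the synonym dictionary is hypothetical, and the "stemmer" only strips a final plural "s", which is far simpler than what real systems do.

```python
# Hypothetical synonym dictionary; a real system would load a thesaurus.
SYNONYMS = {
    "cancer": {"tumour", "tumor", "neoplasm"},
    "child": {"infant", "kid"},
}


def simple_stem(word: str) -> str:
    """Very crude stemmer: strip a final plural 's'. Illustrative only."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def expand_query(terms: list) -> set:
    """Return the stemmed terms plus any synonyms known for them."""
    expanded = set()
    for term in terms:
        stem = simple_stem(term.lower())
        expanded.add(stem)
        expanded |= SYNONYMS.get(stem, set())
    return expanded


expanded = expand_query(["Cancers", "child"])
print(sorted(expanded))
# ['cancer', 'child', 'infant', 'kid', 'neoplasm', 'tumor', 'tumour']
```

Combining the expanded terms with OR reduces silence, at the price of some extra noise.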

4.3 Learn to Search a Database


How can you figure out the best way to search a new or unfamiliar journal database? Although many
databases look different at first, most have similar features. Understanding these basic features will improve
the efficiency and effectiveness of your searching. It will save you time and also will improve the accuracy
and comprehensiveness of your searches. What you learn with one database can be applied in most cases to
other databases that you encounter. Below is a list of things to consider about any database and some tips on
how to determine the specific features of the databases you wish to use.

What is in the database?
Scope: What subject areas are being covered? What years are covered? What type of materials (journals,
books, book chapters, dissertations, etc.) are included? Can you find a list of journals or other materials
that are included in the database? Check for any links to “About this database” for the answers to these
questions.

What does it search?
Can you search by keyword, subject, author, title of article or journal? What is the default search and how
do you switch to other types? Clicking on “Help” should give you information about the various searches
available.

How does it search?

Phrase versus Word searching
Determine whether the database considers multiple words as a single phrase or a combination of words
connected by OR (any of the words), or AND (all of the words). Use OR searches to broaden your search. Use
AND searches to narrow your search. Check to see what the default search is and also if other options are
available.
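The difference between the three interpretations can be seen on a few hypothetical titles. In this sketch, the same two-word query "stem cells" is evaluated as a phrase, as an AND search and as an OR search; the function names are illustrative only.

```python
def words(text: str) -> list:
    """Split a text into lower-case words."""
    return text.lower().split()


def matches_phrase(query: str, text: str) -> bool:
    """True if the query words appear together, in order, in the text."""
    q, t = words(query), words(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def matches_all(query: str, text: str) -> bool:
    """AND search: every query word appears somewhere in the text."""
    return all(word in words(text) for word in words(query))


def matches_any(query: str, text: str) -> bool:
    """OR search: at least one query word appears in the text."""
    return any(word in words(text) for word in words(query))


# Hypothetical article titles.
titles = [
    "stem cells in cardiac repair",
    "cells of the plant stem",
    "stem borers in maize",
]

phrase_hits = [t for t in titles if matches_phrase("stem cells", t)]
and_hits = [t for t in titles if matches_all("stem cells", t)]
or_hits = [t for t in titles if matches_any("stem cells", t)]

print(len(phrase_hits), len(and_hits), len(or_hits))  # 1 2 3
```

The phrase search is the narrowest and the OR search the broadest, which is why it matters to know which one a database applies by default.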

Truncation
Most databases allow you to search on a truncated (abbreviated) form of a word plus a wildcard. The
wildcard must be directly next to the truncated word for this to work. Check to see what the truncation
symbol is in the database. The most common truncation symbol is * (asterisk). Others include #, !, and $.
For example, psych* will retrieve items on psychology, psychotherapy, psychotic, etc.
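Truncation with the asterisk behaves much like the wildcard matching in Python's standard fnmatch module. The sketch below applies the psych* example to a small hypothetical list of index terms; real databases implement truncation in their own way.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms stored in a database.
vocabulary = ["psychology", "psychotherapy", "psychotic", "physics", "parapsychology"]


def truncation_search(pattern: str, terms: list) -> list:
    """Return the terms matching a truncated word such as 'psych*'."""
    return [term for term in terms if fnmatchcase(term.lower(), pattern.lower())]


hits = truncation_search("psych*", vocabulary)
print(hits)  # ['psychology', 'psychotherapy', 'psychotic']
```

Note that "parapsychology" is not retrieved: the wildcard only replaces characters after the truncated stem, not before it.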

Controlled vocabulary/thesaurus searching
Some databases offer the option of searching by controlled vocabulary terms. These authorized terms
describe topics in the database and are frequently collected in a thesaurus. Using the controlled
vocabulary/thesaurus terms in your search ensures that items retrieved are specifically on the topic of
interest. Check to see if your database has an online thesaurus to determine the best controlled-vocabulary
terms to use for your search.
What do I do if I get too many results?

Are there any limit options?
Most databases allow you to narrow your search by selecting specific dates, language, and publication
types. Some databases also allow you to restrict your search to titles for very precise results.

Can you combine searches or add more concepts to your original search?
One quick way to reduce your results and focus your search is to add one or more additional concepts to
your search. See if you can type more terms into your search box, or if you need to modify your search in
another way. Also, is there a “Search History” feature available? If so, you may be able to combine some of
your previous searches into a new one that should reduce your results and focus your search. Also try
focusing your search by using controlled vocabulary terms as described in the “How does it search?” section
above.

What do I do if I get too few results?

Eliminate concepts
The more concepts you combine in a search, the fewer results you are likely to retrieve. If you get few or
no results from your search, try eliminating some of your concepts, limits, or modifiers.

Related articles
Some databases offer a “Related Articles” or “Find Similar Results” feature that enables you to expand your
search. If you find only one or two articles on your topic, see if this feature is available. Clicking on a
Related Articles link will allow you to retrieve more articles similar to the one with which you started.

Cited reference search
Another way to expand your results is to do a “Cited Reference Search” on any relevant article you might
have. This feature is available in databases offered through Web of Science. When you perform this type of
search, you will retrieve articles that have cited the original article. You can use a cited reference search
to find more up-to-date material on your topic, since retrieved material from this type of search will be
more current than the original article.

How do I locate material from my search?
Many databases allow you to check if the ASU Libraries subscribe to the material you retrieve in your
search. For many databases to which the ASU Libraries subscribe, electronic or print availability can be
determined via the Find[at]ASU button. Click on this link to access the electronic versions of any material
provided by ASU Libraries in that format or to find the location (call number) for the print version in the
library.

How do I print, email or download my results?
Most databases offer the option of printing your results. Others also allow for e-mailing and/or
downloading your results. Once you have selected all your items, click on the print, e-mail, or download
option and follow the on-screen directions. You usually can customize the results to include abstracts
and/or subject terms. To download into bibliographic management software, such as EndNote or Zotero,
check to see your options and format your results appropriately.

Some Final Tips

Read the screen
- A careful examination of the screen often will yield a lot of information about the database.
- See if there is an example of how to type in your search (including the truncation sign used in the
database) near the search box.
- Look for pull-down menus that might offer you ways to limit, modify, or otherwise alter your search.

Look for help
- See if there is a Help icon or button to obtain additional information about the database.
- Don’t forget “mouse-overs.” You frequently can obtain more information just by rolling your mouse over
the icons and/or buttons on the screen.
- Ask a librarian. We are available to help you further dissect the databases you are trying to use and to
offer advice on how to further refine your searches.

5 Deep Web
The Web is fast becoming a titanic, complex entity. By the year 2015, it’s estimated that one zettabyte of
content will be added to the web each and every year. Navigating this sea of information presents more and
more of a challenge -- particularly when much of that content is not easily accessed by traditional search
engines.

When most of us think of the Web, we think of the 'Surface Web', also known as the visible web - the
webpages we access directly, via links or via common search engines like Google. However, the Surface Web
makes up just 4 percent of all the content on the Internet.
The ‘Deep Web’ or ‘Invisible Web’ is several orders of magnitude larger than the Surface Web and represents
a staggering 96 percent of information on the Web. This content includes:
• Dynamic or scripted content
• Unlinked content - pages which are not linked to by other pages, which may prevent web crawling
programs from accessing the content.
• Private or password-protected websites
• Webpages with content varying for different access contexts (e.g., ranges of client IP addresses or
previous navigation sequence).
• Limited access content - sites that limit access to their pages in a technical way
• Non-HTML/text content - textual content encoded in multimedia (image or video) files or specific file
formats not handled by search engines.
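The "unlinked content" case can be illustrated with a toy crawler. This is a minimal sketch, not a real robot: the site is a hypothetical dictionary mapping each page to the pages it links to, and the crawler simply follows links breadth-first from the home page.

```python
from collections import deque

# Hypothetical site: page -> pages it links to.
links = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["item-1"],
    "item-1": [],
    "orphan-report": ["home"],  # exists on the server, but no page links TO it
}


def crawl(start: str, links: dict) -> set:
    """Follow links breadth-first and return every page that was reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


surface = crawl("home", links)
unreached = set(links) - surface

print(sorted(surface))  # ['about', 'catalog', 'home', 'item-1']
print(unreached)        # {'orphan-report'}
```

The page "orphan-report" is on the server but is never indexed, because a crawler that only follows links has no way to discover it.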

Today, the invisible web means:


• Databases
• Library catalogs and other bibliographic databases
• Databases of electronic journals
• Documents in formats/web technologies not well suited to indexing (ASP or PHP)
• Interactive tools, newsgroups or listservs
• Material not linked or hidden in the servers
• Statistical resources in different knowledge bases
• Etc.