

Tefko Saracevic Rutgers University



What is bibliometrics?
all studies which seek to quantify processes of written communication.

the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.

Recorded communication - literature-> quantitative methods


Alan Pritchard 1969

Coined the term "bibliometrics": "the application of mathematics and statistical methods to books and other media of communication"
Journal of Documentation (1969) 25(4):348-349


and other related metrics

Also used for studies broader than books and articles
covering science in general, not just publications

all information objects

Webmetrics or cybermetrics
web connections, manifestations using bibliometric techniques to study the relationship or properties of different sites on the web

Basic (primitive) concepts:
1. Subject
2. Recorded communication -> document, information object
3. Subject literature

Bibliometrics related to:
science of science
sociology of science - numerical methods

Literature studies
often in humanities, librarianship




Reasons for quantitative studies of literature

Analysis of structure and dynamics
search for regularities - predictions possible

Understanding of patterns
order out of documentary chaos
verification of models, assumptions

Rationale for policies & design


Why quantitative studies?

Qualitative methods often depend on assertions, authoritative statements, anecdotal evidence
Science searches for regularities
Success of statistical methods in social sciences
Need for justification & basis for decisions
Something can be counted - irresistible

Application in ...
History of science
Sociology of science
Science policy; resource allocation
Library selection, weeding, policies
Information organization
Information management

Historical note
Bibliometrics long precedes information science
But found intellectual home in information science
study of a basic phenomenon - literature

It is not hot lately, but still produces very interesting results
Branched out into web studies (web is a literature as well)



What studied?
Governed by data available in documents or information resources in general - that which can be counted
author(s)
origin
organization, country, language

journal, publisher, patent

what more
text, parts of text, subject, classes

representation
citations
to a document, in a document, co-citation

circulation, various uses

links
any other quantifiable attribute


Science Citation Index
Compilation of variables from journals in a subject
Use data
Publication counts from indexes, or other databases
Web structures, links

Variable: authors
number in a subject, field, institution, country
growth
correlation with indicators like GNP, energy etc.
productivity e.g. Lotka's law
collaboration - co-authorship, associated networks
dynamics - productive life, transience, epidemics
papers/author in a subject
mapping

Variable: origin
Rates of production, size, growth by
country, institution, language, subject

Comparison between these
Correlation with economic & other indicators



Variable: sources
Concentration most often on journals
Growth, dynamics, numbers
information explosion - exponential laws
time movements, life cycles

Scatter - quantity/yield distribution

Bradford's law

Various distributions
by subject, language, country

Variable: contents
Analysis of texts
distribution of words - Zipf's law
words, phrases in various parts
subject analysis, classification
co-word analysis



Variable: representation
frequency of use of index terms, classes
distribution laws - key terms where?
thesaurus structure



Variable: citations
Studied a lot; many pragmatic results
base for citation indexes, web of science, impact factors, co-citation studies etc

number of references in articles
number of citations to articles
research front; citation classics

bibliographic coupling

citations more
author connections, subject structure, networks, maps

of authors, papers

validation with qualitative methods
impact



Variable: utilization
frequency distribution of requests for sources, titles
e.g. 20/80 law

relevance judgement distributions
circulation patterns
use patterns



Variable: links
Development of link-based metrics
in-links, out-links

Web structure
Web page depth; update
PageRank vs quality



Examples from classic studies

Comparative publications over centuries
Number of journals founded over time
Number of abstracts published over time
National share of abstracts in chemistry
National scientific size vs. economy size
Bibliographic coupling and co-citation
Web structures, links



Examples of laws & methods

Lotka's law
Bradford's law
Zipf's law
Impact factor
Citation structures
Co-citation structures



Alfred J. Lotka 1926

Statistics: the frequency distribution of scientific productivity
J. Washington Acad. Sci. 16:317-325
Purpose: to "determine, if possible, the part which men of different calibre contribute to the progress of science"
Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik



Lotka's law: $x^n y = C$
The total number of authors y in a given subject, each producing x publications, is inversely proportional to some power n of x. Where:
x = number of publications
y = no. of authors credited with x publications
n = constant (equals 2 for scientific subjects)
C = constant

inverse square law of scientific productivity
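
A minimal sketch of what the inverse square law predicts, assuming n = 2 and a hypothetical subject with 100 authors who each produced one publication. The function name and the input figures are illustrative and do not come from Lotka's data. Since x = 1 gives y = C, the constant C is taken to be the number of single-publication authors.

```python
def lotka_expected_authors(single_paper_authors, max_papers, n=2.0):
    """Expected number of authors y with x = 1..max_papers publications, from x^n * y = C."""
    # For x = 1 the law reduces to y = C, so C is the count of one-publication authors.
    c = float(single_paper_authors)
    return {x: c / x ** n for x in range(1, max_papers + 1)}


# Hypothetical input: 100 authors with exactly one publication.
expected = lotka_expected_authors(single_paper_authors=100, max_papers=4)
for x, y in expected.items():
    print(f"{x} publ.: about {y:.2f} authors")
```

Under these assumptions, the number of authors with two publications is one quarter of those with one, and the number with three is one ninth.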


Lotka's Law - scientific publications
[Figure: number of authors with 1, 2, 3 and 4 publications, following $x^n y = C$]


Samuel Clement Bradford 1934, 1948

Distribution of quantity vs yield of sources of information on specific subjects
he studied journals as sources, but applicable to other sources
Which journals produce how many articles in a subject, and how are they distributed?
or: How are articles in a subject scattered across journals?

Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called documentary chaos
First published in: Engineering (1934) 137:85-86, then in his book Documentation, (1948)

Bradford's law
"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2 : n^3$"

Bradford's Law of Scattering: an idealized example

No. of source journals | No. of articles per source | Total no. of articles
1                      | 60                         | 60
2                      | 35                         | 70
3 (zone total)         |                            | 130
1                      | 30                         | 30
2                      | 25                         | 50
2                      | 9                          | 18
4                      | 8                          | 32
9 (zone total)         |                            | 130
10                     | 6                          | 60
7                      | 5                          | 35
5                      | 4                          | 20
5                      | 3                          | 15
27 (zone total)        |                            | 130


Bradford's Law of Scattering zones


3 sources 130 articles

9 sources 130 articles

27 sources 130 articles
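
The zone split can be sketched in code. This is an illustrative procedure, not Bradford's own method: journals are ranked by productivity and a zone is closed each time the running article total reaches the next third of the overall total. The function name is hypothetical, and the per-journal counts are as read from the idealized example (3, 9 and 27 sources with 130 articles each).

```python
def bradford_zones(articles_per_journal, zone_count=3):
    """Split journals, ranked by productivity, into zones with roughly equal article totals."""
    ranked = sorted(articles_per_journal, reverse=True)
    target = sum(ranked) / zone_count
    zones, current, running = [], [], 0
    for count in ranked:
        current.append(count)
        running += count
        # Close a zone once the cumulative total reaches the next multiple of the target.
        if running >= target * (len(zones) + 1) and len(zones) < zone_count - 1:
            zones.append(current)
            current = []
    zones.append(current)  # whatever remains forms the last zone
    return zones


# (number of journals, articles per journal), as read from the idealized example
example = [(1, 60), (2, 35), (1, 30), (2, 25), (2, 9), (4, 8),
           (10, 6), (7, 5), (5, 4), (5, 3)]
counts = [articles for journals, articles in example for _ in range(journals)]

zones = bradford_zones(counts)
sizes = [len(zone) for zone in zones]
totals = [sum(zone) for zone in zones]
print(sizes, totals)
```

For this idealized data the zone sizes stand in the ratio 1 : 3 : 9, so n = 3. Real literatures rarely divide this cleanly, so the zone boundaries are approximate in practice.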

Garfield hypothesis

George Kingsley Zipf 1935, 1949

The psycho-biology of language: an introduction to dynamic philology (1935)
Human behavior and the principle of least effort: An introduction to human ecology (1949)
Looked, among others, at frequency distributions of words in given texts
counted distribution in James Joyce's Ulysses

Provided an explanation as to why the found distributions happen: Principle of least effort

Zipf's law: $r f = c$
r = rank (in terms of frequency)
f = frequency (no. of times the given word is used in the text)
c = constant for the given text
For a given text the rank of a word multiplied by the frequency is a constant
Works well for high frequency words, not so well for low - thus a number of modifications
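
A minimal sketch of how the rank-frequency product can be tabulated for a text. The function name, the simple tokenization rule (lower-case runs of letters and apostrophes) and the toy sentence are assumptions made for illustration. A sentence this short cannot show the regularity; a long text would be needed to see the product stay roughly constant for high-frequency words.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() orders words by descending frequency; ranks start at 1.
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


# Toy input, for illustration only.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequency(sample)
for row in table:
    print(row)
```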

Charles F. Gosnell 1944 Obsolescence

He studied obsolescence of books in academic libraries via their use
College Res. Libr. (1944) 5:115-125

But this was extended to study of articles via citations, and other sources
Age of citations in articles in a subject:
half-life - half of the citations are x years old, etc.
different subjects have very different half-lives
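
One common way to make the half-life idea concrete is to take the median age of the cited works, so that half of the citations are at most that old. The sketch below assumes this reading. The function name and the publication years are hypothetical.

```python
from statistics import median


def citation_half_life(citing_year, cited_years):
    """Median age, in years, of the works cited by articles published in citing_year."""
    ages = [citing_year - year for year in cited_years]
    return median(ages)


# Hypothetical reference list of articles published in 2004.
cited_years = [2003, 2002, 2001, 2000, 1998, 1995, 1990, 1980]
half_life = citation_half_life(2004, cited_years)
print(half_life)
```

Running the same calculation on reference lists from different subjects gives a simple basis for comparing their half-lives.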



Curve of obsolescence
[Figure: curve of obsolescence; axis label: age at time of use]

Eugene Garfield 1955

Focused on scientific & scholarly communication
based on citations
Science (1955) 122:108-111

Founded Institute for Scientific Information (ISI)

major product now ISI Web of Knowledge

Impact factor for journals, based on how much is a journal cited
Mapping of a literature in a subject
Citation indexes/web of knowledge
MAJOR resources in bibliometric studies

Citation matrix
[Figure: citing articles linked to the cited articles they reference]

Science Citation Index
Association-of-ideas index
[Figure: cited articles linked to the articles that cite them]

Co-citation analysis
Articles that cite the same article are likely to both be of interest to the reader of the cited article

[Figure: two citing articles linked to the same cited article]
These two articles are likely to be related
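
A small sketch of how such relatedness can be counted from citation data, assuming a hypothetical mapping from each citing article to the set of articles it cites. Two counts are shown: the number of cited articles that two citing articles have in common (the relation pictured above, which the literature usually calls bibliographic coupling), and the number of citing articles in which two cited articles appear together (the usual co-citation count). All names and data are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: citing article -> set of cited articles
citations = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"Z"},
}


def shared_references(citations):
    """For each pair of citing articles, count the cited articles they have in common."""
    pairs = {}
    for a, b in combinations(sorted(citations), 2):
        common = citations[a] & citations[b]
        if common:
            pairs[(a, b)] = len(common)
    return pairs


def cocitation_counts(citations):
    """For each pair of cited articles, count the citing articles that cite both."""
    counts = Counter()
    for cited in citations.values():
        counts.update(combinations(sorted(cited), 2))
    return counts


shared = shared_references(citations)
cocited = cocitation_counts(citations)
print(shared)
print(dict(cocited))
```

Higher counts suggest a stronger relation between the two articles in the pair.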


Impact factor (IF)

number of citations received in current year by papers published in the journal in the previous two years, divided by number of papers published in the journal in the previous two years
IF has become over time a crucial indicator of journal quality and has given ISI a monopoly position in the evaluation of journal quality
Reported in Journal Citation Reports (1976-)
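
The definition can be written as a short calculation. The function name and the journal figures below are hypothetical, chosen only to illustrate the ratio.

```python
def impact_factor(citations_this_year, papers_previous_two_years):
    """Citations received this year by papers from the previous two years, per such paper."""
    if papers_previous_two_years == 0:
        raise ValueError("the journal published no papers in the previous two years")
    return citations_this_year / papers_previous_two_years


# Hypothetical journal: 120 and 130 papers in the two previous years,
# which together received 500 citations in the current year.
if_value = impact_factor(500, 120 + 130)
print(if_value)
```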


Garfield's HistCite
Bibliographic Analysis and Visualization Software
Provides citation statistics & graphs for people, journals, institutions
various citation scores, no. of cited references in articles
various graphs with connections

Example: articles and authors for JASIST (and predecessor names) for 1956-2004
includes citations to authors

Bibliometrics, & related scientometrics, informetrics, webmetrics provide insight into a number of properties of information objects
some general, predictive laws formulated
structures have been exposed, graphed
myriad data collected & analyzed

A good area for research!


Sources used in making this presentation among others

Ruth Palmquist: Bibliometrics
Donna Bair-Mundy: Boolean, bibliometrics, and beyond
J. Downie: Short set of bibliometric exercises
