
BIBLIOMETRICS

Tefko Saracevic Rutgers University


http://www.scils.rutgers.edu/~tefko


What is bibliometrics?
all studies which seek to quantify processes of written communication.
Pritchard

the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.
Fairthorne

Recorded communication - literature -> quantitative methods



Alan Pritchard 1969


Coined the term "bibliometrics": "the application of mathematics and statistical methods to books and other media of communication"
Journal of Documentation (1969) 25(4):348-349


Bibliometrics and other related metrics


Also used to study more than books and articles:
Scientometrics
covering science in general, not just publications

Informetrics
all information objects

Webometrics or cybermetrics
web connections and manifestations; uses bibliometric techniques to study the relationships or properties of different sites on the web

Concepts
Basic (primitive) concepts:
1. Subject
2. Recorded communication -> document, information object
3. Subject literature

Bibliometrics related to:
science of science
sociology of science - numerical methods

Literature studies
Qualitative
often in humanities, librarianship

Quantitative
bibliometrics

Mixed


Reasons for quantitative studies of literature


Analysis of structure and dynamics
search for regularities - predictions possible

Understanding of patterns
order out of documentary chaos
verification of models, assumptions

Rationale for policies & design


Why quantitative studies?


Qualitative methods often depend on assertions, authoritative statements, anecdotal evidence
Science searches for regularities
Success of statistical methods in social sciences
Need for justification & basis for decisions
Something can be counted - irresistible

Application in ...
History of science
Sociology of science
Science policy; resource allocation
Library selection, weeding, policies
Information organization
Information management
utilization

Historical note
Bibliometrics long precedes information science
But found intellectual home in information science
study of a basic phenomenon - literature

It is not hot lately, but still produces very interesting results
Branched out into web studies (web is a literature as well)


What studied?
Governed by data available in documents or information resources in general - that which can be counted
author(s)
origin
organization, country, language

source
journal, publisher, patent

what more
contents
text, parts of text, subject, classes

representation

citations
to a document, in a document, co-citation

utilization
circulation, various uses

links
any other quantifiable attribute

Tools
Science Citation Index
Compilation of variables from journals in a subject
Use data
Publication counts from indexes, or other databases
Web structures, links

Variable: authors
number in a subject, field, institution, country
growth
correlation with indicators like GNP, energy etc.
productivity
e.g. Lotka's law
collaboration - co-authorship, associated networks
dynamics - productive life, transience, epidemics
papers/author in a subject
mapping

Variable: origin
Rates of production, size, growth by
country, institution, language, subject

Comparison between these
Correlation with economic & other indicators


Variable: sources
Concentration most often on journals
Growth, dynamics, numbers
information explosion - exponential laws
time movements, life cycles

Scatter - quantity/yield distribution


Bradford's law

Various distributions
by subject, language, country

Variable: contents
Analysis of texts
distribution of words
Zipf's law
words, phrases in various parts
subject analysis, classification
co-word analysis


Variable: representation
frequency of use of index terms, classes
distribution laws - key terms where?
thesaurus structure


Variable: citations
Studied a lot; many pragmatic results
base for citation indexes, Web of Science, impact factors, co-citation studies etc.

Derived:
number of references in articles
number of citations to articles
research front; citation classics

bibliographic coupling

citations more
co-citations
author connections, subject structure, networks, maps

centrality
of authors, papers

validation with qualitative methods
impact


Variable: utilization
frequency distribution of requests for sources, titles
e.g. 20/80 law

relevance judgement distributions
circulation patterns
use patterns


Variable: links
Development of link-based metrics
in-links, out-links

Web structure
Web page depth; update
PageRank vs quality
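In-links and out-links can be counted directly from a list of page-to-page links. A minimal sketch, assuming the links are available as (source, target) pairs; the page names and the function name `link_counts` are hypothetical and used only for illustration:

```python
from collections import Counter


def link_counts(links):
    """Count in-links and out-links per page from (source, target) pairs."""
    out_links = Counter(source for source, _ in links)
    in_links = Counter(target for _, target in links)
    return in_links, out_links


# Hypothetical link data: (linking page, linked-to page)
links = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
in_links, out_links = link_counts(links)

for page in sorted({p for pair in links for p in pair}):
    print(page, "in-links:", in_links[page], "out-links:", out_links[page])
```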


Examples from classic studies


Comparative publications over centuries
Number of journals founded over time
Number of abstracts published over time
National share of abstracts in chemistry
National scientific size vs. economy size
Bibliographic coupling and co-citation
Web structures, links


Examples of laws & methods


Lotka's law
Bradford's law
Zipf's law
Impact factor
Citation structures
Co-citation structures


Alfred J. Lotka 1926


Statistics: the frequency distribution of scientific productivity
Purpose: to "determine, if possible, the part which men of different calibre contribute to the progress of science"
Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik
J. Washington Acad. Sci. 16:317-325


Lotka's law: $x^n y = C$
The total number of authors y in a given subject, each producing x publications, is inversely proportional to some power n of x.
Where:
x = number of publications
y = no. of authors credited with x publications
n = constant (equals 2 for scientific subjects)
C = constant

inverse square law of scientific productivity
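The inverse square relation can be worked through numerically. A minimal sketch, assuming a hypothetical subject in which 100 authors have exactly one publication (so C = 100 when n = 2); the function name `lotka_expected_authors` is illustrative only:

```python
def lotka_expected_authors(c, max_publications, n=2.0):
    """Expected number of authors y with x publications, from x**n * y = C."""
    return {x: c / x**n for x in range(1, max_publications + 1)}


# Assumed: 100 authors with exactly one publication, so C = 1**2 * 100 = 100
expected = lotka_expected_authors(c=100, max_publications=4)

for x, y in expected.items():
    print(f"{x} publ.: about {y:.1f} authors")
```

Under these assumptions, about a quarter as many authors have two publications as have one, and about one ninth as many have three.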



Lotka's Law - scientific publications

[Figure: no. of authors with 1, 2, 3 and 4 publications, following $x^n y = C$]

Samuel Clement Bradford 1934, 1948


Distribution of quantity vs yield of sources of information on specific subjects
he studied journals as sources, but applicable to other sources
what journals produce how many articles in a subject and how are they distributed? or
How are articles in a subject scattered across journals?

Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called documentary chaos
First published in: Engineering (1934) 137:85-86, then in his book Documentation, (1948)

Bradford's law
"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2 : n^3 \ldots$"

Bradford's Law of Scattering: an idealized example

No. of source journals | No. of articles per source | Total no. of articles
1                      | 60                         | 60
2                      | 35                         | 70
3 (zone total)         |                            | 130
1                      | 30                         | 30
2                      | 25                         | 50
2                      | 9                          | 18
4                      | 8                          | 32
9 (zone total)         |                            | 130
10                     | 6                          | 60
7                      | 5                          | 35
5                      | 4                          | 20
5                      | 3                          | 15
27 (zone total)        |                            | 130

Bradford's Law of Scattering: zones

nucleus: 3 sources, 130 articles
9 sources, 130 articles
27 sources, 130 articles

Garfield hypothesis
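The division into zones can be sketched in code. This is a minimal illustration, not Bradford's own procedure: it ranks journals by productivity and closes a zone each time the cumulative article count reaches the next equal share of the total. The data are the idealized example figures (3, 9 and 27 sources with 130 articles each); the function name `bradford_zones` is hypothetical.

```python
def bradford_zones(articles_per_journal, n_zones=3):
    """Split journals, ranked by productivity, into zones of roughly equal article yield."""
    ranked = sorted(articles_per_journal, reverse=True)
    target = sum(ranked) / n_zones
    zones, current, cumulative = [], [], 0
    for count in ranked:
        current.append(count)
        cumulative += count
        # Close a zone once the cumulative yield reaches the next multiple of the target
        if cumulative >= target * (len(zones) + 1) and len(zones) < n_zones - 1:
            zones.append(current)
            current = []
    zones.append(current)  # the last zone takes all remaining journals
    return zones


# Idealized example: articles per source journal, one list entry per journal
example = ([60] + [35] * 2 + [30] + [25] * 2 + [9] * 2 + [8] * 4
           + [6] * 10 + [5] * 7 + [4] * 5 + [3] * 5)

zones = bradford_zones(example)
sizes = [len(zone) for zone in zones]
yields = [sum(zone) for zone in zones]
print("journals per zone:", sizes)
print("articles per zone:", yields)
```

For this idealized data the zones hold 3, 9 and 27 journals with 130 articles each, so the multiplier n is 3. Real data rarely divides this evenly, which is why the sketch only aims for roughly equal yields.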

George Kingsley Zipf 1935, 1949


The psycho-biology of language: an introduction to dynamic philology (1935)
Human behavior and the principle of least effort: An introduction to human ecology (1949)
Looked, among others, at frequency distributions of words in given texts
counted distribution in James Joyce's Ulysses

Provided an explanation as to why the observed distributions occur: Principle of least effort

Zipf's law: $r f = c$
Where:
r = rank (in terms of frequency)
f = frequency (no. of times the given word is used in the text)
c = constant for the given text
For a given text the rank of a word multiplied by the frequency is a constant
Works well for high frequency words, not so well for low - thus a number of modifications
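The rank-frequency product can be tabulated for any text. A minimal sketch, assuming a simple tokenizer (lowercase letters and apostrophes only); the sample sentence and the function name `rank_frequency` are hypothetical, and a text this short is far too small to show the regularity itself:

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(counts, start=1)]


sample = "the cat and the dog and the bird"
rows = rank_frequency(sample)
for row in rows:
    print(row)
```

On a long text, the last column should stay roughly constant for the high-frequency words and drift for the rare ones.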

Charles F. Gosnell 1944 Obsolescence


He studied obsolescence of books in academic libraries via their use
College Res. Libr. (1944) 5:115-125

But this was extended to study of articles via citations, and other sources
Age of citations in articles in a subject:
half-life - half of the citations are x years old or less, etc.
different subjects have very different half-lives
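One common way to estimate a citation half-life is the median age of the cited references. A minimal sketch under that assumption; the reference years and the function name `citation_half_life` are hypothetical:

```python
from statistics import median


def citation_half_life(citing_year, cited_years):
    """Median age of cited references: half of the citations are this old or younger."""
    ages = [citing_year - year for year in cited_years]
    return median(ages)


# Hypothetical publication years of references found in articles published in 2005
cited_years = [2004, 2003, 2003, 2001, 1999, 1995, 1980]
half_life = citation_half_life(2005, cited_years)
print("half-life in years:", half_life)
```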


Curve of obsolescence

[Figure: curve of obsolescence; axis label: age at time of use]

Eugene Garfield 1955


Focused on scientific & scholarly communication
based on citations
Science (1955) 122:108-111

Founded Institute for Scientific Information (ISI)


major product now ISI Web of Knowledge

Impact factor for journals, based on how much a journal is cited
Mapping of a literature in a subject
Citation indexes/Web of Knowledge
MAJOR resources in bibliometric studies

Citation matrix
[Figure: citation matrix linking cited articles to the articles citing them]
Science Citation Index


Association-of-ideas index
[Figure: an article linked to the articles it cites and to the articles citing it]

Co-citation analysis
Articles that cite the same article are likely to both be of interest to the reader of the cited article

[Figure: two citing articles pointing to the same article. These two articles are likely to be related]
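Both kinds of relatedness can be counted from one table of references. A minimal sketch, assuming the data map each citing article to the articles it cites: co-citation is taken here in its usual sense (two articles cited together by the same citing article), and bibliographic coupling as two citing articles sharing references. The article labels and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(references):
    """Count how often each pair of articles is cited together by the same citing article."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


def coupling_strength(references, a, b):
    """Bibliographic coupling: number of references shared by citing articles a and b."""
    return len(set(references[a]) & set(references[b]))


# Hypothetical data: citing article -> articles it cites
references = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

co_cited = co_citation_counts(references)
print("A and B co-cited:", co_cited[("A", "B")], "times")
print("P1 and P2 share:", coupling_strength(references, "P1", "P2"), "references")
```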

Impact factor (IF)


number of citations received in current year by papers published in the journal in the previous two years, divided by number of papers published in the journal in the previous two years
IF has become over time a crucial indicator of journal quality and given ISI a monopoly position in the evaluation of journal quality
Reported in Journal Citation Reports (1976-)
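The impact factor definition is a single division. A minimal sketch with hypothetical figures for one journal; the function name `impact_factor` is illustrative and the counts are not real data:

```python
def impact_factor(citations_this_year, papers_published):
    """Citations received this year by papers from the previous two years,
    divided by the number of papers published in those two years."""
    if papers_published == 0:
        raise ValueError("no papers published in the previous two years")
    return citations_this_year / papers_published


# Hypothetical journal: 80 + 70 papers in the previous two years,
# cited 300 times in the current year
journal_if = impact_factor(300, 80 + 70)
print("impact factor:", journal_if)
```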

Garfield's HistCite
Bibliographic Analysis and Visualization Software
Provides citation statistics & graphs for people, journals, institutions
various citation scores, no. of cited references in articles
various graphs with connections

Example: articles and authors for JASIST (and predecessor names) for 1956-2004
includes citations to authors

Conclusion
Bibliometrics & related scientometrics, informetrics, webometrics provide insight into a number of properties of information objects
some general, predictive laws formulated
structures have been exposed, graphed
myriad data collected & analyzed

A good area for research!



Sources used in making this presentation among others


Ruth Palmquist: Bibliometrics
Donna Bair-Mundy: Boolean, bibliometrics, and beyond
Short set of bibliometric exercises by J. Downie

http://people.lis.uiuc.edu/~jdownie/biblio/
