
Citation-based clustering of publications using CitNetExplorer and VOSviewer.

Clustering scientific publications is an important problem in bibliometric
research. We demonstrate how two software tools, CitNetExplorer and VOSviewer, can
be used to cluster publications and to analyze the resulting clustering solutions.
CitNetExplorer is used to cluster a large set of publications in the field of
astronomy and astrophysics. The publications are clustered based on direct citation
relations. CitNetExplorer and VOSviewer are used together to analyze the resulting
clustering solutions. Both tools use visualizations to support the analysis of the
clustering solutions, with CitNetExplorer focusing on the analysis at the level of
individual publications and VOSviewer focusing on the analysis at an aggregate
level. The demonstration provided in this paper shows how a clustering of
publications can be created and analyzed using freely available software tools.
Using the approach presented in this paper, bibliometricians are able to carry out
sophisticated cluster analyses without the need to have a deep knowledge of
clustering techniques and without requiring advanced computer skills.
Constructing bibliometric networks: A comparison between full and fractional
counting. The analysis of bibliometric networks, such as co-authorship,
bibliographic coupling, and co-citation networks, has received a considerable
amount of attention. Much less attention has been paid to the construction of these
networks. We point out that different approaches can be taken to construct a
bibliometric network. Normally the full counting approach is used, but we propose
an alternative fractional counting approach. The basic idea of the fractional
counting approach is that each action, such as co-authoring or citing a
publication, should have equal weight, regardless of, for instance, the number of
authors, citations, or references of a publication. We present two empirical
analyses in which the full and fractional counting approaches yield very different
results. These analyses deal with co-authorship networks of universities and
bibliographic coupling networks of journals. Based on theoretical considerations
and on the empirical analyses, we conclude that for many purposes the fractional
counting approach is preferable to the full counting approach.
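
To make the counting difference concrete, here is a minimal Python sketch of full versus fractional weighting for co-authorship links. The per-paper weighting used here (total weight 1, split equally over author pairs) is one plausible reading of the fractional idea, not necessarily the paper's exact scheme.

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_weights(publications, fractional=True):
    """Build co-authorship link weights from a list of publications,
    where each publication is given as a list of author names.

    Full counting: every co-author pair on a paper gets weight 1.
    Fractional counting (as sketched here): each paper contributes a
    total weight of 1, divided equally over its author pairs, so that
    papers with many authors do not dominate the network.
    """
    weights = defaultdict(float)
    for authors in publications:
        pairs = list(combinations(sorted(set(authors)), 2))
        if not pairs:
            continue  # single-author papers create no co-authorship links
        w = 1.0 / len(pairs) if fractional else 1.0
        for pair in pairs:
            weights[pair] += w
    return dict(weights)

pubs = [["Smith", "Jones"], ["Smith", "Jones", "Lee", "Chen"]]
print(coauthorship_weights(pubs, fractional=False))  # all pairs weight 1
print(coauthorship_weights(pubs, fractional=True))   # 4-author paper split 6 ways
```
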
The elephant in the room: The problem of quantifying productivity in evaluative
scientometrics.
Clustering Scientific Publications Based on Citation Relations: A Systematic
Comparison of Different Methods. Clustering methods are applied regularly in the
bibliometric literature to identify research areas or scientific fields. These
methods are for instance used to group publications into clusters based on their
relations in a citation network. In the network science literature, many clustering
methods, often referred to as graph partitioning or community detection techniques,
have been developed. Focusing on the problem of clustering the publications in a
citation network, we present a systematic comparison of the performance of a large
number of these clustering methods. Using a number of different citation networks,
some of them relatively small and others very large, we extensively study the
statistical properties of the results provided by different methods. In addition,
we also carry out an expert-based assessment of the results produced by different
methods. The expert-based assessment focuses on publications in the field of
scientometrics. Our findings seem to indicate that there is a trade-off between
different properties that may be considered desirable for a good clustering of
publications. Overall, map equation methods appear to perform best in our analysis,
suggesting that these methods deserve more attention from the bibliometric
community.
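
For readers who want to experiment, a citation network can be clustered with off-the-shelf community detection, for instance the Louvain routine shipped with networkx. Note that this is only an illustrative stand-in: the map equation methods that perform best in the paper's analysis require a separate package (such as infomap).

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy citation network: an edge (a, b) means publication a cites b.
citations = [
    ("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
    ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),
    ("p3", "p4"),  # a single cross-cluster citation
]

# Direct-citation clustering is commonly done on the undirected graph.
G = nx.Graph(citations)

clusters = louvain_communities(G, seed=42)
for i, cluster in enumerate(clusters):
    print(f"cluster {i}: {sorted(cluster)}")
```
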
Field-normalized citation impact indicators and the choice of an appropriate
counting method. Bibliometric studies often rely on field-normalized citation
impact indicators in order to make comparisons between scientific fields. We
discuss the connection between field normalization and the choice of a counting
method for handling publications with multiple co-authors. Our focus is on the
choice between full counting and fractional counting. Based on an extensive
theoretical and empirical analysis, we argue that properly field-normalized results
cannot be obtained when full counting is used. Fractional counting does provide
results that are properly field normalized. We therefore recommend the use of
fractional counting in bibliometric studies that require field normalization,
especially in studies at the level of countries and research organizations. We also
compare different variants of fractional counting. In general, it seems best to use
either the author-level or the address-level variant of fractional counting.
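
As a sketch of what author-level fractional counting looks like formally (our notation, not a quotation from the paper): if publication i has a_i authors, of whom a_{U,i} are affiliated with unit U, then

```latex
w_{U,i} = \frac{a_{U,i}}{a_i}, \qquad \sum_{U} w_{U,i} = 1 .
```

Because the weights of every publication sum to one across units, averages computed from these weights aggregate consistently, which is in line with the paper's argument that full counting (weight 1 for every co-authoring unit) breaks field normalization.
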
CitNetExplorer: A new software tool for analyzing and visualizing citation
networks. We present CitNetExplorer, a new software tool for analyzing and
visualizing citation networks of scientific publications. CitNetExplorer can for
instance be used to study the development of a research field, to delineate the
literature on a research topic, and to support literature reviewing. We first
introduce the main concepts that need to be understood when working with
CitNetExplorer. We then demonstrate CitNetExplorer by using the tool to analyze the
scientometric literature and the literature on community detection in networks.
Finally, we discuss some technical details on the construction, visualization, and
analysis of citation networks in CitNetExplorer.
Mapping patient safety: a large-scale literature review using bibliometric
visualisation techniques. Background The amount of scientific literature available
is often overwhelming, making it difficult for researchers to have a good overview
of the literature and to see relations between different developments.
Visualisation techniques based on bibliometric data are helpful in obtaining an
overview of the literature on complex research topics, and have been applied here
to the topic of patient safety (PS). Methods On the basis of title words and
citation relations, publications in the period 2000-2010 related to PS were
identified in the Scopus bibliographic database. A visualisation of the most
frequently cited PS publications was produced based on direct and indirect citation
relations between publications. Terms were extracted from titles and abstracts of
the publications, and a visualisation of the most important terms was created. The
main PS-related topics studied in the literature were identified using a technique
for clustering publications and terms. Results A total of 8480 publications were
identified, of which the 1462 most frequently cited ones were included in the
visualisation. The publications were clustered into 19 clusters, which were grouped
into three categories: (1) magnitude of PS problems (42% of all included
publications); (2) PS risk factors (31%) and (3) implementation of solutions (19%).
In the visualisation of PS-related terms, five clusters were identified: (1)
medication; (2) measuring harm; (3) PS culture; (4) physician; (5) training,
education and communication. Both the publication-level and the term-level analyses indicate
an increasing focus on risk factors. Conclusions A bibliometric visualisation
approach makes it possible to analyse large amounts of literature. This approach is
very useful for improving one's understanding of a complex research topic such as
PS and for suggesting new research directions or alternative research priorities.
For PS research, the approach suggests that more research on implementing PS
improvement initiatives might be needed.
A smart local moving algorithm for large-scale modularity-based community
detection. We introduce a new algorithm for modularity-based community detection in
large networks. The algorithm, which we refer to as a smart local moving algorithm,
takes advantage of a well-known local moving heuristic that is also used by other
algorithms. Compared with these other algorithms, our proposed algorithm uses the
local moving heuristic in a more sophisticated way. Based on an analysis of a
diverse set of networks, we show that our smart local moving algorithm identifies
community structures with higher modularity values than other algorithms for large-
scale modularity optimization, including the popular Louvain algorithm. The
computational efficiency of our algorithm makes it possible to perform community
detection in networks with tens of millions of nodes and hundreds of millions of
edges. Our smart local moving algorithm also performs well in small and medium-
sized networks. In short computing times, it identifies community structures with
modularity values equally high as, or almost as high as, the highest values
reported in the literature, and sometimes even higher than the highest values found
in the literature.
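
The local moving heuristic the abstract builds on is simple to state: start from singleton communities and repeatedly move each node to the neighboring community with the largest modularity gain. Below is a single-level Python sketch of that basic step (no graph aggregation and no smart restarts, so this is not the authors' algorithm itself).

```python
from collections import defaultdict

def local_moving(adj, max_passes=10):
    """Single-level local moving heuristic for modularity (a sketch).

    adj maps each node to a dict {neighbor: edge_weight} and must be
    symmetric. Each node starts in its own community; nodes are then
    repeatedly moved to the neighboring community with the largest
    modularity gain until a full pass produces no moves.
    """
    m2 = sum(w for nbrs in adj.values() for w in nbrs.values())  # equals 2m
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    community = {v: v for v in adj}
    comm_total = {v: degree[v] for v in adj}  # total degree per community

    for _ in range(max_passes):
        moved = False
        for v in adj:
            old = community[v]
            comm_total[old] -= degree[v]  # take v out of its community
            # Weight of v's links into each neighboring community.
            links = defaultdict(float)
            for u, w in adj[v].items():
                links[community[u]] += w
            # Modularity gain of joining community c, up to a constant
            # factor 1/m: links[c] - degree[v] * comm_total[c] / 2m.
            best = old
            best_gain = links.get(old, 0.0) - degree[v] * comm_total[old] / m2
            for c, w_in in links.items():
                gain = w_in - degree[v] * comm_total[c] / m2
                if gain > best_gain:
                    best, best_gain = c, gain
            community[v] = best
            comm_total[best] += degree[v]
            if best != old:
                moved = True
        if not moved:
            break
    return community

# Two triangles joined by one edge: the heuristic recovers the triangles.
adj = {
    "a": {"b": 1, "c": 1}, "b": {"a": 1, "c": 1}, "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1}, "e": {"d": 1, "f": 1}, "f": {"d": 1, "e": 1},
}
print(local_moving(adj))
```
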
Source normalized indicators of citation impact: an overview of different
approaches and an empirical comparison. Different scientific fields have different
citation practices. Citation-based bibliometric indicators need to normalize for
such differences between fields in order to allow for meaningful between-field
comparisons of citation impact. Traditionally, normalization for field differences
has usually been done based on a field classification system. In this approach,
each publication belongs to one or more fields and the citation impact of a
publication is calculated relative to the other publications in the same field.
Recently, the idea of source normalization was introduced, which offers an
alternative approach to normalize for field differences. In this approach,
normalization is done by looking at the referencing behavior of citing publications
or citing journals. In this paper, we provide an overview of a number of source
normalization approaches and we empirically compare these approaches with a
traditional normalization approach based on a field classification system. We also
pay attention to the issue of the selection of the journals to be included in a
normalization for field differences. Our analysis indicates a number of problems of
the traditional classification-system-based normalization approach, suggesting that
source normalization approaches may yield more accurate results.
Citation Analysis May Severely Underestimate the Impact of Clinical Research as
Compared to Basic Research. Background: Citation analysis has become an important
tool for research performance assessment in the medical sciences. However,
different areas of medical research may have considerably different citation
practices, even within the same medical field. Because of this, it is unclear to
what extent citation-based bibliometric indicators allow for valid comparisons
between research units active in different areas of medical research. Methodology:
A visualization methodology is introduced that reveals differences in citation
practices between medical research areas. The methodology extracts terms from the
titles and abstracts of a large collection of publications and uses these terms to
visualize the structure of a medical field and to indicate how research areas
within this field differ from each other in their average citation impact. Results:
Visualizations are provided for 32 medical fields, defined based on journal subject
categories in the Web of Science database. The analysis focuses on three fields:
Cardiac & cardiovascular systems, Clinical neurology, and Surgery. In each of these
fields, there turn out to be large differences in citation practices between
research areas. Low-impact research areas tend to focus on clinical intervention
research, while high-impact research areas are often oriented more toward basic and
diagnostic research. Conclusions: Popular bibliometric indicators, such as the h-
index and the impact factor, do not correct for differences in citation practices
between medical fields. These indicators therefore cannot be used to make accurate
between-field comparisons. More sophisticated bibliometric indicators do correct
for field differences but still fail to take into account within-field
heterogeneity in citation practices. As a consequence, the citation impact of
clinical intervention research may be substantially underestimated in comparison
with basic and diagnostic research.
A systematic empirical comparison of different approaches for normalizing citation
impact indicators. We address the question how citation-based bibliometric
indicators can best be normalized to ensure fair comparisons between publications
from different scientific fields and different years. In a systematic large-scale
empirical analysis, we compare a traditional normalization approach based on a
field classification system with three source normalization approaches. We pay
special attention to the selection of the publications included in the analysis.
Publications in national scientific journals, popular scientific magazines, and
trade magazines are not included. Unlike earlier studies, we use algorithmically
constructed classification systems to evaluate the different normalization
approaches. Our analysis shows that a source normalization approach based on the
recently introduced idea of fractional citation counting does not perform well. Two
other source normalization approaches generally outperform the classification-
system-based normalization approach that we study. Our analysis therefore offers
considerable support for the use of source-normalized bibliometric indicators.
Counting publications and citations: Is more always better? Is more always better?
We address this question in the context of bibliometric indices that aim to assess
the scientific impact of individual researchers by counting their number of highly
cited publications. We propose a simple model in which the number of citations of a
publication depends not only on the scientific impact of the publication but also
on other 'random' factors. Our model indicates that more need not always be better.
It turns out that the most influential researchers may have a systematically lower
performance, in terms of highly cited publications, than some of their less
influential colleagues. The model also suggests an improved way of counting highly
cited publications.
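
A toy simulation conveys the qualitative point. Assuming (our assumption, not the paper's exact specification) that citations equal underlying scientific impact times lognormal noise, a prolific researcher with moderate-impact publications can expect more highly cited publications than a colleague with fewer but individually stronger ones:

```python
import random

random.seed(0)

def expected_highly_cited(impacts, threshold=100, sigma=1.0, trials=10000):
    """Expected number of publications crossing a citation threshold in a
    toy model where citations = impact * lognormal noise."""
    total = 0
    for _ in range(trials):
        total += sum(1 for mu in impacts
                     if mu * random.lognormvariate(0, sigma) >= threshold)
    return total / trials

a = [80.0] * 5    # researcher A: few publications of very high impact
b = [40.0] * 25   # researcher B: many publications of moderate impact
print("A:", expected_highly_cited(a))  # roughly 2 highly cited publications
print("B:", expected_highly_cited(b))  # roughly 4-5, despite lower impact
```
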
Some modifications to the SNIP journal impact indicator. The SNIP (source
normalized impact per paper) indicator is an indicator of the citation impact of
scientific journals. The indicator, introduced by Henk Moed in 2010, is included in
Elsevier's Scopus database. The SNIP indicator uses a source normalized approach to
correct for differences in citation practices between scientific fields. The
strength of this approach is that it does not require a field classification system
in which the boundaries of fields are explicitly defined. In this paper, a number
of modifications that were recently made to the SNIP indicator are explained, and
the advantages of the resulting revised SNIP indicator are pointed out. It is
argued that the original SNIP indicator has some counterintuitive properties, and
it is shown mathematically that the revised SNIP indicator does not have these
properties. Empirically, the differences between the original SNIP indicator and
the revised one turn out to be relatively small, although some systematic
differences can be observed. Relations with other source normalized indicators
proposed in the literature are discussed as well.
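
In rough outline (our reading of the revised definition, not a complete specification), the revised SNIP divides a journal's raw impact per paper (RIP) by the relative database citation potential (RDCP) of its subject field:

```latex
\mathrm{SNIP} = \frac{\mathrm{RIP}}{\mathrm{RDCP}} , \qquad
\mathrm{RDCP} = \frac{\mathrm{DCP}}{\text{median DCP over all journals in the database}} ,
```

where the database citation potential (DCP) reflects the average number of references in the publications citing the journal, so that journals in fields with long reference lists are not mechanically advantaged.
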
A new methodology for constructing a publication-level classification system of
science. Classifying journals or publications into research areas is an essential
element of many bibliometric analyses. Classification usually takes place at the
level of journals, where the Web of Science subject categories are the most popular
classification system. However, journal-level classification systems have two
important limitations: They offer only a limited amount of detail, and they have
difficulties with multidisciplinary journals. To avoid these limitations, we
introduce a new methodology for constructing classification systems at the level of
individual publications. In the proposed methodology, publications are clustered
into research areas based on citation relations. The methodology is able to deal
with very large numbers of publications. We present an application in which a
classification system is produced that includes almost 10 million publications.
Based on an extensive analysis of this classification system, we discuss the
strengths and the limitations of the proposed methodology. Important strengths are
the transparency and relative simplicity of the methodology and its fairly modest
computing and memory requirements. The main limitation of the methodology is its
exclusive reliance on direct citation relations between publications. The accuracy
of the methodology can probably be increased by also taking into account other
types of relations, for instance relations based on bibliographic coupling.
The Leiden ranking 2011/2012: Data collection, indicators, and interpretation. The
Leiden Ranking 2011/2012 is a ranking of universities based on bibliometric
indicators of publication output, citation impact, and scientific collaboration.
The ranking includes 500 major universities from 41 different countries. This paper
provides an extensive discussion of the Leiden Ranking 2011/2012. The ranking is
compared with other global university rankings, in particular the Academic Ranking
of World Universities (commonly known as the Shanghai Ranking) and the Times Higher
Education World University Rankings. The comparison focuses on the methodological
choices underlying the different rankings. Also, a detailed description is offered
of the data collection methodology of the Leiden Ranking 2011/2012 and of the
indicators used in the ranking. Various innovations in the Leiden Ranking 2011/2012
are presented. These innovations include (1) an indicator based on counting a
university's highly cited publications, (2) indicators based on fractional rather
than full counting of collaborative publications, (3) the possibility of excluding
non-English language publications, and (4) the use of stability intervals. Finally,
some comments are made on the interpretation of the ranking and a number of
limitations of the ranking are pointed out.
The Inconsistency of the h-index. The h-index is a popular bibliometric indicator
for assessing individual scientists. We criticize the h-index from a theoretical
point of view. We argue that for the purpose of measuring the overall scientific
impact of a scientist (or some other unit of analysis), the h-index behaves in a
counterintuitive way. In certain cases, the mechanism used by the h-index to
aggregate publication and citation statistics into a single number leads to
inconsistencies in the way in which scientists are ranked. Our conclusion is that
the h-index cannot be considered an appropriate indicator of a scientist's overall
scientific impact. Based on recent theoretical insights, we discuss what kind of
indicators can be used as an alternative to the h-index. We pay special attention
to the highly cited publications indicator. This indicator has a lot in common with
the h-index, but unlike the h-index it does not produce inconsistent rankings.
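
Both indicators discussed here are easy to compute; a short sketch (the citation threshold for the highly cited publications indicator is the analyst's choice, not fixed by the paper):

```python
def h_index(citations):
    """h-index: the largest h such that at least h publications have
    at least h citations each."""
    cits = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cits, start=1) if c >= rank)

def highly_cited(citations, threshold):
    """Highly cited publications indicator: the number of publications
    with at least `threshold` citations."""
    return sum(1 for c in citations if c >= threshold)

record = [10, 8, 5, 4, 3]
print(h_index(record))                    # -> 4
print(highly_cited(record, threshold=8))  # -> 2
```
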
Universality of Citation Distributions Revisited. Radicchi, Fortunato, and
Castellano (2008) claim that, apart from a scaling factor, all fields of science
are characterized by the same citation distribution. We present a large-scale
validation study of this universality-of-citation-distributions claim. Our analysis
shows that claiming citation distributions to be universal for all fields of
science is not warranted. Although many fields indeed seem to have fairly similar
citation distributions, there are exceptions as well. We also briefly discuss the
consequences of our findings for the measurement of scientific impact using
citation-based bibliometric indicators.
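
The claim under test can be stated in one line: rescaling each publication's citation count c by the average citation count c_0 of its field (and publication year),

```latex
c_f = \frac{c}{c_0} ,
```

should, according to Radicchi, Fortunato, and Castellano, yield the same distribution of c_f in every field; the validation study finds that this holds approximately for many fields but not for all.
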
Globalisation of science in kilometres. The ongoing globalisation of science has
undisputedly a major impact on how and where scientific research is being conducted
nowadays. Yet, the big picture remains blurred. It is largely unknown where this
process is heading, and at which rate. Which countries are leading or lagging? Many
of its key features are difficult if not impossible to capture in measurements and
comparative statistics. Our empirical study measures the extent and growth of
scientific globalisation in terms of physical distances between co-authoring
researchers. Our analysis, drawing on 21 million research publications across all
countries and fields of science, reveals that contemporary science has globalised
at a fairly steady rate during recent decades. The average collaboration distance
per publication has increased from 334 km in 1980 to 1553 km in 2009. Despite
significant differences in globalisation rates across countries and fields of
science, we observe a pervasive process in motion, moving towards a truly
interconnected global science system.
A recursive field-normalized bibliometric performance indicator: an application to
the field of library and information science. Two commonly used ideas in the
development of citation-based research performance indicators are the idea of
normalizing citation counts based on a field classification scheme and the idea of
recursive citation weighing (like in PageRank-inspired indicators). We combine
these two ideas in a single indicator, referred to as the recursive mean normalized
citation score indicator, and we study the validity of this indicator. Our
empirical analysis shows that the proposed indicator is highly sensitive to the
field classification scheme that is used. The indicator also has a strong tendency
to reinforce biases caused by the classification scheme. Based on these
observations, we advise against the use of indicators in which the idea of
normalization based on a field classification scheme and the idea of recursive
citation weighing are combined.
On the correlation between bibliometric indicators and peer review: reply to Opthof
and Leydesdorff. Opthof and Leydesdorff (Scientometrics, 2011) reanalyze data
reported by Van Raan (Scientometrics 67(3):491-502, 2006) and conclude that there
is no significant correlation between on the one hand average citation scores
measured using the CPP/FCSm indicator and on the other hand the quality judgment of
peers. We point out that Opthof and Leydesdorff draw their conclusions based on a
very limited amount of data. We also criticize the statistical methodology used by
Opthof and Leydesdorff. Using a larger amount of data and a more appropriate
statistical methodology, we do find a significant correlation between the CPP/FCSm
indicator and peer judgment.
Towards a new crown indicator: an empirical analysis. We present an empirical
comparison between two normalization mechanisms for citation-based indicators of
research performance. These mechanisms aim to normalize citation counts for the
field and the year in which a publication was published. One mechanism is applied
in the current so-called crown indicator of our institute. The other mechanism is
applied in the new crown indicator that our institute is currently exploring. We
find that at high aggregation levels, such as at the level of large research
institutions or at the level of countries, the differences between the two
mechanisms are very small. At lower aggregation levels, such as at the level of
research groups or at the level of journals, the differences between the two
mechanisms are somewhat larger. We pay special attention to the way in which recent
publications are handled. These publications typically have very low citation
counts and should therefore be handled with special care.
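
The two mechanisms can be written side by side. With c_i the citation count of publication i and e_i its expected citation count given field and publication year, the current crown indicator is a ratio of averages, while the new mechanism is an average of ratios:

```latex
\mathrm{CPP/FCSm} = \frac{\sum_{i=1}^{n} c_i}{\sum_{i=1}^{n} e_i} ,
\qquad
\mathrm{MNCS} = \frac{1}{n} \sum_{i=1}^{n} \frac{c_i}{e_i} .
```

At high aggregation levels both expressions are dominated by the same large sums, which is consistent with the small differences reported above for institutions and countries.
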
Towards a new crown indicator: Some theoretical considerations. The crown indicator
is a well-known bibliometric indicator of research performance developed by our
institute. The indicator aims to normalize citation counts for differences among
fields. We critically examine the theoretical basis of the normalization mechanism
applied in the crown indicator. We also make a comparison with an alternative
normalization mechanism. The alternative mechanism turns out to have more
satisfactory properties than the mechanism applied in the crown indicator. In
particular, the alternative mechanism has a so-called consistency property. The
mechanism applied in the crown indicator lacks this important property. As a
consequence of our findings, we are currently moving towards a new crown indicator,
which relies on the alternative normalization mechanism.
A Comparison of Two Techniques for Bibliometric Mapping: Multidimensional Scaling
and VOS. VOS is a new mapping technique that can serve as an alternative to the
well-known technique of multidimensional scaling (MDS). We present an extensive
comparison between the use of MDS and the use of VOS for constructing bibliometric
maps. In our theoretical analysis, we show the mathematical relation between the
two techniques. In our empirical analysis, we use the techniques for constructing
maps of authors, journals, and keywords. Two commonly used approaches to
bibliometric mapping, both based on MDS, turn out to produce maps that suffer from
artifacts. Maps constructed using VOS turn out not to have this problem. We
conclude that in general maps constructed using VOS provide a more satisfactory
representation of a dataset than maps constructed using well-known MDS approaches.
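
For reference, the VOS objective can be written compactly (our rendering: s_ij is the similarity between items i and j, and x_i is the location of item i on the map). VOS minimizes a weighted sum of squared distances, with a constraint that keeps the map from collapsing into a single point:

```latex
\min_{x_1,\dots,x_n} \; \sum_{i<j} s_{ij} \, \lVert x_i - x_j \rVert^2
\quad \text{subject to} \quad
\frac{2}{n(n-1)} \sum_{i<j} \lVert x_i - x_j \rVert = 1 .
```
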
A unified approach to mapping and clustering of bibliometric networks. In the
analysis of bibliometric networks, researchers often use mapping and clustering
techniques in a combined fashion. Typically, however, mapping and clustering
techniques that are used together rely on very different ideas and assumptions. We
propose a unified approach to mapping and clustering of bibliometric networks. We
show that the VOS mapping technique and a weighted and parameterized variant of
modularity-based clustering can both be derived from the same underlying principle.
We illustrate our proposed approach by producing a combined mapping and clustering
of the most frequently cited publications that appeared in the field of information
science in the period 1999-2008.
Software survey: VOSviewer, a computer program for bibliometric mapping. We present
VOSviewer, a freely available computer program that we have developed for
constructing and viewing bibliometric maps. Unlike most computer programs that are
used for bibliometric mapping, VOSviewer pays special attention to the graphical
representation of bibliometric maps. The functionality of VOSviewer is especially
useful for displaying large bibliometric maps in an easy-to-interpret way. The
paper consists of three parts. In the first part, an overview of VOSviewer's
functionality for displaying bibliometric maps is provided. In the second part, the
technical implementation of specific parts of the program is discussed. Finally, in
the third part, VOSviewer's ability to handle large maps is demonstrated by using
the program to construct and display a co-citation map of 5,000 major scientific
journals.
Rivals for the crown: Reply to Opthof and Leydesdorff. We reply to the criticism of
Opthof and Leydesdorff on the way in which our institute applies journal and field
normalizations to citation counts. We point out why we believe most of the
criticism is unjustified, but we also indicate where we think Opthof and
Leydesdorff raise a valid point.
The Relation Between Eigenfactor, Audience Factor, and Influence Weight. We present
a theoretical and empirical analysis of a number of bibliometric indicators of
journal performance. We focus on three indicators in particular: the Eigenfactor
indicator, the audience factor, and the influence weight indicator. Our main
finding is that the last two indicators can be regarded as a kind of special case
of the first indicator. We also find that the three indicators can be nicely
characterized in terms of two properties. We refer to these properties as the
property of insensitivity to field differences and the property of insensitivity to
insignificant journals. The empirical results that we present illustrate our
theoretical findings. We also show empirically that the differences between various
indicators of journal performance are quite substantial.
Automatic term identification for bibliometric mapping. A term map is a map that
visualizes the structure of a scientific field by showing the relations between
important terms in the field. The terms shown in a term map are usually selected
manually with the help of domain experts. Manual term selection has the
disadvantages of being subjective and labor-intensive. To overcome these
disadvantages, we propose a methodology for automatic term identification and we
use this methodology to select the terms to be included in a term map. To evaluate
the proposed methodology, we use it to construct a term map of the field of
operations research. The quality of the map is assessed by a number of operations
research experts. It turns out that in general the proposed methodology performs
quite well.
Some comments on Egghe's derivation of the impact factor distribution. In a recent
paper, Egghe [Egghe, L. (in press). Mathematical derivation of the impact factor
distribution. Journal of Informetrics] presents a mathematical analysis of the
rank-order distribution of journal impact factors. The analysis is based on the
central limit theorem. We criticize the empirical relevance of Egghe's analysis.
More specifically, we argue that Egghe's analysis relies on an unrealistic
assumption and we show that the analysis is not in agreement with empirical data.
On the proper understanding of the limiting behavior of generalizations of the h-
and g-indices: Reply.
How to Normalize Cooccurrence Data? An Analysis of Some Well-Known Similarity
Measures. In scientometric research, the use of cooccurrence data is very common.
In many cases, a similarity measure is employed to normalize the data. However,
there is no consensus among researchers on which similarity measure is most
appropriate for normalization purposes. In this article, we theoretically analyze
the properties of similarity measures for cooccurrence data, focusing in particular
on four well-known measures: the association strength, the cosine, the inclusion
index, and the Jaccard index. We also study the behavior of these measures
empirically. Our analysis reveals that there exist two fundamentally different
types of similarity measures, namely, set-theoretic measures and probabilistic
measures. The association strength is a probabilistic measure, while the cosine,
the inclusion index, and the Jaccard index are set-theoretic measures. Both our
theoretical and our empirical results indicate that cooccurrence data can best be
normalized using a probabilistic measure. This provides strong support for the use
of the association strength in scientometric research.
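
All four measures are one-line formulas; a sketch in the notation of the paper, with c_ij the co-occurrence frequency of items i and j and s_i and s_j their total occurrence frequencies:

```python
import math

def similarities(c_ij, s_i, s_j):
    """The four similarity measures analyzed in the paper. The
    association strength is the probabilistic measure; the cosine,
    inclusion index, and Jaccard index are set-theoretic measures."""
    return {
        "association_strength": c_ij / (s_i * s_j),
        "cosine": c_ij / math.sqrt(s_i * s_j),
        "inclusion": c_ij / min(s_i, s_j),
        "jaccard": c_ij / (s_i + s_j - c_ij),
    }

print(similarities(c_ij=10, s_i=20, s_j=100))
```
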
Generalizing the h- and g-indices. We introduce two new measures of the
performance of a scientist. One measure, referred to as the h(alpha)-index,
generalizes the well-known h-index or Hirsch index. The other measure, referred to
as the g(alpha)-index, generalizes the closely related g-index. We analyze
theoretically the relationship between the h(alpha)- and g(alpha)-indices on the one
hand and some simple measures of scientific performance on the other hand. We also
study the behavior of the h(alpha)- and g(alpha)-indices empirically. Some
advantages of the h(alpha)- and g(alpha)-indices over the h- and g-indices are
pointed out.
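
A sketch of the h(alpha)-index as we read the abstract's description (the precise definition is in the paper): the largest h such that the researcher has h publications with at least alpha * h citations each, so alpha = 1 recovers the ordinary h-index.

```python
def h_alpha(citations, alpha=1.0):
    """h(alpha)-index (sketch): largest h such that h publications have
    at least alpha * h citations each; alpha = 1 gives the h-index."""
    cits = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cits, start=1) if c >= alpha * rank)

record = [10, 8, 5, 4, 3]
print(h_alpha(record, alpha=1))  # ordinary h-index -> 4
print(h_alpha(record, alpha=2))  # stricter threshold -> 2
```
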
Some comments on the journal weighted impact factor proposed by Habibzadeh and
Yadollahie. In a recent paper in the Journal of Informetrics, Habibzadeh and
Yadollahie [Habibzadeh, F., & Yadollahie, M. (2008). Journal weighted impact
factor: A proposal. Journal of Informetrics, 2(2), 164-172] propose a journal
weighted impact factor (WIF). Unlike the ordinary impact factor, the WIF of a
journal takes into account the prestige or the influence of citing journals. In
this communication, we show that the way in which Habibzadeh and Yadollahie
calculate the WIF of a journal has some serious problems. Due to these problems, a
ranking of journals based on WIF can be misleading. We also indicate how the
problems can be solved by changing the way in which the WIF of a journal is
calculated.
Appropriate similarity measures for author co-citation analysis. We provide in this
article a number of new insights into the methodological discussion about author
co-citation analysis. We first argue that the use of the Pearson correlation for
measuring the similarity between authors' co-citation profiles is not very
satisfactory. We then discuss what kind of similarity measures may be used as an
alternative to the Pearson correlation. We consider three similarity measures in
particular. One is the well-known cosine. The other two similarity measures have
not been used before in the bibliometric literature. We show by means of an example
that the choice of an appropriate similarity measure has a high practical
relevance. Finally, we discuss the use of similarity measures for statistical
inference.
Some comments on the question whether co-occurrence data should be normalized. In a
recent article in JASIST, L. Leydesdorff and L. Vaughan (2006) asserted that raw
cocitation data should be analyzed directly, without first applying a normalization
such as the Pearson correlation. In this communication, it is argued that there is
nothing wrong with the widely adopted practice of normalizing cocitation data. One
of the arguments put forward by Leydesdorff and Vaughan turns out to depend
crucially on incorrect multidimensional scaling maps that are due to an error in
the PROXSCAL program in SPSS.
