
September 2020

Global Research Report


Identifying Research Fronts
in the Web of Science:
From metrics to meaning
Martin Szomszor, David Pendlebury and Gordon Rogers
Author biographies
Dr Martin Szomszor is Director at the Institute for Scientific Information and has also held the role of Head of Research Analytics at ISI. He was named a 2015 top-50 UK Information Age data leader for his work in creating the REF2014 impact case studies database for the Higher Education Funding Council for England (HEFCE).

David Pendlebury is Head of Research Analysis at the Institute for Scientific Information. Since 1983 he has used Web of Science data to study the structure and dynamics of research. He worked for many years with ISI founder Eugene Garfield. With Henry Small, David developed ISI’s Essential Science Indicators.

Gordon Rogers is a Senior Data Scientist at the Institute for Scientific Information. He has worked in the fields of bibliometrics and data analysis for the past 10 years, supporting clients around the world in evaluating their research portfolio and strategy.

Foundational past, visionary future

About the Institute for Scientific Information

The Institute for Scientific Information (ISI)™ at Clarivate has pioneered the organization of the world’s research information for more than half a century. Today it remains committed to promoting integrity in research whilst improving the retrieval, interpretation and utility of scientific information. It maintains the knowledge corpus upon which the Web of Science™ index and related information and analytical content and services are built. It disseminates that knowledge externally through events, conferences and publications whilst conducting primary research to sustain, extend and improve the knowledge base. For more information, please visit www.clarivate.com/webofsciencegroup/solutions/isi-institute-for-scientific-information/.

ISBN 978-1-9160868-8-3

Cover image: South Island Braided Glacial Rivers, bterzesphoto

Executive summary
Our report encourages researchers and managers to perform deeper evaluations of research via Research Front data derived from the Web of Science and maps depicting the structure and dynamics of specialty areas.

Research assessment and policymaking frequently use quantitative measures based on publication and citation data as a complement to traditional expert peer review. Most in the research community are familiar with standard indicators, such as citation counts, the Web of Science Journal Impact Factor™, or the h-index. Scores and ranks have their uses but are limited in revealing many aspects of research activity and different dimensions of contributions. Fuller, more informative types of assessment are now possible – but still rarely used.

Thanks to advances in the handling and visualization of very large datasets, it is possible to see – and visit – the leading edge of scientific and scholarly research through science mapping of the literature. Such maps typically offer 2-D or 3-D landscapes of research disciplines and topics, created by the network of citations that link one publication with another and by shared terminology. Similarity among documents determines proximity in the landscape while the varying density of publications creates structures, such as ‘mountains’ or ‘islands’ of knowledge. An analyst can locate individuals, institutions, funders and journals within this landscape and evaluate organizational participation in different areas, as well as changes over time. This contributes to greater understanding of current activity including identification of key players and hot and emerging topics.

Bibliometrics and research assessment


Research assessment historically looks at the research process: inputs (money but possibly other resources as well); activity (projects); outputs (usually codified documents such as academic papers or industrial patents); and outcomes (citations to papers and, increasingly, societal and economic benefits). The last is the least well covered in most exercises because assessment follows too rapidly on the heels of activity for any clear benefits to have built up. A consequence is that comprehensive and in-depth assessment becomes decidedly retrospective, looking far back to try to evaluate investments made years before.

Little can be done to remedy those cases where research investments are found not to have met funder expectations. Yet, with research funding insufficient to meet each and every opportunity identified by researchers, it remains important that resources should be directed as efficiently and effectively as possible. This means not only seeking to support the strongest cases put forward, as judged by peer review, but also selecting them from areas with the greatest promise of innovation most likely to deliver clear societal and economic benefits.

Category Normalized Citation Impact (CNCI) is one widely used conventional indicator. Academic papers accumulate citations over time when they are referenced by later work that relies on the earlier.

It is generally inferred that those works that are more frequently cited have greater influence or academic ‘impact’ than uncited work. However, citations not only accumulate over time – they do so at rates that are discipline dependent and that differ between types of documents. The life sciences have higher citation rates, on average, than technical and applied sciences and reviews tend to have higher citation counts than articles of the same age. To take account of these differences, the citation count for any document is compared to the global average for the same kind of document, published in the same year and in the same field of research. The ratio between the document's citation count and the global average is the CNCI.

The CNCI is readily calculated for any one document and the average CNCI is often taken as an informative indicator for the portfolio of a country, institution or research group. There are potential pitfalls in interpretation and most users should already be aware of these. It will be clear, however, that because it takes time for citation counts to build, so it takes time before a CNCI index can be calculated with any confidence.

Conventional and relatively simple retrospective indicators are evidently not enough to satisfy responsible research management requirements. There has consequently been a widespread desire to develop a more contemporary view of research activity to help address this deficit.

The key task is to shift the perspective from evaluation of the research process to the evaluation of research progress. Conventional indicators focus on the process: in essence, they analyze the outcomes of a research project. But each project is just a stem off a greater branch that represents the onward progress of that field of research. We need to evaluate where we are off that main branch.

The many feedback loops between the development of a project branch, its emerging knowledge and the progress of ideas along the main stem are captured in the cross-references between newer and older publications. This is the basis of the Science Citation Index™ developed by Eugene Garfield at the Institute for Scientific Information (ISI) who referred to this as, "an association-of-ideas index". He saw that citation links joined specific topics, concepts and methods: "the citation is a precise, unambiguous representation of a subject that requires no interpretation and is immune to changes in terminology." (1955) It is inherently cross-disciplinary and connections in a citation network are not confined to one field or several but roam naturally throughout a research landscape.

Citation data, Garfield saw, provided material to build a picture of the structure of scientific research and sketch its terrain. Once an index linking papers through their citations exists, we have the basis for determining their intellectual relationships and, as Derek de Solla Price (1965) noted, "The pattern of bibliographic references indicates the nature of the scientific Research Front." This pattern provides for us a map in which we can locate a research publication and from this apply a time axis that shows us the direction of intellectual travel. We can work out where a topic is and what direction the research around that topic is taking. But, in Price’s day, the global map of science he imagined was not yet a reality.

Figure 1.
Research Process and Research Progress. The citation feedback loops add information to our understanding of progress. We look for the more influential work where multiple papers direct their citations and that is likely to speed progress.
Diagram labels: Funding, Discovery, Publication, Citation, Impact, Later outcomes (Research process); Old idea, Recent idea, New idea, Research fronts (Research progress).
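
The CNCI calculation described in this section can be sketched in a few lines. This is a minimal illustration, not the Web of Science implementation: the record fields (`field`, `year`, `doc_type`, `citations`) are hypothetical names, and the baseline is computed from the toy sample itself, whereas in practice it would be the global average for each field, year and document type.

```python
from collections import defaultdict
from statistics import mean


def cnci_scores(papers):
    """Return {paper id: citations / baseline} for each paper.

    The baseline is the mean citation count of papers sharing the same
    field, publication year and document type (assumed grouping keys).
    """
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"], p["doc_type"])].append(p["citations"])
    baselines = {key: mean(counts) for key, counts in groups.items()}

    scores = {}
    for p in papers:
        baseline = baselines[(p["field"], p["year"], p["doc_type"])]
        # An all-zero group has no meaningful ratio, so report None.
        scores[p["id"]] = p["citations"] / baseline if baseline > 0 else None
    return scores


# Toy data: raw counts differ fivefold between fields,
# but the normalized scores are directly comparable.
papers = [
    {"id": "p1", "field": "Life Sciences", "year": 2016, "doc_type": "article", "citations": 30},
    {"id": "p2", "field": "Life Sciences", "year": 2016, "doc_type": "article", "citations": 10},
    {"id": "p3", "field": "Mathematics", "year": 2016, "doc_type": "article", "citations": 6},
    {"id": "p4", "field": "Mathematics", "year": 2016, "doc_type": "article", "citations": 2},
]
scores = cnci_scores(papers)

# Average CNCI for a hypothetical portfolio holding p1 and p3.
portfolio_cnci = mean(scores[i] for i in ("p1", "p3"))
```

In this toy sample the life sciences paper with 30 citations and the mathematics paper with 6 citations receive the same CNCI, because each is cited at 1.5 times its own group's average.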

What are Research Fronts?
Price established the idea that there were definable 'fronts' in research and he used citation patterns to find them. He described an 'immediacy factor' that was reflected in the 'bunching' or disproportionate clustering of citations around recent papers compared to older literature. He noted, "Since only a small part of the earlier literature is knitted together by the new year’s crop of papers, we may look upon this small part as a sort of growing tip or epidermal layer, an active research front." (Price 1963)

The literature on Research Fronts grew steadily in the last century and accelerated over the last two decades. 'Research Front' is now a recognized term, often associated with trends in research, growth areas and emerging fields or topics. All these capture the idea that it is both feasible and desirable to identify the foci of innovation and change. What is also inherent in this terminology is the notion of novelty, not only in the ideas but also in the field itself. Thus, any existing typology or categorization may often be inadequate and could even constrain the possibility of identifying such innovation.

Recent papers on Research Fronts often deal with visualization and emphasize detection of emerging topics. Visualization links efforts to describe research frontiers to a wider body of work about the mapping of all scholarly knowledge. The key questions are, first, how to create these maps and, second, how to locate the critical points in such maps.

How can we map science?


Without sufficient computing power, storage and extensive data, analysis of Research Fronts using publication and citation data was inevitably manual and selective. There are many ways in which research publications might be grouped to create clusters and then aggregate these into domains and networks. The Web of Science uses journal-based categories but specifies no particular distance relationships between these.

For individual publications we could use text, such as the similarity of abstracts or shared keywords, but textual analysis may be cumbersome and a detailed lexicon is required, since the same word may have distinct meanings in different fields. Other available metadata include reference lists in, or citations to, documents. Kessler (1963) proposed the technique of bibliographic coupling, which measures subject similarity between documents based on the frequency of shared cited references.

In 1973, ISI’s Henry Small inverted the method of Kessler:

"A new form of document coupling called co-citation is defined as the frequency with which two documents are cited together. The co-citation frequency of two scientific papers can be determined by comparing lists of citing documents in the Science Citation Index and counting identical entries. Networks of co-cited papers can be generated for specific scientific specialties … Clusters of co-cited papers provide a new way to study the specialty structure of science."

The idea of co-citation analysis was introduced simultaneously by Russian information scientist Irena V. Marshakova-Shaikevich, but neither she nor Small knew of each other’s work – an instance of what the sociologist of science Robert K. Merton designated the phenomenon of ‘multiple discovery’.
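
The difference between the two measures can be made concrete with a small sketch. It assumes a hypothetical input format, a dictionary mapping each citing paper to its reference list, and uses toy data. Bibliographic coupling compares citing papers by their shared references, while co-citation counts how often two cited papers appear together in the same reference list.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(references):
    """Kessler-style coupling: number of cited references shared by
    each pair of citing papers."""
    strength = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            strength[(a, b)] = shared
    return strength


def co_citation(references):
    """Small-style co-citation: number of reference lists in which
    each pair of cited papers appears together."""
    frequency = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            frequency[pair] += 1
    return frequency


# Citing papers A and B share four references (C, D, E, F).
coupling_example = {
    "A": ["C", "D", "E", "F"],
    "B": ["C", "D", "E", "F"],
}
# Cited papers A and B appear together in four reference lists.
cocitation_example = {
    "C": ["A", "B"],
    "D": ["A", "B"],
    "E": ["A", "B"],
    "F": ["A", "B"],
}

coupling = bibliographic_coupling(coupling_example)
cocit = co_citation(cocitation_example)
```

One consequence of the inversion is visible in the code: a coupling strength is fixed once both papers are published, because reference lists do not change, whereas a co-citation frequency can keep growing as new citing papers appear.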

Small measured the similarity of two documents in terms of the number of times they were cited together: this is their co-citation frequency. Analyzing papers from particle physics he found that co-citation patterns indicated 'the notion of subject similarity' and 'the association or co-occurrence of ideas.' He suggested that frequently cited papers, reflecting key concepts, methods or experiments, could be used as a starting point for a co-citation analysis as an objective descriptor of the social and intellectual structure of specialty areas. Like Price’s Research Fronts, consisting of a relatively small group of recent papers tightly knit together, so too Small found co-citation analysis pointed to the specialty as the natural organizational unit of research, rather than traditionally defined and larger fields. He also saw that such organizational units could be studied through time as they evolved.

Figure 2.
How Kessler’s citation coupling (left) differs from Small and Marshakova’s co-citation analysis (right)
Bibliographic coupling: citing papers A and B are related because they cite papers C, D, E, and F.
Co-citation: papers A and B are associated because they are both cited by papers C, D, E, and F.

Small then worked with Belver C. Griffith (Drexel University, Philadelphia) to lay the foundations for defining specialties using co-citation analysis. Small and Griffith (1974; Griffith et al., 1974) showed that individual Research Fronts could be measured for their similarity with one another and thus form the nucleus of a specialty. Their mapping used multidimensional scaling and similarity was plotted as proximity in two dimensions. Price (1979) hailed this as "revolutionary in its implications."

Garfield turned Small and Griffith’s basic research into an information product in the 1981 ISI Atlas of Science: Biochemistry and Molecular Biology, 1978/80. The Atlas included 102 Research Fronts, each including a map of the core papers and their relationships laid out by multidimensional scaling. A large, fold-out map showed all 102 Research Fronts plotted according to their similarities. The ISI Atlas of Science did not survive but Garfield and Small continued their research in science mapping. Small (1985) introduced an important modification for defining Research Fronts: fractional co-citation clustering. By counting citation frequency fractionally, based on the length of the reference list in the citing papers, he adjusted for differences in the average rate of citation among fields. Consequently, mathematics, for example, emerged more strongly, having been under-represented by integer counting. Small also showed that Research Fronts could be clustered for similarity at levels higher than groupings of individual fronts. He and Garfield (1985) summarized these advances and published a global map of science based on a combination of data in the Science Citation Index and the Social Sciences Citation Index™.

It is important to emphasize that there is no one best method for clustering research publications. The challenge in grouping research 'information' is that we have no gold standard, no absolute test of correctness, to which we can refer. What we have instead is an array of researchers' cultural perceptions, influenced by their origins, training, experience and evolved view of their own field and others. To a chemist, the topical distinctions within mathematics will be unclear. To a historian, the span of nanotechnology across chemistry, materials and mathematics will be Byzantine.

Identifying a specialty through co-citation analysis describes one topic capturing intellectually related work that may cross familiar fields. To be even more useful as a guide to research management and future decision making, a specialty needs to be located in a greater map that shows recognizable major and minor areas of research. Only then can we fully interpret what we have picked out.

There are now many academic centers across the globe focusing on science mapping, using a wide variety of techniques and tools. These later developments are summarized in Indiana University Professor Katy Börner’s (2010) Atlas of Science. Of particular significance are CiteSpace developed by Chaomei Chen (2006) at Drexel University and VOSviewer developed by Nees-Jan Van Eck and Ludo Waltman (2010) at CWTS, Leiden University.

For more detailed background on science mapping the reader is referred to Eugenio Petrovich’s recent review (2020), as well as two overviews in a recent handbook of science and technology indicators (Boyack and Klavans 2019, Thijs 2019).
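
The effect of fractional counting can be sketched as follows. The weighting used here, where each citing paper contributes 1/n to every pair it co-cites and n is the length of its reference list, is an assumption chosen for illustration and is not necessarily the exact scheme of Small (1985). The paper identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fractional_co_citation(references):
    """Weight each co-citation by 1 / (length of the citing paper's
    reference list), so long reference lists count for less."""
    weights = defaultdict(float)
    for cited in references.values():
        unique = sorted(set(cited))
        if len(unique) < 2:
            continue  # a single reference cannot form a co-cited pair
        share = 1.0 / len(unique)
        for pair in combinations(unique, 2):
            weights[pair] += share
    return dict(weights)


# One mathematics paper with a short reference list, and two
# biology papers with longer ones (toy data).
references = {
    "math_paper": ["M1", "M2"],
    "bio_paper_1": ["B1", "B2", "B3", "B4"],
    "bio_paper_2": ["B1", "B2", "B3", "B4"],
}
weights = fractional_co_citation(references)
```

With integer counting the biology pair (B1, B2) scores 2 and the mathematics pair (M1, M2) scores 1. With fractional counting both score 0.5, which illustrates how a field with short reference lists can emerge more strongly.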

The use and value of
Web of Science Research Fronts
The identification of ‘peaks’ of exceptional research within the knowledge landscape provides important information. When those peaks, in the form of highly cited papers, are linked in Research Fronts then further weight can be assigned to their significance. Citations are cross-bearings to the topics that are currently attracting exceptional attention, which may be a breakthrough in an existing field or the realization of a novel, possibly cross-disciplinary, area of research in the shape of an emerging field.

Important management opportunities, which go far beyond the information derived from research performance metrics, appear when Research Fronts are precisely located in the knowledge network.

• Researchers
The identification of a Research Front may help to suggest how a research career might be shaped. An author, by locating their current activity, can see how close their work is to a Research Front.

• Institutions
A research manager can determine the distribution of institutional output across the knowledge landscape, filtering for recent or longer time windows, and then assess the relationship of their research clusters to Research Fronts. They can also make a comparative evaluation with competitor institutions.

• Research funders
By identifying the distribution of publications arising from funded projects, a research agency can see whether its investments are producing work located in or near Research Fronts, or perhaps redirect funding to projects addressing such topics.

• Policymakers
The distribution of a national portfolio in the research landscape will be of interest both for international comparisons and for the extent to which the country is engaging with Research Fronts, especially in areas related to policy priorities.

• Publishers
The landscape location of a journal’s contents can be seen not only in the context of broad disciplines but in relation to Research Fronts as topics of exceptional current interest. Where appropriate, editorial policies can be adjusted accordingly.

The work of national research agencies in Mainland China and Japan confirms that recognition of a Research Front is by itself of significant policy value by informing investment decisions and pointing to new opportunities.

Chinese Academy of Sciences (CAS)

Why CAS relies on Research Fronts

• CAS found the specialties described in ESI Research Fronts are in line with the hot research directions that they identified from other channels.

• Domain experts also confirm that most of the core papers of Research Fronts are classic research articles in one research area. Thus, Research Fronts can be used as a navigation tool for researchers to better understand a research area.

• As Research Fronts are generated by using co-citation analysis, CAS uses them to identify the key players in a research specialty by analyzing the core papers.

• By looking at the citing papers of the Research Fronts, CAS can not only track the latest progress but also understand the evolving direction of a certain area.

CAS’s analysis of key use cases for Research Fronts

• Generated and released the Chinese version and English version of the annual report Research Fronts since 2014.

• Based on hot and emerging topics in the report of Research Fronts, CAS also developed a research leadership index to assess the research activity of the world's major countries, and has released the annual report "Research Fronts – Active Fields, Leading Countries" since 2017.

• CAS used Research Fronts to conduct analysis for specific research areas:
  – Science Development Map of Mathematics and Physics, for strategy research of mathematics and physics fields at National Science Foundation of China.
  – Research Fronts analysis on Nano-research, in collaboration with National Center for Nanoscience and Technology.
  – Progress and Development of China’s Land Science and Technology, for Ministry of Land Resources.
  – Research and Technology Development of Agricultural Machinery, for Ministry of Science and Technology.
  – Printing and Paper Manufacturing Industry Analysis Report, for National Pulp and Paper Research Institute.

• Inspired by Research Fronts, CAS conducted symposiums focusing on special areas:
  – Research frontier Symposium on Synthetic Biology in 2017.
  – Research frontier Symposium on Alzheimer's disease, Extrasolar planets, Perovskite material in 2018.

How CAS has used Research Fronts

• Used keywords to search fronts to identify Research Fronts related to a research area and conducted analysis of core papers and citing papers of Research Fronts. Analytical results were interpreted by domain experts.

• Wrote the annual report of Research Fronts.

• Analysts with domain knowledge at CAS reviewed the Web of Science Research Fronts and made the final selection of 10 hot Research Fronts and the emerging Research Fronts in each of the 10 broader fields.

• Studied the core papers of each selected Research Front and applied their domain knowledge to rename all selected Research Fronts.

• Analyzed and demonstrated the yearly distribution of each hot Research Front.

• CAS developed two indicators to select key hot and emerging Research Fronts in each of the broad areas for further interpretation.

• Analyzed the contribution of countries and organizations for both core papers and citing papers of key hot Research Fronts.

• Interpreted the content, research efforts and ongoing trends in the key emerging Research Fronts.

Japan Science and Technology Agency (JST)

Why JST relies on Research Fronts

• Traditional scientific publishing in peer-reviewed journals is expanding rapidly, partly because major economies like China have produced huge publication output, showing rapid growth in their science community.

• The explosive increase of scientific articles makes it difficult to survey all the scientific articles in a research field as scientists used to do.

• Accordingly, we need to narrow our focus properly, avoiding human bias by means of bibliometric analysis.

• Research Fronts, clustered by co-citations to highly cited articles, narrow the 10,000 articles that rank in the top 1% by citations for Web of Science Essential Science Indicators™ (ESI) field and publication year to about 3,000 critical documents for analysis.

JST’s analysis of key use cases for Research Fronts

• JST has used Research Fronts to identify critical and emerging topics among the top slice of scientific literature as candidate topics for review and funding.

• Additionally, JST has measured the scientific positioning of topics prioritised through social needs analysis.

How JST has used Research Fronts

• The JST team manually assigns labels to each Research Front by investigating the titles and abstracts of the core (highly cited) papers.

• Using the labels, the range of analysis that can be done includes international benchmarking, domestic portfolio analysis and the identification of key talent.

• In addition to standard indicators, e.g. the number of core papers and mean publication year, JST adds original indicators such as the frequency of Chinese authors and the percentage of Nature Index journals among core papers.

• To compensate for the time-lag derived from citation-based analysis, JST also pays attention to the work identified by Clarivate as hot papers (highly cited in the last few months).

Does a research domain map
depend on the mapping method?
Visualizing the location of particular publications and linked groups of publications within a picture of the research landscape enables us to leap ahead in our interpretation and develop a real understanding of the progress of knowledge discovery. We can identify familiar clusters of established subjects, locate highly cited papers, trace the networks that link such papers in the Research Fronts – often across subject domains – and then also determine the proximity of, for example, our own papers and those of our organization.

One question that people generally ask is whether a map produced by compression into two dimensions to create a more familiar landscape to the authors (that is, a spatial arrangement of papers in a graph) is a valid and repeatable process.

Figure 3.
Comparison of the topics identified by two different categorical processes (ESI journal categories and CWTS
topical citation clustering) in a publication layout (map) determined by a third process (topic modelling)

Panels: Essential Science Indicators; CWTS Leiden. Numbered regions in the panels correspond to the key below.

1. Physics
2. Space Science
3. Geosciences
4. Mars, origin, evolution, surface, moon, dynamics, solar-system, atmosphere, model, mission
5. Plasma, turbulence, model, waves, plasmas, transport, tokamak, sun: corona, sun: magnetic fields, dynamics
6. Active galactic nuclei, evolution, digital sky survey, galaxies: evolution, galaxies: active, emission, methods: numerical, star-formation, stars, methods: data analysis
7. Model, LHC, gravity, QCD, universe, models, standard model, search, general-relativity, mass

We can illustrate that this is indeed the case using a sub-set of papers. For this example, we have drawn on all the 19,000 papers published in 2016 in the Web of Science category for Astronomy and Astrophysics. We have further constrained the relative locations of these papers as determined by a text analysis (in fact, the similarity of terms in their titles and abstracts) by mapping them into a disc for graphical purposes.

On this simple disc map, we have then identified and color-highlighted the same papers according to two different and independent categorical systems: one is the Essential Science Indicators field categories, which are journal based; the other is a categorical system developed by CWTS Leiden, which is based on direct citation links. The pictures show that the categorical clusters created by these other systems remain entirely coherent in a landscape created by our initial and unrelated methodology. The sources of metadata for a set of papers are internally consistent in identifying categories and topics. So, having established the cross-categorical validity of these ‘maps of science’, we can move to a specific map of Research Fronts.

How is our global map produced?


There are two steps: the framework, provided by journals; and the detail, provided by the core and co-citing papers.

The Institute for Scientific Information provides a mapping framework using an analysis of journal citation data. Node2Vec (Grover & Leskovec 2016), which is a modern machine learning algorithm for network analysis, is used to create an abstract feature vector for each journal based on the journal citation profile (for example the ratio at which journals cite other journals). This compressed feature space allows us to assign any journal to a location on two-dimensional coordinates using the manifold projection algorithm UMAP (McInnes & Healy 2018). This technique produces a map where intellectually similar (in the sense of co-citing) journals appear near to each other (local proximity) while retaining the overall progression across fields and disciplines (global locality).

Given this series of reference points (for example locations for journals), it is possible to plot the position of any article or collection of articles simply by building a profile of the cited and citing references and feeding it through the UMAP projection. Articles that have a very narrow scope in terms of the range of references (for example drawing mostly on one or a few journals) will be tightly packed among other articles of the same kind. Those that have varied reference lists (drawing on a wider spread of journals) will be pulled out from their main cluster towards another region of the map, depending on the other material cited. For example, papers about ‘logic’, at the intersection between Mathematics and Computer Science, bridge these two main domains on the map.

The framework is a heat-map that looks like a chart of an archipelago: a blue ocean surrounding green islands with grey and white peaks. That analogy is apt because these are indeed islands of knowledge in a sea of relative inactivity. The height of the island peaks depends on the relative numbers of journals in each location and their intellectual proximity. We can look at the population of each island and then attach a ‘tribal’ label (Becher and Trowler, 2001), which in this instance is done by identifying the categories in ESI, or in some instances the Web of Science, in which the journals are clustered.
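
The pull that a varied reference list exerts on an article's position can be illustrated with a deliberately simplified stand-in. The sketch below does not use Node2Vec or UMAP. Instead it assumes that journals already have two-dimensional coordinates and places an article at the reference-weighted centroid of the journals it cites. The journal names and coordinates are hypothetical.

```python
from collections import Counter


def place_article(reference_journals, journal_xy):
    """Place an article at the weighted centroid of the journals it cites.

    This is a simplification for illustration only: the report describes
    projecting a reference profile through UMAP, not averaging coordinates.
    Journals without known coordinates are ignored.
    """
    counts = Counter(j for j in reference_journals if j in journal_xy)
    total = sum(counts.values())
    if total == 0:
        return None  # nothing to anchor the article to
    x = sum(journal_xy[j][0] * n for j, n in counts.items()) / total
    y = sum(journal_xy[j][1] * n for j, n in counts.items()) / total
    return (x, y)


# Hypothetical journal locations on the map.
journal_xy = {
    "J Math": (0.0, 0.0),
    "J Comput Sci": (10.0, 0.0),
}

# A narrowly scoped article cites only one journal.
narrow = place_article(["J Math", "J Math", "J Math"], journal_xy)

# A 'logic' article draws equally on both domains.
bridging = place_article(
    ["J Math", "J Math", "J Comput Sci", "J Comput Sci"], journal_xy
)
```

The narrow article sits on top of its home journal, while the bridging article lands midway between the two domains, which is the qualitative behaviour the text describes for papers about ‘logic’.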

In the north-west are concentrations for the tribes of Bio-medicine and Health, which connect along the western edge with the core sciences and then the technology disciplines; Materials Science is located as a peak close to that of Chemistry but with a major spur extending to Engineering; Transportation is a niche area found in the landscape between Engineering and Business; and so on.

We can reliably and repeatably draw on the research literature to produce a meaningful topical map and we can produce an intuitively interpretable geography of tribal domains. That means that we can increase our understanding of other analyses when we can locate critical publications, such as Research Fronts, in a structure that we can see is relevant to real research activity.

Figure 4.
The heatmap of all Research Front articles (2014 to 2019) plotted using the ISI mapping framework.
Areas of higher altitude (colored yellow, brown and then white) correspond to areas with the
highest concentration of publications. Labels locate major disciplinary areas on the map

Map labels: Clinical Medicine; Neuroscience & Behavior; Public Health & Healthcare Services; Biological Sciences; Psychiatry & Psychology; Education; Language & Linguistics; Agricultural, Plant & Animal Sciences; Criminology; Communication; Environment & Ecology; Chemistry; Archeology; Sociology; Arts & Humanities; Geosciences; Architecture, Environment & Geography; Political Sciences; Materials Science; Space Sciences; Physics; Law; Business; Engineering; Transportation; Economics; Computer Science; Statistics; Mathematics.

How are Research Fronts created?
In the original conception of Small and Griffith, a Research Front consists of (1) a group of highly cited papers that have been co-cited above a set threshold of similarity strength and (2) their associated citing papers. The precise nature of a Research Front is subject to interpretation since it includes both the co-cited core papers, which might be seen as foundational or as the breakthroughs that triggered further work, and the citing papers, which are more recent and thus positioned at the leading edge.

• We build Research Fronts around highly cited papers that serve as landmarks. A co-citation analysis is seeded through the selection of the 1% most cited papers in their field and year, because the citation histories of these publications mark them as influential and therefore as likely representatives of key concepts in particular specialties, or fronts. A subset of recent literature (the current year and prior five years) from Essential Science Indicators (ESI) is selected for analysis.

• Co-cited pairs are connected to others through single-link clustering, meaning only one co-citation link is needed to bring a co-cited pair in association with another co-cited pair (for example, the co-cited pair A and B links to the co-cited pair C and D if B and C are also co-cited).

• Papers are clustered into Research Fronts based on their co-citation similarity.

Today we use a solution that allows much larger Research Fronts than were previously practical and utilizes more modern techniques to create better clustering outcomes. We use the Leiden algorithm (Traag et al., 2019) to cluster papers since it provides a tuneable resolution parameter (so it is possible to create more or less granular solutions) and increases the number of highly cited papers that are assigned to Research Fronts (from 43% to 99%).

With clusters of highly cited papers in place, we form a set of core papers for each Research Front and attach the set of co-citing papers, those that are more recent and at the leading edge. The titles of the citing papers tell us about what the Research Front means, but labelling is highly subjective and can change as interpretation proceeds. We assign a label to each Research Front by text mining the titles and abstracts of the core and co-citing articles, searching for salient terms using the TextRank algorithm.

Repeated trial and test have shown that these procedures consistently yield meaningful Research Fronts. There have thus been significant evolutionary adjustments but the general approach and underlying principles for the creation of ESI Research Fronts remain those first established for ISI by Small and Griffith (1974).
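
The single-link step described in the bullets above can be illustrated with a minimal sketch. It is not the ISI implementation: it assumes that similarity is measured by a raw co-citation count (the report refers to a threshold of "similarity strength" without defining it), and the function names, the threshold value and the toy data are all hypothetical. It does not cover the Leiden clustering used today.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of papers appears together in one reference list."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def single_link_clusters(counts, threshold):
    """Merge papers into one cluster whenever a pair is co-cited at least `threshold` times."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= threshold:
            # One qualifying link is enough to join two groups (single link).
            parent[find(a)] = find(b)

    clusters = {}
    for paper in list(parent):
        clusters.setdefault(find(paper), set()).add(paper)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical citing papers, each listing the highly cited papers it references.
citing = [
    ["A", "B"], ["A", "B"],
    ["B", "C"], ["B", "C"],
    ["C", "D"], ["C", "D"],
    ["E", "F"], ["E", "F"],
    ["A", "E"],  # co-cited only once, so below the assumed threshold of 2
]
fronts = single_link_clusters(cocitation_counts(citing), threshold=2)
print(fronts)  # [['A', 'B', 'C', 'D'], ['E', 'F']]
```

The pairs A-B and C-D end up in the same cluster only because B and C are themselves co-cited, which is the chaining behaviour the example in the text describes.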

Locating the Research Fronts


The layout of core and citing papers (Figure 4) is created from the database of all Research Front papers from 2014 to 2019 and provides the basis for a background reference against which individual Research Fronts can be illuminated. This readily conveys the intellectual spread of research areas covered and, when tracked over time, shows how topics wax and wane on their migration between tribes. To do this we move away from the heatmap that initially showed us the islands and oceans in our research landscape and, instead, denote all the documents in a uniform background color that will show lighter and darker patches according to the way they cluster. On that uniformly grey map we can then apply a color highlight for just a single Research Front and see where it has emerged.

The Research Front on CRISPR (clustered regularly interspaced short palindromic repeats) is our first example. CRISPR, a term familiar from popular research literature as well as academic journals, is a family of DNA sequences found in bacteria and derived from DNA fragments of previous bacteriophage infections. The enzyme Cas9 (CRISPR-associated protein 9) uses these to recognize and cut specific DNA strands that are complementary to the CRISPR sequence. This is the basis of technology for gene editing within organisms and therefore of enormous research interest and application.

The map highlighting the CRISPR Research Front shows us that the main concentration is in Agricultural Sciences, extending up into basic Biological Sciences and with a stretch across to Environmental Sciences. There are also papers that are part of this Research Front in Neurosciences and in Chemistry, as well as papers as far distant as Law.

This is very valuable. We have a major research topic which is a key stepping stone methodology in modern Life Sciences work, but it is not constrained to a single major discipline nor even to a continuous network in our conventional landscape. The highlighted map will make innate sense for researchers working in this area, yet it would not be found by hierarchical analysis of categorized publication data.

Within a Research Front, as well as considering the spread of development represented by the co-citing papers, we may want to ask where the core papers are located. This could provide important insights when the co-citing papers draw on previously disparate innovations.

To further illustrate the information that immediately comes out of a Research Front analysis mapped in this way we can look at two more, possibly less familiar, examples: 2-D Materials and the Global Energy System Transition. These are shown in Figure 6.

The 2-D Materials Research Front is firmly located in the uplands of the Physical Sciences: Chemistry, Materials Science, and Physics. Less intense spurs run out to Engineering and Computer Science. A distinct cluster is located in Agricultural, Plant & Animal Sciences, alerting us to emerging intellectual connections into that area.

The Global Energy System Transition Research Front has two concentrated clusters linked by a lighter scatter. The larger cluster is in Architecture, Environment & Geography and has some connections into Business. The smaller cluster is in a less densely populated area between Materials Science and Engineering.

This is of particular interest because we are looking at a topic where the research is already attracting enough attention to identify it as a Research Front but the form and structure of the Front have not yet evolved a clear research identity. The shift from fossil fuels to renewables is an active and emerging area, already of significant policy interest and likely to continue to be a fruitful ground for future research investment.

Figure 5.
The distribution in a global domain map (Figure 4) of papers identified with the biomedical CRISPR Research Front.

[Map image. Disciplinary labels as in Figure 4.]

Figure 6.
The distribution within a global domain map (Figure 4) of papers identified within two technological Research Fronts.

[Two map panels: 2-D Materials; Global Energy System Transition. Disciplinary labels as in Figure 4.]

Using the Research Front map

The examples above immediately point to some obvious use cases. The CRISPR Research Front largely confirms what many in the field will already know but for the policy maker it will be an affirmation of the pervasive importance of the technology that it represents. The 2-D Materials Research Front points to a link between the Physical Sciences and an area of Biological Sciences that will probably be less apparent to most but could open new opportunities. The Global Energy System Transition Research Front tells us about an emerging topic where continued monitoring will provide research funders with important investment guidance.

This is information that comes out of consideration of the whole Research Front and its topical location, or locations, on the global disciplinary map. We can use other editorially curated metadata associated with the publication records to add other layers of meaning and take our questioning and interpretation further.

Almost every journal article carries the address information of its authors, which allows us to connect papers to one or more organizations and countries. That means we can take the topic map described by our Research Front analysis and identify which organizations are engaged in the research, or we can just highlight a single organization’s papers to check whether it has any connections to the Research Front.

For the CRISPR Research Front we have pulled out the relevant information about two large and research-intensive organizations: Harvard University and the Chinese Academy of Sciences (CAS). In this instance, we have highlighted only those publications for each organization that are already identified as being part of the CRISPR Research Front, either as core or co-citing papers (Figure 7). For less research-intensive organizations we might start with their entire map and ask whether their research is close to a particular Research Front.
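
The organization-level highlighting described above amounts to filtering a Research Front's papers by author address. A minimal sketch follows, assuming each paper record already carries a set of unified organization names; the record layout, the function name and the organization names are hypothetical and do not reflect the Web of Science data model.

```python
def organization_share(front_papers, organization):
    """Return the front's papers with an author address at `organization`, and their share."""
    matched = [p for p in front_papers if organization in p["organizations"]]
    share = len(matched) / len(front_papers) if front_papers else 0.0
    return matched, share


# Hypothetical Research Front: core and co-citing papers with their organizations.
front = [
    {"id": "p1", "role": "core", "organizations": {"Org A", "Org B"}},
    {"id": "p2", "role": "citing", "organizations": {"Org B"}},
    {"id": "p3", "role": "citing", "organizations": {"Org C"}},
    {"id": "p4", "role": "citing", "organizations": {"Org A"}},
]
matched, share = organization_share(front, "Org A")
print([p["id"] for p in matched], share)  # ['p1', 'p4'] 0.5
```

The matched papers are the ones that would be highlighted on the map for that organization; the share is only a convenience figure and is not a metric defined in this report.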

Figure 7.
The location within the global domain map (Figure 4) of papers from the CRISPR Research Front (Figure 5) authored or co-authored by a leading US and a leading China research organization.

[Two map panels: Harvard University; Chinese Academy of Sciences. Disciplinary labels as in Figure 4.]

The diagram shows us that Harvard’s CRISPR research is focused in the organismal Life Sciences with a long spur extending up into basic Biological Sciences and an interesting outlier cluster in the area of Neuroscience & Behavior. The CAS map also has a strong cluster in organismal Life Sciences but it has a slightly different balance of intensity in that area and it has a strong second cluster towards Environment & Ecology.

The detailed interpretation of these slightly but significantly contrasting distributions would benefit from an expert view and from proper examination of a sample of the individual publications. What it immediately tells us is that within a Research Front there are multiple perspectives according to organizational research portfolios. From this, a research manager may very well want to ask how their focus differs from that of another, competing organization. Have they missed an opportunity? Should they seek to collaborate?

Research funding organizations are likely to find particular value in the analysis of Research Fronts. It gives their advisory bodies an excellent overview of the research landscape and where their priorities may fall within that. They may want to locate a topic of particular interest in their existing portfolio and then evaluate how close that work is to a Research Front. They could simply ask the question, ‘How much of the work we recently funded is engaging with this Research Front?’ Igami and Saka (2016) report that such an analysis of Japanese research revealed a decreasing diversity in national publication activity. Thus, an agency could determine whether it needs to tackle a Research Front in a priority area and potentially invest to promote that Front to tackle challenges determined through societal or policy analysis.

Individual Research Fronts need not be considered in isolation. A different kind of analysis comes out of asking about all the topics identified as Research Fronts in a broader research area such as an entire ESI category. For example, a national funding body for research in Geosciences may want to know about all the relevant Research Fronts and their dynamics: how big; how recent; how cross-disciplinary? And then, of course, who is involved?

The following diagram (Figure 8) displays each Research Front not by locating it in the global landscape but instead by positioning it according to the average year of its associated papers and the diversity of ESI fields to which those papers are assigned.

Colors can be used to indicate where different ESI fields are dominant in each Research Front: the majority here have the same color indicating Geosciences. As cross-disciplinary diversity rises so the likelihood that a different color is shown increases, for example, green for Environment & Ecology, brown for Engineering, yellow for Chemistry, and purple for Physics.

The topics can also be given a provisional label. Labelling topics and categories is always challenging because the identification of a topic can be highly subjective, even for experts. Sometimes an individual’s recognition of a specific topic will change as they explore and reflect on the content. In this situation we apply a label drawn from the most frequent terms in the title and keywords of the set of papers, purely as signposting and in the expectation that the user will re-interpret as they gain information.

This diagram tells us about the topic, age, size and diversity of the Geosciences Research Fronts. It also introduces new information, because we can see that Research Fronts tend to grow larger as they grow older. Like oak trees, they start out small. This is especially useful for picking out the early signs of an emerging research area. Not all these small, nascent topics will flourish over the long term: some will merge or be re-absorbed while others will evaporate. But this certainly provides real management information for discussion about future investment targets.

Research Front analysis need not only be at the level of funding programmes, although it is likely to provide very rich management support at those levels. It can be equally useful at the level of the academic department or, indeed, for the individual researcher planning their next career move. Maps can be the basis for discussions between evaluators and individual researchers undergoing evaluation to provide greater depth of understanding than simple scores. They can play a role in evaluation and are useful in both formative and summative assessment. Reporting involvement in a Research Front has a value in itself, in securing an appointment or in enabling the next step towards tenure.
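
The provisional labelling described above (a label drawn from the most frequent terms in titles and keywords) can be sketched as a simple term count. This is an illustration only: the stopword list, the tokenizer, the choice of three terms and the sample titles are assumptions, and it implements the frequency-based variant rather than the TextRank method mentioned earlier for ESI Research Fronts.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stopword list.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "using", "with"}


def provisional_label(titles, keywords, n_terms=3):
    """Join the n most frequent non-stopword terms found in titles and keywords."""
    counts = Counter()
    for text in list(titles) + list(keywords):
        for token in re.findall(r"[a-z][a-z\-]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(term for term, _ in ranked[:n_terms])


# Hypothetical titles and keywords for one Research Front.
titles = [
    "Hyperspectral image classification with deep networks",
    "Spectral-spatial classification of hyperspectral image data",
    "Feature extraction for hyperspectral image analysis",
]
keywords = ["hyperspectral image", "classification", "remote sensing"]
label = provisional_label(titles, keywords)
print(label)  # hyperspectral image classification
```

As the text notes, a label produced this way is signposting only and is expected to be revised as the user learns more about the content.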

Figure 8.
Research Fronts from Geosciences are plotted according to average year of publication (x axis) and diversity of disciplines covered (y axis, Simpson index of ESI field diversity). Labels show the Research Front id and summary text. The size of the bubble denotes the number of papers in the Research Front.

[Bubble chart. x axis: Average publication year (2017 to 2019); y axis: ESI Field Diversity (0.0 to 0.8).]

Research Front labels:
1. Low-cost earthquake early warning system
2. Soil slope reliability analysis problems
3. Cloudy land surface temperature retrieval
4. Electron precipitation-causing EMIC waves
5. Selective floatation separation
6. Tight gas sandstone reservoir
7. Continental arc magmatism
8. High secondary aerosol contribution
9. Land ice contribution
10. Global ocean circulation model
11. Hyperspectral image classification
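
The two axes of Figure 8 can be computed from per-paper metadata, as in the sketch below. The caption names a "Simpson index" without giving its exact form; the sketch assumes the Gini-Simpson form, 1 minus the sum of squared field shares, which is 0 when every paper sits in one ESI field and rises with diversity. The function names and the sample Research Front are hypothetical.

```python
from collections import Counter


def simpson_diversity(fields):
    """Gini-Simpson index: 1 - sum(p_i^2) over the share p_i of papers in each field."""
    total = len(fields)
    if total == 0:
        raise ValueError("at least one paper is required")
    counts = Counter(fields)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def average_year(years):
    """Mean publication year of the papers in a Research Front."""
    return sum(years) / len(years)


# Hypothetical front: one ESI field and one publication year per paper.
front_fields = ["Geosciences"] * 6 + ["Engineering"] * 3 + ["Physics"]
front_years = [2017, 2018, 2018, 2018, 2019, 2019, 2017, 2018, 2019, 2018]

diversity = simpson_diversity(front_fields)
year = average_year(front_years)
print(round(diversity, 2), round(year, 1), len(front_fields))  # 0.54 2018.1 10
```

In a plot like Figure 8 the three printed values would give the y position, the x position and the bubble size respectively.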

Demonstrating involvement in these important topics is also valuable for promoting departmental research profiles. For a head of department the questions that can immediately be addressed are those where the departmental portfolio can be highlighted and located in the global map, using publication address information, and then the distance to topics highlighted by Research Fronts can be evaluated. "Are we working in emerging areas or are we isolated from these topics of interest?" This might fuel informed discussion on strategic directions for the department or it might suggest where future recruitment might be targeted to strengthen a team or to develop complementary competence.

The researcher can focus on a Research Front of particular relevance and then deconstruct it, looking at the way the co-citing papers reference the core papers in their text (methods, ideas, data?) and tracing the origins of the core papers and the work on which they drew. They are in the best position to develop an expert interpretation of how the field is building and developing.

Researchers can also roam widely across the landscape, starting from their current location and then considering their path towards new Research Fronts (what is likely to happen next in my field, and what is the trajectory of innovation?) or into wholly unexplored areas.

A different sort of question might be to ask: where is my research being used? For example, machine learning is an increasingly important application across many areas where diverse, multi-source databases are now available.

The first step is to identify all the Research Fronts in which ‘machine learning’ has topical relevance, perhaps because it is one of the frequent keywords or via a lexicon of terms associated with machine learning (Figures 9 & 10). The first picture (Figure 9) sets out the size and recency of the relevant Research Fronts, and the same color coding by ESI category tells us how widely spread they are. The second picture (Figure 10) then locates these Research Fronts on our global map.
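
The first step described above, selecting Research Fronts by a lexicon of associated terms, can be sketched as a keyword match. The lexicon entries, the record layout and the `min_hits` parameter are hypothetical; the report does not list the terms that were actually used.

```python
# Hypothetical lexicon of terms associated with machine learning.
LEXICON = {"machine learning", "deep learning", "neural network", "random forest"}


def matching_fronts(fronts, lexicon, min_hits=1):
    """Return ids of fronts whose keywords contain at least `min_hits` lexicon terms."""
    selected = []
    for front in fronts:
        keywords = [k.lower() for k in front["keywords"]]
        # Count lexicon terms that occur as a substring of any keyword.
        hits = sum(any(term in k for k in keywords) for term in lexicon)
        if hits >= min_hits:
            selected.append(front["id"])
    return selected


# Hypothetical Research Fronts with their frequent keywords.
fronts = [
    {"id": 1, "keywords": ["Deep learning", "speech enhancement"]},
    {"id": 2, "keywords": ["Continental arc magmatism", "zircon"]},
    {"id": 3, "keywords": ["Convolutional neural networks", "medical imaging"]},
]
selected = matching_fronts(fronts, LEXICON)
print(selected)  # [1, 3]
```

Substring matching is a deliberately loose choice so that "neural network" also catches "neural networks"; a stricter analysis might match on normalized phrases instead.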

Figure 9.
Research Fronts on Machine Learning are plotted according to average year of publication (x axis) and diversity of disciplines covered (y axis, Simpson index of ESI field diversity). Labels show the Research Front id, summary text, and the prominent ESI Field. The size of the bubble denotes the number of papers in the Research Front.

[Bubble chart. x axis: Average publication year (2017 to 2019); y axis: ESI Field Diversity (0.2 to 0.9).]

Research Front labels:
1. Deep learning-based speech enhancement approaches (Engineering)
2. Deep asymmetric video-based person re-identification (Engineering)
3. Hierarchical deep neural networks (Neuroscience & Behaviour)
4. Protein structure prediction methods (Biology & Biochemistry)
5. Machine learning quantum phases (Physics)
6. Electric energy consumption forecasting models (Engineering)
7. Novel robust structured subspace learning (Computer Science)
8. Predictive QSAR models (Chemistry)
9. Deep learning convolutional neural networks (Clinical Medicine)
10. Hyperspectral image classifications (Geosciences)

It is apparent from this that machine learning is a component of a very diverse spread of current Research Fronts. It appears in a large, recent topic cluster in Clinical Medicine, has another major cluster focussed in Geosciences, numerous developments in Engineering and Computer Science, and applications in Physics, Chemistry, Psychiatry & Psychology, and Biology & Biochemistry. The opportunities for career development for the young researcher are evidently manifold.

Figure 10.
Research Fronts on Machine Learning are placed on the global map. Although Research Fronts contain articles sprawling across regions, we summarize them in a single position by taking the average coordinates of core and citing papers. This picture shows how machine learning is being deployed in various settings across Clinical Medicine, Chemistry, Physics, Engineering and Geosciences.
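
The averaging described in the Figure 10 caption can be sketched as a centroid over the map coordinates of a Research Front's core and citing papers. The coordinates below are invented for illustration, and the function name is hypothetical.

```python
def front_centroid(points):
    """Average the (x, y) map coordinates of a front's core and citing papers."""
    if not points:
        raise ValueError("a front needs at least one paper")
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Hypothetical map coordinates for one Research Front.
core = [(1.0, 2.0), (3.0, 2.0)]
citing = [(2.0, 5.0), (2.0, 3.0)]
centroid = front_centroid(core + citing)
print(centroid)  # (2.0, 3.0)
```

A single averaged position is a summary only: for a front with widely separated clusters the centroid can fall in a region where few of its papers actually sit.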

Taking the next step


ISI encourages analysts engaged in research assessment and research policy to consider the citation network not merely as a tool for metrics but as an evolving structure that reflects the changing discourse of research. Through mapping and analysis of Research Fronts, we demonstrate that it is possible to identify topical, cross-disciplinary research areas and track them as they develop and mature in the research ecosystem. The Clarivate Professional Services group continues to make use of Research Fronts to deliver custom research projects to clients in academia, industry and government to help them better understand where their research portfolios are situated and how they perform against their peers, and to provide intelligence for the purposes of investment and strategic planning.

www.webofsciencegroup.com/isi

References
Becher, T. and Trowler, P.R. (2001). Academic Tribes and Territories (second edition, pp 238). Open University Press, Milton Keynes UK. ISBN: 978-0335206278

Börner, K. (2010). Atlas of Science: Visualizing What We Know. MIT Press, Cambridge MA. ISBN: 978-0262014458

Boyack, K.W. (2009). Using detailed maps of science to identify potential collaborations. Scientometrics, 79, 27-44. DOI: 10.1007/s11192-009-0402-6

Boyack, K.W. and Klavans, R. (2010). Co-citation analysis, bibliographic coupling and direct citation: which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61, 2389-2404. DOI: 10.1002/asi.21419

Boyack, K.W. and Klavans, R. (2017). Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge. Journal of the Association of Information Science and Technology, 68, 984-998. DOI: 10.1002/asi.23734

Boyack, K.W. and Klavans, R. (2019). Creation and analysis of large-scale bibliometric networks. In Springer Handbook of Science and Technology Indicators, W. Glänzel, H.F. Moed, U. Schmoch, M. Thelwall (eds.), Springer, 187-212. ISBN: 978-3030025106

Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57, 359-377. DOI: 10.1002/asi.20317

Chen, C. (2013). Mapping Scientific Frontiers: The Quest for Knowledge Visualization, second edition, Springer. ISBN: 978-1447151272

De Bellis, N. (2009). Maps and paradigms: Bibliographic citations at the service of the history and sociology of science. In Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics, Scarecrow Press, 143-179. ISBN: 978-0810867130

Garfield, E. (1955). Citation indexes for science: a new dimension in documentation through association of ideas. Science, 122, 108-11. DOI: 10.1126/science.122.3159.108

Glänzel, W. and Thijs, B. (2012). Using ‘core documents’ for detecting and labelling new emerging topics. Scientometrics, 91, 399-416. DOI: 10.1007/s11192-011-0591-7

Griffith, B.C., Small, H.G., Stonehill, J.A. and Dey, S. (1974). Structure of scientific literatures. II: toward a macrostructure and microstructure for science. Science Studies, 4, 339-365. DOI: 10.1177/030631277400400402

Grover, A. and Leskovec, J. (2016). Node2vec. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). DOI: 10.1145/2939672.2939754

Igami, M. and Saka, A. (2016). Decreasing diversity in Japanese science, evidence from in-depth analyses of science maps. Scientometrics, 106, 383-403. DOI: 10.1007/s11192-015-1648-9

Kessler, M.M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14, 10-25. DOI: 10.1002/asi.5090140103

Marshakova-Shaikevich, I.V. (1973). System of document connections based on references. Nauchno Tekhnicheskaya, Informatsiza Seriya 2, SSR, [Scientific and Technical Information Serial of VINITI], 6, 3-8.

McInnes, L. and Healy, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. ArXiv e-prints 1802.03426.

Noyons, E.C.M., Moed, H.F. and Luwel, M. (1999). Combining mapping and citation analysis for evaluative bibliometric purposes: A bibliometric study. Journal of the American Society for Information Science, 50, 115-131. DOI: 10.1002/(SICI)1097-4571(1999)50:2<115::AID-ASI3>3.3.CO;2-A

Noyons, E.C.M. (2004). Science maps within a science policy context. In Handbook of Quantitative Science and Technology Research, H.F. Moed, W. Glänzel, U. Schmoch (eds.), Springer, 187-213. ISBN: 978-1402027024

Pendlebury, D.A. (2013). Research Fronts: In search of the structure of science. In Research Fronts 2013: 100 Top-Ranked Specialties in the Sciences and Social Sciences, C. King and D.A. Pendlebury (eds.), Thomson Reuters, 26-31.

Petrovich, E. (2020). Science mapping and science maps. ISKO Encyclopedia of Knowledge Organization, B. Hjørland and C. Gnoli (eds.). www.isko.org/cyclo/science_mapping

Price, D de S. (1965). Networks of scientific papers. Science, 149, 510-515. DOI: 10.1126/science.149.3683.510

Price, D de S. (1979). Foreword. In E. Garfield, Essays of an Information Scientist, 3, 1977-1978, Institute for Scientific Information, v-ix.

Price, D de S. (1986). Little Science, Big Science… and Beyond. (reprint edition of 1963 Little Science, Big Science). Columbia University Press. ISBN: 978-0231049566

Small, H. (1973). Co-Citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24, 265-269. DOI: 10.1002/asi.4630240406

Small, H. (1997). Update on science mapping: creating large document spaces. Scientometrics, 38, 275-293. DOI: 10.1007/BF02457414

Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science, 50, 799-813. DOI: 10.1002/(SICI)1097-4571(1999)50:9<799::AID-ASI9>3.0.CO;2-G

Small, H. (1999). A passage through science: crossing disciplinary boundaries. Library Trends, 48, 72-108.

Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68, 595-610. DOI: 10.1007/s11192-006-0132-y

Small, H. and Garfield, E. (1985). The geography of science: Disciplinary and national mappings. Journal of Information Science, 11, 147-159. DOI: 10.1177/016555158501100402

Small, H. and Griffith, B.C. (1974). Structure of scientific literatures. I: identifying and graphing specialties. Science Studies, 4, 17-40. DOI: 10.1177/030631277400400102

Small, H. and Sweeney, E. (1985). Clustering the Science Citation Index using co-citations. I. A comparison of methods. Scientometrics, 7, 391-409. DOI: 10.1007/BF02017157

Thijs, B. (2019). Science mapping and the identification of topics: Theoretical and methodological considerations. In Springer Handbook of Science and Technology Indicators, W. Glänzel, H.F. Moed, U. Schmoch, M. Thelwall (eds.), Springer, 213-233. ISBN: 978-3030025106

Traag, V.A., Waltman, L. and van Eck, N.J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1). DOI: 10.1038/s41598-019-41695-z

Van Eck, N.J. and Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84, 523-538. DOI: 10.1007/s11192-009-0146-3

Waltman, L., Van Eck, N.J. and Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4, 629-635. DOI: 10.1016/j.joi.2010.07.002

About the Global Research Report series from the Institute for Scientific Information (ISI)

Our Global Research Reports draw on our unique industry insights to offer analysis, ideas and commentary to enlighten and stimulate debate.

Each one demonstrates the huge potential of research data to inform management issues in research assessment and research policy and to accelerate development of the global research base.

If you would like to receive news and reports from the Institute for Scientific Information, or to find out more about the work of ISI, please do get in touch.

e: ISI@clarivate.com

Previous reports include:
Profiles not metrics
Navigating the structure of research on sustainable development goals
Multi-authorship and research analytics

About Clarivate
Clarivate™ is a global leader in providing trusted information and insights to accelerate the pace of innovation. We offer subscription and technology-based solutions coupled with deep domain expertise that cover the entire lifecycle of innovation, from foundational research and ideas to protection and commercialization. Today, we’re setting a trail-blazing course to help customers turn bold ideas into life-changing inventions. Our portfolio consists of some of the world’s most trusted information brands, including the Web of Science™, Cortellis™, Derwent™, CompuMark™, MarkMonitor™ and Techstreet™. For more information, please visit clarivate.com.

The Web of Science™, part of Clarivate, organizes the world’s research information to enable academia, corporations, publishers and governments to accelerate the pace of research. It is powered by the world’s largest publisher-neutral citation index and research intelligence platform.

webofsciencegroup.com/isi

© 2020 Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.

WS548441888 / 07
