
INTERNATIONAL WORKSHOP ON

BIBLIOMETRIC & SCIENTOMETRIC


ANALYSIS FOR RESEARCH MANAGEMENT
Course Material

LIBRARY AND INFORMATION SCIENCE

By
Dr. R. Balasubramani

Assistant Professor

Department of Library and Information Science

BHARATHIDASAN UNIVERSITY

TIRUCHIRAPALLI – 620024

TAMIL NADU
PUBMED AND BIBEXCEL TUTORIAL

I have been using BibExcel for the past 7 years and keep recommending it
to many LIS professionals. However, I keep getting questions from our
professionals on how to get started, and I also have to reread my own
written notes every time I start a new analysis. So I thought this
International Workshop in Sri Lanka was the right time to write something
more structured on how to get going with scientometrics & bibliometrics
using BibExcel.

BibExcel is a great tool for bibliometric and scientometric analysis, and
for citation studies in particular. It supports citation analysis,
co-citation, shared references, bibliographic coupling and cluster
analysis, and it prepares bibliometric maps for mapping with Pajek and
NetDraw. BibExcel works with PubMed records and ISI records (SCI, SSCI,
A&HCI), and it can also convert other formats.

Installation is easy: just copy the files to a directory on your hard drive
and be sure to put the help files in the same directory. Read the
following pages for more help.

1
Open the PubMed database, type the term you want to search for, and then click
on the search icon. For example, I searched on the topic malaria to retrieve data about it.

2
After you click the search icon, the results of the search appear on the screen. For
example, there are 74,412 records on the topic malaria. You can further limit the
records by restricting the time period. To limit the year, click on custom range,
enter the dates you want and click apply. For example, I need records on malaria
for the year 2014, so I gave the dates as 2014/01/01 to 2014/12/31 (yyyy/mm/dd).

3
The result page now shows only the records published on that topic in the selected period.
You can see that only 4,367 records on malaria were published in the year 2014.
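
The same topic and date restriction can also be written down as a search URL. The
sketch below is only an illustration and is not part of the workshop procedure: it
assumes NCBI's E-utilities esearch service and its term, datetype, mindate and maxdate
parameters, the function name build_esearch_url is hypothetical, and the code only
builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service (assumed for illustration).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(topic, start, end, retmax=20):
    """Build a PubMed search URL restricted to a publication-date range."""
    params = {
        "db": "pubmed",
        "term": topic,
        "datetype": "pdat",   # filter on publication date
        "mindate": start,     # yyyy/mm/dd
        "maxdate": end,       # yyyy/mm/dd
        "retmax": retmax,     # number of record IDs to return
    }
    return ESEARCH + "?" + urlencode(params)

url = build_esearch_url("malaria", "2014/01/01", "2014/12/31")
print(url)
```

The dates use the same year/month/day order as the custom range box.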

4
To save the results from PubMed, click on send to; a tab will appear on the
screen. Inside the tab, select File under choose destination, change the format to
MEDLINE and then click create file. The records will start downloading automatically.

5
The data is downloaded as a text file that contains all the bibliographical
details of the 4,367 records on malaria published in the year 2014.
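
To see what the downloaded file contains, it helps to read it as tagged records. The
following is a minimal sketch, assuming the usual MEDLINE layout (a tag padded to four
characters, then "- " and the value, continuation lines indented by six spaces, and a
blank line between records). The function name parse_medline and the sample text are
made up for illustration.

```python
from collections import defaultdict

def parse_medline(text):
    """Split MEDLINE-format text into records; each record maps tag -> list of values."""
    records = []
    current = None
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(dict(current))
            current, last_tag = None, None
            continue
        if line.startswith("      ") and current is not None and last_tag:
            # Continuation line: append it to the previous value.
            current[last_tag][-1] += " " + line.strip()
            continue
        if len(line) > 5 and line[4:6] == "- ":
            tag, value = line[:4].strip(), line[6:].strip()
            if current is None:
                current = defaultdict(list)
            current[tag].append(value)
            last_tag = tag
    if current:
        records.append(dict(current))
    return records

# Illustrative sample, not real data.
sample = """PMID- 1
LA  - eng
PL  - England
AU  - White NJ
AU  - Nosten F
TI  - A long title that wraps
      onto a second line.

PMID- 2
LA  - fre
PL  - France
AU  - Nosten F
"""

records = parse_medline(sample)
print(len(records), records[0]["AU"])
```

Tags such as LA, PL, AU, MH, PT and SO are the fields analyzed in the sections that follow.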

ANALYSIS USING BIBEXCEL

LANGUAGE (LA)
6
After downloading the records, go to the C drive or D drive, create a folder named
Malaria and paste the records file into the folder. Copy the records file again, paste it in
the same folder as shown in the figure and rename the copy as LA, where LA indicates Language.

7
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was created
on the D drive. Two text files appear inside the Malaria folder: the PubMed result and the LA
file. Select the LA file and click view file. The LA file can then be viewed in the list box.

8
After viewing the LA file, the next step is to select Misc, click on convert to dialog
format, then click on convert from RIS format and type LA in the old tag. This will
create a document (doc) file of LA.

9
The new doc file for LA is created. In order to view it in the list box click on view
file as shown in the figure.

10
After creating the doc file of LA, the next step is to select whole field intact in the
"select field to be analyzed" box and click prep. This creates the out file of LA, which
can be viewed in the list box by clicking view file. The out file lists the language of each
record in turn: the first article is in English, the second article is also in English, and so on.

11
After creating the out file for LA, the next step is to select whole string in the
"frequency distribution, select type of unit" box and click start. This creates the cit file
of LA, which can be viewed by clicking view file. The cit file gives a clear picture of how
many articles are published in English, French and so on. For example, the figure indicates
4,207 records in English, 60 records in Chinese, 40 records in French, etc.
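
The cit file is in effect a frequency table of the values in the out file. A minimal
sketch of the same count, assuming the out file has been reduced to one language value
per record (the list below is illustrative, not the workshop data):

```python
from collections import Counter

# Hypothetical out-file content: one language value per record.
languages = ["eng", "eng", "chi", "eng", "fre", "chi"]

# Count how often each value occurs, most frequent first.
freq = Counter(languages)
for language, count in freq.most_common():
    print(count, language)
```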

12
After creating the LA cit file, copy the result from the list box and paste it into an
Excel sheet so that it can be analyzed clearly. Rename the sheet as LA, with the languages in
column A and the record counts in column B, so that the number of records for each language
is easy to see.

PLACE (PL) OF PUBLICATION

13
After analyzing the Languages, the next analysis is the Place (PL) of Publication. In
order to analyze the Place (PL) of Publication, the first step is to open the Malaria folder in
D drive and copy the pubmed result and paste it again in the same folder. Rename the new
copy as PL which indicates the place of publication.

14
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was
created on the D drive. The PL text file appears inside the Malaria folder. Select the PL file
and click view file. The PL file can then be viewed in the list box.

15
After viewing the PL file, the next step is to select Misc, click on convert to dialog
format, then click on convert from RIS format and type PL in the old tag. This
will create a document (doc) file of PL.

16
The new doc file for PL is created. In order to view it in the list box click on view
file as shown in the figure.

17
After creating the doc file of PL, the next step is to select whole field intact in the
"select field to be analyzed" box and click prep. This creates the out file of PL, which
can be viewed in the list box by clicking view file. The out file lists the place of publication
of each record in turn: the first article is from India, the second article is from Nigeria, and so on.

18
After creating the out file for PL, the next step is to select whole string in the
"frequency distribution, select type of unit" box and click start. This creates the cit file
of PL, which can be viewed by clicking view file. The cit file gives a clear picture of how
many records were published in each country. For example, the figure indicates that England
has contributed 1,518 records, the U.S.A. has contributed 1,480 records, and so on. After
creating the cit file, you can copy the result from the list box into Excel for clear analysis.

AUTHORSHIP PATTERN (AU!)

19
After analyzing the Place of Publication, the next analysis is the Authorship pattern (AU!).
In order to analyze the Authorship pattern (AU!), the first step is to open the Malaria folder
in D drive and copy the pubmed result and paste it again in the same folder. Rename the
new copy as AU!, which indicates the authorship pattern.

20
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was
created on the D drive. The AU! text file appears inside the Malaria folder. Select the AU!
file and click view file. The AU! file can then be viewed in the list box.

21
After viewing the AU! file, the next step is to select Misc, click on convert to
dialog format, then click on convert from RIS format and type AU in the old tag.
This will create a document (doc) file of AU!.

Note: The old tag accepts only two characters; hence AU is typed instead of AU!

22
The new doc file for AU! is created. In order to view it in the list box click on
view file as shown in the figure.

23
After creating the doc file of AU!, the next step is to select Any; separated field in the
"select field to be analyzed" box and click prep. This creates the out file of AU!, which
can be viewed in the list box by clicking the view file icon.

24
After creating the out file of AU!, next step is to select analyze and click units per
record. This will create the mul file and mut file of AU!.

25
This is the mul file of AU!, which can be viewed in the list box by clicking the view file
icon. The next step is to select whole string in the "frequency distribution, select type of
unit" box and click start.

26
This is the mut file of AU! and this can be viewed in the list box by clicking the view file
icon.

27
After creating the mul file and mut file of AU!, the next step is to copy the result of the
mul file from the list box into a separate Excel sheet to find the authorship pattern. After
copying it into the Excel sheet, select the right-hand column, click sort and filter, and choose
sort smallest to largest. This arranges the numbers in ascending order, making the
analysis easier.

28
After sorting and filtering the data, the next step is to copy the data that was sorted in
the right column and paste it in the left column. Then give headings to both columns:
Authors for the left and Records for the right. Finally, change the digits into words, i.e. replace 1
with single, 2 with double and so on. Thus, the authorship pattern for the given data has been
analyzed using BibExcel.
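
The authorship pattern is simply the distribution of the number of authors per record.
The sketch below reproduces that idea under stated assumptions: each record is given as
a list of author names, and both the names and the labels dictionary are illustrative.

```python
from collections import Counter

# Hypothetical records, each with its list of authors.
records_authors = [
    ["White NJ"],
    ["White NJ", "Nosten F"],
    ["Nosten F", "Example A"],
    ["White NJ", "Nosten F", "Example A"],
]

# Units per record: how many authors each record has.
authors_per_record = [len(authors) for authors in records_authors]

# Frequency of each author count, i.e. the authorship pattern.
pattern = Counter(authors_per_record)

# Replace digits with words, as done by hand in the Excel sheet.
labels = {1: "Single", 2: "Double", 3: "Three"}
for n in sorted(pattern):
    print(labels.get(n, f"{n} authors"), pattern[n])
```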

RANKING OF AUTHORS (AU)


29
After analyzing the Authorship pattern, the next analysis is the ranking of authors
based on publication (AU). In order to analyze the ranking of authors (AU), the first step is
to open the Malaria folder in D drive and copy the pubmed result and paste it again in the
same folder. Rename the new copy as AU, which indicates the ranking of authors based on
publication.

30
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was
created on the D drive. The AU text file appears inside the Malaria folder. Select the AU
file and click view file. The AU file can then be viewed in the list box.

31
After viewing the AU file, the next step is to select Misc, click on convert to dialog
format, then click on convert from RIS format and type AU in the old tag. This
will create a document (doc) file of AU.

32
The new doc file for AU is created. In order to view it in the list box click on
view file as shown in the figure.

33
After creating the doc file of AU, the next step is to select Any; separated field in the
"select field to be analyzed" box and click prep. This creates the out file of AU, which
can be viewed in the list box by clicking the view file icon.

34
After creating the out file for AU, the next step is to select whole string in the
"frequency distribution, select type of unit" box and click start. This creates the cit file
of AU, which can be viewed by clicking view file. The cit file gives a clear picture of how
many records are published by each author. For example, the figure indicates that White NJ
has published 33 records, Nosten F 32 records, and so on. You can copy the result of the cit
file into an Excel sheet for tabulation.

MESH HEADINGS (MH)

35
After analyzing the ranking of authors, the next analysis is the Mesh Headings
(MH). In order to analyze the Mesh Headings (MH), the first step is to open the Malaria
folder in D drive and copy the pubmed result and paste it again in the same folder. Rename
the new copy as MH, which indicates the Mesh Headings.

36
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was
created on the D drive. The MH text file appears inside the Malaria folder. Select the MH
file and click view file. The MH file can then be viewed in the list box.

37
After viewing the MH file, the next step is to select Misc, click on convert to dialog
format, then click on convert from RIS format and type MH in the old tag. This
will create a document (doc) file of MH.

38
The new doc file for MH is created. In order to view it in the list box click on
view file as shown in the figure.

39
After creating the doc file of MH, the next step is to select Any; separated field in the
"select field to be analyzed" box and click prep. This creates the out file of MH, which
can be viewed in the list box by clicking the view file icon.

40
After creating the out file for MH, the next step is to select whole string in the
"frequency distribution, select type of unit" box and click start. This creates the cit file
of MH, which can be viewed by clicking view file. You can copy the result of the cit file into
an Excel sheet for tabulation.

PUBLICATION TYPE (PT)

41
After analyzing the Mesh Headings, the next analysis is the Publication Type (PT). In
order to analyze the Publication Type (PT), the first step is to open the Malaria folder in D
drive and copy the pubmed result and paste it again in the same folder. Rename the new
copy as PT, which indicates the Publication Type.

42
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was
created on the D drive. The PT text file appears inside the Malaria folder. Select the PT file
and click view file. The PT file can then be viewed in the list box.

43
After viewing the PT file, the next step is to select Misc, click on convert to dialog
format, then click on convert from RIS format and type PT in the old tag. This
will create a document (doc) file of PT.

44
The new doc file for PT is created. In order to view it in the list box click on view
file as shown in the figure.

45
After creating the doc file of PT, the next step is to select whole field intact in the
"select field to be analyzed" box and click prep. This creates the out file of PT, which
can be viewed in the list box by clicking the view file icon.

46
After creating the out file for PT, the next step is to select whole string in the
"frequency distribution, select type of unit" box and click start. This creates the cit file
of PT, which can be viewed by clicking view file. You can copy the result of the cit file into
an Excel sheet for tabulation.

SOURCE TITLES (SO)

47
After analyzing the Publication Type, the next analysis is the Source Titles (SO). In
order to analyze the Source Titles (SO), the first step is to open the Malaria folder in D
drive and copy the pubmed result and paste it again in the same folder. Rename the new
copy as SO, which indicates the Source Titles.

48
Now open BibExcel and, in the "select file here" box, open the Malaria folder that was
created on the D drive. The SO text file appears inside the Malaria folder. Select the SO file
and click view file. The SO file can then be viewed in the list box.

49
After viewing the SO file, the next step is to select Misc, click on convert to dialog
format, then click on convert from RIS format and type SO in the old tag. This
will create a document (doc) file of SO.

50
The new doc file for SO is created. In order to view it in the list box click on view
file as shown in the figure.

51
After creating the doc file of SO, the next step is to select whole field intact in the
"select field to be analyzed" box and click prep. This creates the out file of SO, which
can be viewed in the list box by clicking the view file icon.

52
After creating the SO out file, the next step is to open a new Excel sheet, select
Data and then click get external data from text. This opens a new dialog for choosing the
data. Go to the Malaria folder on the D drive and choose All files under files of type. In this
folder, select the SO out file you have created and follow the steps.

53
The next step is to select Delimited under "choose the file type that best describes
your data" and click next to proceed.

54
The next step is to untick the tab, semicolon, comma and space boxes, tick only the
other box and type a dot (.) in the corresponding field. After finishing, click next to proceed.

55
The next step is to select General in the Column Data Format box and click finish to
proceed.

56
The next step is to select existing worksheet in the import box and click OK to import
the data.

57
After selecting the existing worksheet, the imported data appears on the screen
where you have to copy the first column of the data for analysis.

58
The next step is to create a new text document in malaria folder on D drive and paste
the copied column on the text document.

59
After pasting the copied file in the text document, rename the file as SO 1.

60
After creating the SO 1 file, open the same in Bibexcel and click view file icon to
view the file.

61
After creating the SO 1 text file, the next step is to select whole string in the
"frequency distribution, select type of unit" box and click start. This creates the cit file
of SO 1, which can be viewed by clicking view file. You can copy the result of the cit file into
an Excel sheet for tabulation.
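
The Excel import with a dot as the delimiter keeps only the part of each SO value that
comes before the first dot, which is then counted. A minimal sketch of that idea,
assuming each SO value begins with the journal title followed by a dot; the values below
are invented placeholders, not real citations.

```python
from collections import Counter

# Hypothetical SO values: journal title, a dot, then date and pagination.
so_values = [
    "Journal A. 2014 Jan 3;13:1.",
    "Journal B. 2014 Feb;8(2):e100.",
    "Journal A. 2014 Mar 10;13:95.",
]

# Keep only the text before the first dot (the first imported column).
titles = [so.split(".", 1)[0].strip() for so in so_values]

# Count records per source title, as the cit file of SO 1 does.
journal_freq = Counter(titles)
for title, count in journal_freq.most_common():
    print(count, title)
```

A title that itself contains a dot would be cut short by this split, in the sketch as in the Excel import.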

MAPPING CO-AUTHORSHIPS WITH BIBEXCEL

62
The first step is to create a new folder on the C or D drive, paste a copy of the PubMed result
into the folder and rename it as AU. The next step is to create the doc file of AU and,
from the doc file, create the out file of AU.

Now select the AU out file and view it in the list box. Next, select whole string, click
sort descending and click start. This will create the cit file of AU.

63
The cit file of AU is created and you can view it in the list box by clicking view file. On
scrolling down, you can see some names with spelling variants such as VANRAAN AFJ, van
Raan AFJ, Van Raan AFJ, etc.

64
To fix the spelling variants problem, select the out file and run Edit out files/Convert
Upper Lower Case/Good for author cited author in out file. This will make
a low file.

65
Select the low-file and make a new cit-file by pressing Start. In the new cit-file, last
names are written with a capital followed by small letters, spaces and hyphens in last names
have been removed, and Vanraan AFJ now has almost twice as many papers.

66
You may now go even further by accepting only one initial. Select the low-file and
run Edit out files/Keep only author's first initial. The 1st-file is then created. Select the
1st-file and press Start.

67
To remove potential duplicate names, select the 1st-file, choose Make new out-file
and Remove Duplicates, and press Start. This creates the oux-file. Use the oux-file from
here on, and make a cit file based on it.
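
The three clean-up operations above (case conversion, keeping one initial, removing
duplicates within a record) can be sketched as follows. This is not BibExcel's own
code: the functions normalize_author and unique_authors are hypothetical, and the sketch
assumes names of the form "Last name Initials".

```python
def normalize_author(name, first_initial_only=False):
    """Merge spelling variants: 'van Raan AFJ' and 'VANRAAN AFJ' -> 'Vanraan AFJ'."""
    parts = name.strip().split()
    if len(parts) < 2:
        return name.strip().capitalize()
    *last_parts, initials = parts
    # Remove spaces and hyphens in the last name, then use one capital letter.
    last = "".join(last_parts).replace("-", "").capitalize()
    initials = initials.upper()
    if first_initial_only:
        initials = initials[:1]
    return f"{last} {initials}"

def unique_authors(authors):
    """Normalize to one initial and drop duplicates, keeping the original order."""
    normalized = [normalize_author(a, first_initial_only=True) for a in authors]
    return list(dict.fromkeys(normalized))

variants = ["VANRAAN AFJ", "van Raan AFJ", "Van Raan AFJ"]
print({normalize_author(v) for v in variants})

# Hypothetical record in which the same person appears twice.
print(unique_authors(["Van Raan AFJ", "VANRAAN A", "Smith-Jones B"]))
```

Keeping only one initial merges more variants but can also merge different people who share a last name and first initial.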

68
To make co-authorships, view the cit-file, copy the names down to 15 papers from the
list, clear the list and paste them back into the list. Then select the oux-file and run
Analyze/Co-occurrence/Make pairs via list box (answer No to the first question). This
creates the coc file, which holds the co-occurrences.
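
Making pairs via the list box amounts to counting, for every record, each pair of
selected authors that appear together. A minimal sketch under that assumption; the
records, the names and the set called keep (standing in for the names pasted into the
list box) are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records with de-duplicated author names.
records_authors = [
    ["White NJ", "Nosten F", "Example A"],
    ["White NJ", "Nosten F"],
    ["Nosten F", "Example A"],
    ["Example B"],
]

# Names kept in the list (for example, the most productive authors).
keep = {"White NJ", "Nosten F", "Example A"}

pairs = Counter()
for authors in records_authors:
    # Only selected authors count; sorting makes (a, b) and (b, a) the same pair.
    selected = sorted(set(authors) & keep)
    for a, b in combinations(selected, 2):
        pairs[(a, b)] += 1

for (a, b), count in pairs.most_common():
    print(count, a, b)
```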

69
The next step is to make a net-file for Pajek. Select the coc-file and run
Mapping/Creating net-file for Pajek (answer No to the two questions). The net-file is
created. To create a vec-file, select the cit-file and run Mapping/Create vec-file. The
vec-file will be created.
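
For orientation, a Pajek network file is a plain text file with a numbered vertex list
followed by an edge list. The sketch below writes such a file from pair counts. It
assumes the basic *Vertices / *Edges layout; the function to_pajek_net is hypothetical,
and the file BibExcel produces may differ in detail.

```python
def to_pajek_net(pair_counts):
    """Return Pajek .net text for a dict {(name_a, name_b): weight}."""
    names = sorted({name for pair in pair_counts for name in pair})
    # Pajek numbers vertices from 1.
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Edges")
    for (a, b), weight in sorted(pair_counts.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"

# Hypothetical co-authorship counts.
pair_counts = {
    ("Nosten F", "White NJ"): 2,
    ("Example A", "Nosten F"): 2,
    ("Example A", "White NJ"): 1,
}
net = to_pajek_net(pair_counts)
print(net)
```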

70
The next step is to open the net file and the vec file in Pajek and run Draw/Draw vector to
get the Pajek network.

71
The next step is to press Ctrl + K in the Draw window, and the Kamada-Kawai layout
is created.

BIBEXCEL TO VOSVIEWER

72
The first step is to create a new folder on the C or D drive, paste a copy of the PubMed result
into the folder and rename it as AB (Abstract). The next step is to create the doc file of
AB and, from the doc file, create the out file of AB.

73
The next step is to open VOSviewer, select create and, under type of data, select create a
map based on text corpus.

74
The next step is to open the out file of AB and click OK. A dialog asking you to select the
counting method appears; choose the appropriate counting method and click finish.

75
The VOSviewer map is created. This is the label view of the map.

76
This is the density view of the map.

77
Similarly, you can use the net files made in BibExcel to create a network by
selecting create and clicking create a map based on network.

78
The next step is to select the net file of bibexcel from the folder and click next.

79
In the threshold box, set the minimum total link strength of an item to 10,
which indicates at least 10 co-citations.

80
The network appears on the screen, showing all the nodes and the links that connect them to each other.

81
You can copy a screenshot to the clipboard.

WEB OF SCIENCE AND HISTCITE

Web of Science (WoS, previously known as Web of Knowledge) is an online subscription-based
scientific citation indexing service maintained by Thomson Reuters that provides a
comprehensive citation search. It gives access to multiple databases that reference
cross-disciplinary research, which allows for in-depth exploration of specialized sub-fields
within an academic or scientific discipline.

SEARCH
1. Search by Topic, Author, Group Author, Source Title, Publication Year, and Address.
Use the drop down menu for each search box to choose the area of your search. You
can limit your search by original language of publication or document type.
2. Use the drop down menu to change the relationship between each search field
to AND, OR, or NOT.
3. Add additional fields for a more complex search.
4. Change the time frame and data limits of your search.

Citation searching
If you have found a useful article, a cited reference search can help you find other
research that has referred to that original article. This is a very effective way to find
papers on the same or similar subject and to discover how a known idea or innovation
has been confirmed, applied, improved, extended or corrected.
1. From the Web of Science Core Collection search page, select Cited Reference
Search from the drop-down menu next to Basic Search
2. In the Cited Author box, enter the name of the cited author in the format shown
on screen
3. Enter the journal title in the Cited Work box (using the abbreviated form given in
the journal abbreviation list), and/or the date of publication in the Cited Year(s) box
4. Click on Search
5. Tick the boxes next to all the listed variations of the paper’s title (papers can be
wrongly cited by other authors) and click on

CITED REFERENCES
All cited references are searchable via the Cited Reference Search interface. References
that appear in blue serve as links to other Web of Science source records. These links are
limited by your subscription. References appearing in plain black text may be:
• References to books or other types of documents not indexed in Web of Science
• References to articles outside of your subscription limits
• Cited reference variants or works that were cited incorrectly by the source publication

HISTCITE

HistCite is a system designed to help selectively identify the significant (most cited) papers
retrieved in topical searches of the Web of Science (SCI, SSCI and/or AHCI). Once a
marked list of papers has been created, the resulting Export file is processed by HistCite
to create tables ordered by author, year, or citation frequency as well as historiographs
which include a small percentage of the most-cited papers and their citation links.

IMPORTING DATA INTO HISTCITE

There are three principal ways of starting a HistCite analysis:

Start Application

1. Start the HistCite application by double clicking on the HistCite icon on your desktop
or select HistCite from the "Start > All Programs" menu.

2. HistCite starts up, first by opening a (black) command window, then by interfacing with
Internet Explorer. Internet Explorer provides the user interface through which the user
interacts with the HistCite program. HistCite opens to an "empty" page.

3. From the "File" menu, click on "Add File...", then click the Browse button and locate a
Web of Science export file (with .txt extension) or a HistCite file (with .hci extension) in the
open file dialog. Once you have located the correct file, click "Open" then "Add File".

Drag and Drop

1. From Windows Explorer, locate the text file or files that you want to analyze.

2. Drag and drop the files on to the HistCite application file icon, HistCite.exe. This is
located where the installer placed it (usually C:\Program Files\HistCite). Alternatively
drop the files onto the shortcut on the Windows Desktop. It is possible to select multiple
files and drag and drop them on to the icon simultaneously.

Drag and drop the file called "HistCiteSample.txt" onto the HistCite icon. Please note that
HistCite only imports unique WoS records for analysis. For a detailed report of your
import, from the "Tools" menu click on "Logs..."

Open a HistCite Data File

1. Double-click on a previously saved HistCite file (extension .hci).

2. Once you have activated the analysis you will first see a screen showing the analytical
activities which HistCite is carrying out. Analysis will take from a few seconds to
several minutes, depending on the size of your file(s), the speed of your computer,
and the amount of free memory.

Note: Do not close the black command window while you are using HistCite. If you
do, HistCite will cease to function though you will still see data in the browser
window.

STEP 1

STEP 2

STEP 3

LIST OF ALL RECORDS

When HistCite first opens, it shows the main page where all the records in your collection
are listed.

On first starting HistCite, the All Records list is overlaid with the "HistCite Tip of the Day"
window, where useful tips and hints are displayed.

Close the "Tips" window to view the List of All Records

The main items to notice on the opening page are as follows:


Window Title Bar: Shows the currently open primary file.

Menu Bar: Contains all the controls and functions you will need when using HistCite.

Collection title and description: This is a title and description of the collection that you
can add using the File -> Properties... dialog, or by clicking on the title to edit it. The
default is "Untitled Collection".

Collection Statistics: This shows the Grand Totals of local and global citation scores and
the number of citations of all the papers in the collection. Below this is the date range of
the papers in the collection.

Analyses Index: The top line of the Analyses Index shows the parameters of the
collection, and allows access to specific analyses. The sample file has 500 records written
by 756 unique authors, published in 202 journals, with 14,088 cited references, and
1,215 unique title words.

The second line of the analyses index shows a series of links to specific analyses. Click on
any of the links in the Analyses Index to see the data analyzed by that parameter. In the
default view the Analyses index is shown. If it is not showing, go to Tools -> Analyses
Index to switch it on.

Navigation controls: Immediately above and below the main table are the navigation
controls which allow you to rapidly move through the collection.

The record table: When you first open a collection in HistCite the main table shows the
records sorted by date order. The main table column headings are described below. Click
on any of the blue or purple table headings to sort the table by that data.
The current sort parameter and all relevant sorted data are highlighted in purple. Clicking a
second time on a purple heading will reverse the sort order.

#: Table row number. This varies dynamically and depends on the sort order.
Date/Author/Journal: This column contains the full record of each paper in the
collection. Click on the Date or Author or Journal column heading to sort the records by
these parameters. In Date sort order (the default) the table displays cross-headings for
each year represented in the collection.

LCS: Local Citation Score: Number of citations to the paper from within the collection.

GCS: Global Citation Score: Number of citations to the paper from all sources, as
reported in Web of Science when the data was downloaded.

LCR: Local Cited References: Number of records in the collection that are cited by the
paper. This number is an indication of the relevance of the paper to the collection.

CR: Number of Cited References: Total number of cited references in the bibliography of
the paper.
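
The difference between LCS, LCR and CR can be seen on a toy collection. The sketch below
assumes each record carries the list of identifiers it cites; the identifiers are
invented, and GCS is left out because it is reported by Web of Science and cannot be
derived from the collection itself.

```python
# Hypothetical collection: P1..P3 are records in the collection,
# X1..X3 are cited works that are not part of it.
collection = {
    "P1": {"cites": ["X1", "X2"]},
    "P2": {"cites": ["P1", "X1"]},
    "P3": {"cites": ["P1", "P2", "X3"]},
}

in_collection = set(collection)
lcs = {pid: 0 for pid in collection}  # citations received from within the collection
lcr = {}                              # cited references that are in the collection
cr = {}                               # all cited references

for pid, record in collection.items():
    refs = record["cites"]
    cr[pid] = len(refs)
    local = [ref for ref in refs if ref in in_collection]
    lcr[pid] = len(local)
    for ref in local:
        lcs[ref] += 1

for pid in collection:
    print(pid, "LCS", lcs[pid], "LCR", lcr[pid], "CR", cr[pid])
```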

Records: Each record shows the data for a member of the collection in the standard
Vancouver style. Each record is assigned a record number by HistCite. Records contain
several hot links which lead to filtered lists. Hot links in records will turn blue and
underlined when hovered over. In addition, a small "tool tip" window will appear indicating
the number of records in the filtered list and the TLCS and TGCS scores. Click on...

LCR - to show a filtered list of papers in the collection that are cited by the record.

Record number - to open the detailed record of the paper, and to edit the record. (See
Editing individual records, 6.1.)

Author name - to show a filtered list of papers written by that author.

Words in article title - to show a filtered list of papers in the collection with that title word.

Journal title - to show a filtered list of papers published in that journal.

Publication year - to show a filtered list of papers published in that year.

LCS - to show a filtered list of papers in the collection that cite the record.

ANALYSES INDEX

The top line of the Analyses Index shows the parameters of the collection, and allows
access to pages with compiled data in various categories.

• Records
• Authors
• Journals
• Cited References
• Words
• Yearly output
• Document Type
• Language
• Institution
• Institution with Subdivision
• Country

AUTHORS

The default view is sorted in descending order of papers per author.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending.

Table Headings

#: Rank based on the current sort order.

Author: Author Name.

Recs: Number of papers (records) by the author in the collection. Click on this number to
see a filtered list of records of papers by the author.

TLCS: Total Local Citation Score = Total citations in the collection to the author.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers by the
author in the collection. (Note that this is not necessarily the total citations to an author
in Web of Science; only to those papers by the author included in the collection).
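
As an illustration of how the author table relates to the record table, the sketch below
totals Recs, TLCS and TGCS per author. It assumes that an author's totals are the sums
of the LCS and GCS of that author's records in the collection; the records and scores are
invented.

```python
from collections import defaultdict

# Hypothetical records with their local and global citation scores.
records = [
    {"authors": ["Author A", "Author B"], "lcs": 2, "gcs": 10},
    {"authors": ["Author A"], "lcs": 1, "gcs": 4},
    {"authors": ["Author B", "Author C"], "lcs": 0, "gcs": 7},
]

totals = defaultdict(lambda: {"recs": 0, "tlcs": 0, "tgcs": 0})
for record in records:
    for author in record["authors"]:
        entry = totals[author]
        entry["recs"] += 1
        entry["tlcs"] += record["lcs"]
        entry["tgcs"] += record["gcs"]

# Default view: descending order of papers per author (ties broken by name).
ranking = sorted(totals.items(), key=lambda item: (-item[1]["recs"], item[0]))
for author, entry in ranking:
    print(author, entry["recs"], entry["tlcs"], entry["tgcs"])
```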
JOURNALS

The default view is sorted in descending order of papers per journal.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending.

Table Headings

#: Rank based on the current sort order.

Journal: Journal Title.

Recs: Number of papers (records) published in the journal in the collection. Click on
this number to see a filtered list of papers published in the journal.

TLCS: Total Local Citation Score = Total citations in the collection to the journal.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers in the
journal in the collection. (Note that this is not necessarily the total citations to the
journal in Web of Science; only to those papers in the journal included in the collection)

CITED REFERENCES

Records in blue are part of the collection. Records in black are not part of the collection as
they did not meet Web of Science search criteria, or they are not indexed in Web of Science.

In the HistCite Sample File provided with the program, cited reference 1 is to a book
(Garfield E, 1979, Citation Indexing), which is not indexed in Web of Science. Item 4 is a
paper (SMALL H, 1973, J AM SOC INFORM SCI, V24, P265) which does not include the
words "Citation Analysis" in the title. The paper is cited 46 times in this collection so it is
obviously very relevant to the topic even though it was "missed" by the search.

In the default view, the records are sorted by the number of times cited. Remember: You
can click on any column header to sort by that parameter or to change sort order from
ascending to descending.

Table Contents

#: Rank based on the current sort order.

Author / Year / Journal: Shows the reference as it appears in the bibliographies of the
records. References in blue are to papers included in the collection. Click on the reference
to go to detailed record. Click on the table headings to sort the table by Author, Year, or
Journal

+ / WoS: The "Make Record" link allows the user to turn a reference into a record in the
collection. The "WoS" link will automatically search Web of Science for the cited reference.

Recs: Shows the number of records in which this reference is cited. Click on the number
for a filtered list of the records which cite this reference.

WORDS

This page shows an analysis of words that occur in the titles and keyword lists of the
papers in the collection.

The Word list analysis has a number of user settings that are controlled by the Settings
menu.

• Title words are included.
• Stop words and words of 2 characters or fewer are excluded.
• The parts of a hyphenated word are treated as separate words.
• Author Keywords, included in recent Web of Science records, are excluded.
• Web of Science KeyWords Plus are excluded. KeyWords Plus® are index terms
  created by Thomson Reuters from significant, frequently occurring words in the
  titles of an article's cited references, and are found in more recent WoS records.

Author keywords and Web of Science KeyWords Plus may be included by changing
the Settings. In addition, choices can be made to split multiword terms and
hyphenated terms. Keywords may be identified by type style. The following example
shows a word list with author and WoS KeyWords Plus included. Words in italics are
found in the keyword lists, but not in titles. Words in bold italics are found in both
title and keyword lists.

YEARLY OUTPUT

The Yearly Output page shows the collection analyzed by publication year.

The default view is sorted in ascending order of year.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending.

Table Headings

#: Rank based on the current sort order.

Publication Year: List of publication years represented in the collection.

Recs: Number of papers (records) published in the year in the collection. Click on this
number to see a filtered list of papers published in the year.

TLCS: Total Local Citation Score = Total citations in the collection to the articles
published in that year.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers published
that year in the collection.

Click on "Histogram" in the page heading for a graphic display of the publication year
distribution. Hover over any value in the table to see the percentage of the total.

DOCUMENT TYPE

The Document Type page shows the collection analyzed by document type, as assigned in
Web of Science.

The default view is sorted in descending order of papers per document type.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending.

Table Headings
#: Rank based on the current sort order.

Document Type: List of document types represented in the collection.

Recs: Number of papers (records) of that type in the collection. Click on this number to
see a filtered list of papers of that document type.

TLCS: Total Local Citation Score = Total citations in the collection to articles of that
document type.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers of that
document type in the collection. Hover over any value in the table to see the percentage of
the total.

LANGUAGE

The Language page shows the collection analyzed by language, as assigned in Web of
Science.

The default view is sorted in descending order of papers per language.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending.

Table Headings

#: Rank based on the current sort order.


Language: List of languages represented in the collection.

Recs: Number of papers (records) in that language in the collection. Click on this number
to see a filtered list of papers in that language.

TLCS: Total Local Citation Score = Total citations in the collection to articles in that
language.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers in that
language in the collection.

Hover over any value in the table to see the percentage of the total.

INSTITUTION

The Institution page shows the collection analyzed by institution, as entered in the
address field of the records in Web of Science.

The default view is sorted in descending order of frequency of institution names.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending.

Table Headings

#: Rank based on the current sort order.

Institution: List of institutions represented in the collection.

Recs: Number of papers (records) from that institution in the collection. Click on this
number to see a filtered list of papers from that institution.

TLCS: Total Local Citation Score = Total citations in the collection to articles from that
institution.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers from that
institution in the collection. Hover over any value in the table to see the percentage of the
total.

INSTITUTION WITH SUBDIVISION

The Institution with Subdivision page shows the collection analyzed by institution with
subdivision -- usually a department or division within the institution, as entered in the
address field of the records in Web of Science.

The default view is sorted in descending order of frequency of institution with subdivision
names.

Remember: You can click on any column header to sort by that parameter or to change
sort order from ascending to descending. Note that variations in data entry, in
abbreviations, and among journals means that there may be several variants of the same
institution and subdivision name in a collection.

Table Headings

#: Rank based on the current sort order.

Institution with subdivision: List of institutions with subdivisions represented in the
collection.

Recs: Number of papers (records) from that institution subdivision in the collection. Click
on this number to see a filtered list of papers from that institution subdivision.

TLCS: Total Local Citation Score = Total citations in the collection to articles from that
institution subdivision.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers from that
institution subdivision in the collection. Hover over any value in the table to see the
percentage of the total.

COUNTRY

The Country page shows the collection analyzed by country, as entered in the address
field of the records in Web of Science.

The default view is sorted in descending order of frequency of records per country.

Remember: You can click on any column header to sort by that parameter or to
change sort order from ascending to descending.

Table Headings

#: Rank based on the current sort order.

Country: List of countries represented in the collection.

Recs: Number of papers (records) from that country in the collection. Click on this
number to see a filtered list of papers from that country.

TLCS: Total Local Citation Score = Total citations in the collection to articles from
that country.

TGCS: Total Global Citation Score = Total citations in Web of Science to papers from
that country in the collection.

Hover over any value in the table to see the percentage of the total.

HISTORIOGRAPH COMPILATION

The COLORED boxes are components of the Historiograph Compilation. Clicking on a box
will describe the specific component in more detail.

OUTER REFERENCES or OUTER NODES

The 'Outer References' (or 'Outer Nodes' on earlier compilations) identifies references from
papers in the collection to papers or books outside the collection. SEE IMAGE BELOW.
The number to the left (LCS) shows how often this specific reference is cited. The example
shows that 135 papers cited Kirkpatrick's 1972 article in J Clin Invest Vol51 p2948.
Clicking on the number itself brings up the list of 135 Citing Papers. The WoS Hot Link
on the right of each reference will take you directly to the Cited Reference section of ISI's
Web of Science, but in order to do so you must be a subscriber and insert the access URL
in the Web of Science Location box.

MISSING LINKS

MISSING LINKS identifies possibly erroneous or variant citations that MAY
refer to papers in the collection. This expert system makes a best guess as to which
paper was meant by the author. In the example, Node number 11, Mackler's Lancet
article "Role of Soluble Lymphocyte Mediators in ....", cites Levin AS, 1970, P Natn
Acad Sci US, V67, P82 (which is not found in the main collection). HistCite did find,
however, a paper that could be the exact match in Node Number 1, Levin AS, 1970,
V67, P821, which might indicate there is a pagination error.

GRAPH MAKER

The HistCite Graph Maker allows the user to create "historiographs" of the
papers in the collection. A historiograph is a chronological citation network
showing citation links between papers. The Graph Maker allows the user to
alter many parameters of the graphs, and to export and print the results.
The best way to learn Graph Maker is to use it and to experiment with
various settings. To get started, open Graph Maker in the Tools menu and
click "Make graph". Default settings will usually generate a graph that fits
onto one screen or one printed page.

Two types of graphs can be created, GCS and LCS. The type of the graph
determines the criteria by which the graph is created. The example below is
the LCS type graph and the discussion that follows will be in reference to
this.

The circles represent papers. The size of the circle is relative to that paper's
LCS score (or by GCS score if a GCS graph). The number inside the circle is
the node number. Clicking on the circle brings up the detailed source record of
that paper. Leaving the mouse cursor over the circle shows the abbreviated
source data identifying that paper. Normally located below the map is the list
of papers included in the graph. The "LCS >= 20" indicates that this graph
contains all those nodes (papers) which have an LCS of 20 or higher. The
number in parentheses beside the year indicates the number of nodes in the
main bibliography that were published in that year. An arrow
pointing from one node to the next, usually to an older paper, indicates the
citation relationship between papers.

RETRIEVING DATA FROM GOOGLE SCHOLAR USING PUBLISH OR PERISH SOFTWARE

What is Publish or Perish (PoP)

Google Scholar provides a simple way to broadly search for scholarly
literature. From one place, you can search across many disciplines
and sources: articles, theses, books, abstracts and court opinions,
from academic publishers, professional societies, online repositories,
universities and other web sites. Google Scholar helps you find
relevant work across the world of scholarly research.

• Google Scholar indexes scholarly literature from major publishers
  (both free and subscription sources).
• It covers journal and conference papers, theses and dissertations,
  academic books, pre-prints, abstracts and technical reports.
• It indexes materials based on websites from publishers.

About Publish or Perish


Publish or Perish is a software program that retrieves and analyzes
academic citations. It uses Google Scholar and (since release 4.1)
Microsoft Academic Search to obtain the raw citations, then analyzes
these and presents the following metrics:
• Total number of papers and total number of citations
• Average citations per paper, citations per author, papers per
  author, and citations per year
• Hirsch's h-index and related parameters
• Egghe's g-index
• The contemporary h-index
• Three variations of individual h-indices
• The average annual increase in the individual h-index
• The age-weighted citation rate
• An analysis of the number of authors per paper.
The results are available on-screen and can also be copied to the
Windows clipboard (for pasting into other applications) or saved to a
variety of output formats (for future reference or further analysis). Publish
or Perish includes a detailed help file with search tips and additional
information about the citation metrics.
What Publish or Perish is for
Publish or Perish is designed to empower individual academics to present
their case for research impact to its best advantage. We would be concerned
if it would be used for academic staff evaluation purposes in a mechanistic
way.
When using Publish or Perish for citation analyses, we would like to suggest
the following general rule of thumb:
• If an academic shows good citation metrics, it is very likely that he
  or she has made a significant impact on the field.
However, the reverse is not necessarily true. If an academic shows
weak citation metrics, this may be caused by a lack of impact on the
field, but also by one or more of the following:
• Working in a small field (therefore generating fewer citations in total);
• Publishing in a language other than English (LOTE - effectively
  also restricting the citation field);
• Publishing mainly (in) books.
Although Google Scholar performs better than the Web of Science in this
respect, it is still not very good at capturing LOTE articles and citations,
or citations in books or book chapters. As a result, citation metrics in the
Social Sciences and even more so in the Humanities will always be
underestimated, as in these disciplines publications in LOTE and
books/book chapters are more likely than in the Sciences.

Metrics
In addition to the various simple statistics (number of papers, number of
citations, and others), Publish or Perish calculates the following citation
metrics (see Citation metrics in the online help file for more details):
Hirsch's h-index
Proposed by J.E. Hirsch in his paper An index to quantify an
individual's scientific research output, arXiv:physics/0508025
v5 29 Sep 2005. It aims to provide a robust single-number metric
of an academic's impact, combining quality with quantity.
Egghe's g-index
Proposed by Leo Egghe in his paper Theory and practice of the g-
index, Scientometrics, Vol. 69, No 1 (2006), pp. 131-152. It aims to
improve on the h-index by giving more weight to highly-cited articles.
Zhang's e-index
Publish or Perish also calculates the e-index as proposed by Chun-Ting
Zhang in his paper The e-index, complementing the h-index
for excess citations, PLoS ONE, Vol 4, Issue 5 (May 2009), e5429. The
e-index is the (square root) of the surplus of citations in the h-set
beyond h², i.e., beyond the theoretical minimum required to obtain an
h-index of 'h'. The aim of the e-index is to differentiate between
scientists with similar h-indices but different citation patterns.
Contemporary h-index
Proposed by Antonis Sidiropoulos, Dimitrios Katsaros, and Yannis
Manolopoulos in their paper Generalized h-index for disclosing
latent facts in citation networks, arXiv:cs.DL/0607066 v1 13 Jul
2006. It aims to improve on the h-index by giving more weight to
recent articles, thus rewarding academics who maintain a steady
level of activity.
Age-weighted citation rate (AWCR) and AW-index
The AWCR measures the average number of citations to an entire
body of work, adjusted for the age of each individual paper. It was
inspired by Bihui Jin's note The AR-index: complementing the
h-index, ISSI Newsletter, 2007, 3(1), p. 6. The Publish or Perish
implementation differs from Jin's definition in that we sum over
all papers instead of only the h-core papers.
Individual h-index (original)
The Individual h-index was proposed by Pablo D. Batista, Monica G.
Campiteli, Osame Kinouchi, and Alexandre S. Martinez in their paper
Is it possible to compare researchers with different scientific
interests?, Scientometrics, Vol 68, No. 1 (2006), pp. 179-189. It
divides the standard h-index by the average number of authors in
the articles that contribute to the h-index, in order to reduce the
effects of co-authorship.
Individual h-index (PoP variation)
Publish or Perish also implements an alternative individual h-index
called hI,norm that takes a different approach: instead of dividing the
total h-index, it first normalizes the number of citations for each paper
by dividing the number of citations by the number of authors for that
paper, then calculates the h-index of the normalized citation counts.
This approach is much more fine-grained than Batista et al.'s; we
believe that it more accurately accounts for any co-authorship effects
that might be present and that it is a better approximation of the per-
author impact, which is what the original h-index set out to provide.
Multi-authored h-index
A further h-like index is due to Michael Schreiber and first described
in his paper To share the fame in a fair way, hm modifies h for
multi-authored manuscripts, New Journal of Physics, Vol 10 (2008),
040201-1-8. Schreiber's method uses fractional paper counts instead of
reduced citation counts to account for shared authorship of papers,
and then determines the multi-authored hm index based on the
resulting effective rank of the papers using undiluted citation counts.
Average annual increase in the individual h-index
As of release 4.3 Publish or Perish also calculates the average annual
increase in hI,norm, called hI,annual. This average annual increase in
the individual h-index is useful for the following reasons:

• In common with the hI,norm index, it removes to a
  considerable extent any discipline-specific publication and
  citation patterns that otherwise distort the h-index.
• It also reduces the effect of career length and provides a
  fairer comparison between junior and senior researchers.
The hI,annual is meant as an indicator of an individual's average
annual research impact, as opposed to the lifetime score that is
given by the h-index or hI,norm.
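
The definitions above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names and the sample citation data are hypothetical, this is not Publish or Perish's own code, and the g-index here is capped at the number of papers in the list.

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def e_index(citations):
    """Square root of the citations in the h-set beyond the minimum h*h."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return math.sqrt(sum(ranked[:h]) - h * h)

def hi_norm(citations, authors):
    """h-index of the per-author citation counts (citations / number of authors)."""
    return h_index([c / a for c, a in zip(citations, authors)])

# Hypothetical data: citation counts and author counts for five papers.
cites = [10, 8, 5, 4, 3]
n_authors = [2, 1, 5, 2, 1]
print("h =", h_index(cites))
print("g =", g_index(cites))
print("e =", round(e_index(cites), 2))
print("hI,norm =", hi_norm(cites, n_authors))
```

For this sample the h-set holds four papers with 27 citations in total, so the e-index is the square root of 27 - 16 = 11.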

Authorwise Search

The Author impact analysis page shown above allows you to perform a
quick analysis of the impact of an author's publications. This page
contains the minimum parameters that are necessary to look up an
author's publications on Google Scholar. Publish or Perish uses these
parameters to perform an Advanced Scholar Search query, which is then
analyzed and converted to a number of statistics. The results are available
on-screen and can also be copied to the Windows clipboard (for pasting in
other applications) or saved to a text file (for future reference or further
analysis). See General search if you want to perform a search with more
parameters than available on the Author impact analysis page.

Journal – Wise Analysis

The Journal impact analysis page shown above allows you to
perform a quick analysis of the impact of a journal's publications. This
page contains the minimum parameters that are necessary to look up the
journal's publications on Google Scholar. Publish or Perish uses these
parameters to perform an Advanced Scholar Search query, which is then
analyzed and converted to a number of statistics. The results are available
on-screen and can also be copied to the Windows clipboard (for pasting in
other applications) or saved to a text file (for future reference or further
analysis). See General search if you want to perform a search with more
parameters than available on the Journal impact analysis page.

The General citation search page allows you to perform an Advanced
Scholar Search query and analyse its results. This page contains all
parameters accepted by Google Scholar. Publish or Perish uses these
parameters to perform a Google Scholar query, which is then analyzed and
converted to a number of statistics. The results are available on-screen and
can also be copied to the Windows clipboard (for pasting in other
applications) or saved to a text file (for future reference or further analysis).

STATISTICAL TOOLS IN BIBLIOMETRICS

RELATIVE GROWTH RATE (RGR)

The Relative Growth Rate (RGR) is the increase in the number of articles or pages per
unit of time. The mean relative growth rate over a specific interval is calculated
from the following equation:

$$ R_{1-2} = \frac{\log_e W_2 - \log_e W_1}{T_2 - T_1} $$

where,
$R_{1-2}$ – mean relative growth rate over the specific interval
$\log_e W_1$ – log of the initial number of articles
$\log_e W_2$ – log of the final number of articles after the specific interval
$T_2 - T_1$ – unit difference between the initial time and the final time
aa⁻¹ – average number of articles
The year is taken here as the unit of time, and W1 and W2 are cumulative totals of
publications. The RGR for articles is calculated as follows.
Relative growth rate (RGR) and doubling time (DT) of publications

Year   No. of publications   Cumulative total   LogeW1   LogeW2   RGR    DT
2003         5789                  5789            -       8.66     -      -
2004         9587                 15376           8.66     9.64    0.98   0.71
2005        11358                 26734           9.64    10.19    0.55   1.25
2006        12652                 39386          10.19    10.58    0.39   1.79
2007        13920                 53306          10.58    10.88    0.30   2.29
2008        17515                 70821          10.88    11.17    0.28   2.44
2009        16966                 87787          11.17    11.38    0.21   3.23
2010        18253                106040          11.38    11.57    0.19   3.67
2011        20318                126358          11.57    11.75    0.18   3.95
2012        18209                144567          11.75    11.88    0.13   5.15

Therefore, the mean RGR is expressed per unit of articles per unit of year
(aa⁻¹ year⁻¹) over a specific interval.
2004 ⇒ (Loge 15376 - Loge 5789) / (2004 - 2003) = (9.64 - 8.66) / 1 = 0.98
2005 ⇒ (Loge 26734 - Loge 15376) / (2005 - 2004) = (10.19 - 9.64) / 1 = 0.55

It has been observed from Table 2 and Fig. 1 that the RGR decreased from 0.98 in 2004
to 0.13 in 2012.

DOUBLING TIME

Generally, doubling time is the period of time required for a quantity to double in
size or value. Doubling time and Relative Growth Rate are directly related. If
the number of articles or pages of a subject doubles during a given period, then the
difference between the logarithms of the numbers at the beginning and end of this period
must be the logarithm of the number 2. If the natural logarithm is used, this difference
has a value of 0.693. Thus the corresponding doubling time for each specific interval,
for both articles and pages, can be calculated by the formula:
Doubling Time (DT) = 0.693/R
where R is the mean relative growth rate for the interval.
Therefore,
Doubling time for articles
DT (a) = 0.693 / R (a), where R (a) is the mean RGR for articles (aa⁻¹ year⁻¹)
2004 ⇒ 0.693/0.98 = 0.71
2005 ⇒ 0.693/0.55 = 1.26 (1.25 when the unrounded RGR of 0.553 is used, as in the table)
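
The RGR and doubling-time calculations can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative assumptions, and the yearly counts are the first three rows of the RGR table above.

```python
import math

def relative_growth_rate(w1, w2, t1, t2):
    """Mean RGR between times t1 and t2 for cumulative totals w1 and w2."""
    return (math.log(w2) - math.log(w1)) / (t2 - t1)

def doubling_time(rgr):
    """Doubling time in the same time unit as the RGR (0.693 is ln 2, rounded)."""
    return 0.693 / rgr

# Yearly publication counts from the RGR table (first three years only).
yearly = {2003: 5789, 2004: 9587, 2005: 11358}

cumulative = 0
previous = None  # (year, cumulative total) of the previous row
for year, count in sorted(yearly.items()):
    cumulative += count
    if previous is not None:
        rgr = relative_growth_rate(previous[1], cumulative, previous[0], year)
        print(year, cumulative, round(rgr, 2), round(doubling_time(rgr), 2))
    previous = (year, cumulative)
```

Because the doubling time is computed from the unrounded RGR, the 2005 value comes out as 1.25, matching the table rather than the hand calculation with the rounded 0.55.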

DEGREE OF COLLABORATION

The degree of collaboration is defined as the ratio of the number of collaborative
research papers to the total number of research papers in the discipline during a certain
period of time. The formula suggested by Subramanyam (1983) is used.

It is expressed as

$$ C = \frac{N_m}{N_m + N_s} $$

Where C is the degree of collaboration in a discipline, Nm is the number of multi-
authored research papers in the discipline published during a year, and Ns is the
number of single-authored papers in the discipline published during the same year.
Using this formula, the degree of collaboration is determined.

S.No   Year   Single Author   Collaborative Authorship   Degree of Collaboration
1      1979        -                    3                       1.00
2      1980        1                   30                       0.97
3      1981        3                   80                       0.96
4      1982        3                  112                       0.97
5      1983        1                   78                       0.99
6      1984        1                  115                       0.99
7      1985        -                   71                       1.00
8      1986        2                  198                       0.99
9      1987        2                  196                       0.99
10     1988        5                  145                       0.97
11     1989        7                  139                       0.95
12     1990        3                  237                       0.99
13     1991        4                  214                       0.98
14     1992        3                  231                       0.99
15     1993        7                  300                       0.98
16     1994        4                  340                       0.99
17     1995        5                  364                       0.99
18     1996        8                  572                       0.99
19     1997       12                  423                       0.97
20     1998       10                  573                       0.98
21     1999        2                  616                       1.00
22     2000       11                  703                       0.98
23     2001       15                  642                       0.98
24     2002        8                  940                       0.99
25     2003       25                  982                       0.98
26     2004       17                 1179                       0.99
27     2005       20                 1378                       0.99
28     2006       12                 1758                       0.99
29     2007       17                 1723                       0.99
30     2008       12                 2138                       0.99
31     2009       12                 2060                       0.99
32     2010        7                 2253                       1.00
33     2011       23                 2569                       0.99
34     2012       18                 2964                       0.99
35     2013       12                 1606                       0.99
Total            292                27932                       0.99
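
As a quick check of the table, Subramanyam's ratio can be computed in Python. This is a minimal sketch: the function name is an illustrative assumption, and the sample rows are copied from the table above.

```python
def degree_of_collaboration(multi_authored, single_authored):
    """C = Nm / (Nm + Ns): share of papers that are multi-authored."""
    return multi_authored / (multi_authored + single_authored)

# (single-authored, collaborative) counts for a few years from the table.
rows = {1980: (1, 30), 1981: (3, 80), 1989: (7, 139)}

for year, (ns, nm) in rows.items():
    print(year, round(degree_of_collaboration(nm, ns), 2))

# Whole period: 292 single-authored and 27932 collaborative papers.
print("Total", round(degree_of_collaboration(27932, 292), 2))
```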

COLLABORATIVE INDEX

The Collaborative Index is obtained by dividing the total number of authors by the total
number of published articles.

$$ CI = \frac{\text{Total number of authors}}{\text{Total number of articles}} $$

Where CI = number of authors per paper.

CO-AUTHORSHIP INDEX (CAI)

Based on the suggestions made by Garg and Padhi, the co-authorship pattern and Co-
Authorship Index (CAI) have been calculated using the following formula.

$$ CAI = \frac{N_{ij} / N_{io}}{N_{oj} / N_{oo}} \times 100 $$

Where,
Nij = Number of publications for the particular authorship pattern for a particular country.
Nio = Total output for the particular authorship pattern.
Noj = Total output of the particular country.
Noo = Total output of all the countries.

PURPOSE

CAI is used to find out the preferred authorship pattern of publications (individual authors, small teams, big teams, etc.).

COUNTRY   SINGLE AUTHOR   CAI   TWO AUTHORS   CAI   THREE AUTHORS   CAI   > THREE AUTHORS   CAI   TOTAL
BRAZIL         17         106       44        108        45         101         41           90    147
RUSSIA          0           0        6        136         3          62          7          140     16
INDIA          17         174       32        130        23          85         17           61     89
CHINA          41          87      107         90       136         104        148          110    432
TOTAL          75                  189                  207                     213                 684

For example, in calculating the CAI value for single authors in Brazil:

• Nij = 17
• Nio = 75
• Noj = 147
• Noo = 684

By substituting the values in the formula, we get the CAI value for single authors in Brazil:

CAI = {(17/75) / (147/684)} X 100 ≈ 105.5 (shown as 106 in the table)
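
The whole CAI table can be recomputed from the raw counts. The sketch below is illustrative only: the function and variable names are assumptions, and the counts are those given in the table above.

```python
# Publications per country by authorship pattern:
# [single author, two authors, three authors, more than three authors]
counts = {
    "BRAZIL": [17, 44, 45, 41],
    "RUSSIA": [0, 6, 3, 7],
    "INDIA": [17, 32, 23, 17],
    "CHINA": [41, 107, 136, 148],
}

def cai_table(counts):
    """Return CAI values per country, one per authorship pattern."""
    pattern_totals = [sum(col) for col in zip(*counts.values())]  # Nio
    grand_total = sum(pattern_totals)                             # Noo
    result = {}
    for country, row in counts.items():
        country_total = sum(row)                                  # Noj
        result[country] = [
            (nij / nio) / (country_total / grand_total) * 100
            for nij, nio in zip(row, pattern_totals)
        ]
    return result

for country, values in cai_table(counts).items():
    print(country, [round(v, 1) for v in values])
```

A CAI computed this way compares a country's share within one authorship pattern with its share of the total output, so the same function works for any number of countries or patterns.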

RELATIVE CITATION IMPACT (RCI)

In this study, the impact of the knowledge management publications was calculated
using two indicators, namely Citations per Publication (CPP) and Relative Citation Impact
(RCI). CPP is the average number of citations per publication. It shows how many times,
on average, each paper is cited. A higher CPP is generally taken to indicate
higher-quality research. RCI was developed by Thomson Reuters to measure citation
impact in the fields of science and engineering.

$$ RCI = \frac{\text{A country's share of total citations}}{\text{A country's share of total publications}} $$

RCI = 1 indicates a country’s citation rate equal to the world citation rate.

RCI < 1 indicates a country’s citation rate lower than the world citation rate and also
implies that the research effort is higher than its impact.

RCI > 1 indicates a country’s citation rate higher than the world citation rate and also
implies high-impact research in that country.

COUNTRY    TP     TC     CPP     RCI
BRAZIL     147    484    3.29    0.42
RUSSIA      16    166   10.37    1.33
INDIA       89    449    5.04    0.64
CHINA      432   4214    9.75    1.25
TOTAL      684   5313

For example, RCI for Brazil is calculated as:

Brazil’s share of total citations is 484/5313*100 = 9.11

Brazil’s share of total publications is 147/684*100 = 21.49

By dividing these two values we get the RCI value for Brazil: 9.11/21.49 = 0.42
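
CPP and RCI can be computed together from total publications (TP) and total citations (TC). The following is a minimal sketch; the function names are illustrative assumptions, and the TP and TC figures are those from the table above.

```python
# (TP, TC) per country, as in the table.
data = {
    "BRAZIL": (147, 484),
    "RUSSIA": (16, 166),
    "INDIA": (89, 449),
    "CHINA": (432, 4214),
}

def cpp(tp, tc):
    """Citations per publication."""
    return tc / tp

def rci(tp, tc, total_tp, total_tc):
    """Share of total citations divided by share of total publications."""
    return (tc / total_tc) / (tp / total_tp)

total_tp = sum(tp for tp, _ in data.values())
total_tc = sum(tc for _, tc in data.values())

for country, (tp, tc) in data.items():
    print(country, round(cpp(tp, tc), 2), round(rci(tp, tc, total_tp, total_tc), 3))
```

The values are printed to three decimals so that the effect of rounding or truncating to two decimals, as in the table, is visible.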

TRANSFORMATIVE ACTIVITY INDEX

In order to study the change in output of lung cancer articles among the countries, the
Transformative Activity Index (TAI) suggested by Guan and Ma has been used.
Mathematically, TAI is calculated using the following formula:

$$ TAI = \frac{C_i / C_0}{W_i / W_0} \times 100 $$

Where,

Ci – Number of publications of the specific country in the ith block;

C0 – Total number of publications of the specific country during the period of study;

Wi – Number of publications of all the countries in the ith block;

W0 – Total number of publications of all the countries during the period of study

Group   Country   2003-2007   TAI   2008-2012   TAI   2003-2012   Change in TAI
BRICS   Brazil        237      95       402     103       639         +9
BRICS   Russia        223     123       241      85       464        -37
BRICS   India         384      65      1135     123      1519        +58
BRICS   China        2333      61      7426     125      9759        +64
G7      US          11417     107     15958      96     27375        -11
G7      UK           2271     106      3188      96      5459        -10
G7      France       1831     108      2501      95      4332        -13
G7      Germany      2346     111      3067      93      5413        -18
G7      Italy        2122     107      2964      96      5086        -11
G7      Canada       1151      96      1925     103      3076         +7
G7      Japan        4580     110      6086      94     10666        -16
        Total       28895             44893             73788

For example, TAI for Brazil (2003-2007) is calculated as:

• Ci = 237
• C0 = 639
• Wi = 28895
• W0 = 73788

By substituting the values in the formula, we get the TAI value for Brazil:

TAI = {(237/639) / (28895/73788)} X 100 ≈ 94.7 (shown as 95 in the table)
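
The same substitution can be scripted for both time blocks. This is a minimal sketch with an illustrative function name; the figures are Brazil's row and the Total row from the table above.

```python
def tai(ci, c0, wi, w0):
    """TAI = ((Ci / C0) / (Wi / W0)) * 100."""
    return (ci / c0) / (wi / w0) * 100

# Brazil: publications per block and over the whole period.
brazil_blocks = {"2003-2007": 237, "2008-2012": 402}
brazil_total = 639

# All countries in the table: publications per block and over the whole period.
world_blocks = {"2003-2007": 28895, "2008-2012": 44893}
world_total = 73788

for block, ci in brazil_blocks.items():
    value = tai(ci, brazil_total, world_blocks[block], world_total)
    print(block, round(value, 1))
```

A value above 100 means the country's share of output in that block is higher than the share of all countries taken together, which is how the "Change in TAI" column should be read.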

BRADFORD’S LAW OF SCATTERING JOURNALS


Bradford’s law states that the journals in a field can be arranged in order of
decreasing productivity, with the journals that yield the most articles ranked first and
the least productive tail at the end. According to this law, the journals are grouped
into a number of zones, each producing a similar number of articles, while the number of
journals in each zone increases rapidly. According to Bradford’s distribution, the
relationship between the numbers of journals in the zones is 1 : n : n². In contrast, the
three zones in the present study contain 66, 304 and 1773 journals (a ratio of roughly
1 : 4.6 : 26.9), which does not fit Bradford’s distribution.

S. No.   No. of Journals   No. of Articles   Total No. of articles   Cum. No. of articles
1 1 237 237 237
2 1 128 128 365
3 1 118 118 483
4 1 103 103 586
5 1 92 92 678
6 1 87 87 765
7 1 63 63 828
8 1 59 59 887
9 1 56 56 943
10 1 54 54 997
11 1 50 50 1047
12 1 47 47 1094
13 1 44 44 1138

14 1 43 43 1181
15 3 41 123 1304
16 1 40 40 1344
17 1 39 39 1383
18 3 38 114 1497
19 2 36 72 1569
20 2 35 70 1639
21 3 34 102 1741
22 1 33 33 1774
23 1 32 32 1806
24 3 31 93 1899
25 2 30 60 1959
26 2 29 58 2017
27 1 28 28 2045
28 2 27 54 2099
29 5 26 130 2229
30 2 25 50 2279
31 3 24 72 2351
32 5 23 115 2466
33 5 22 110 2576
34 5 (66) 21 105 2681 (2694)
35 4 20 80 2761
36 6 19 114 2875
37 6 18 108 2983
38 8 17 136 3119
39 3 16 48 3167
40 3 15 45 3212
41 10 14 140 3352
42 9 13 117 3469
43 10 12 120 3589
44 12 11 132 3721
45 12 10 120 3841
46 2 9 18 3859
47 33 9 297 4156
48 30 8 240 4396
49 27 7 189 4585
50 43 6 258 4843
51 86 (304) 5 430 5273
52 114 4 456 5729
53 170 3 510 6239
54 356 2 712 6951
55 1133 (1773) 1 1133 8084
Total 2143 8084

Zone     No. of Journals   Number of Records   Multiplier Factor
Zone 1     66 (3.07%)            2681                 -
Zone 2    304 (14.18%)           2592                4.60
Zone 3   1773 (82.73%)           2811                5.83
Total    2143                    8084               10.43
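
The zoning step can be sketched in Python as follows. This is an illustration under stated assumptions, not a standard routine: the function names are hypothetical, the sample rows are made-up toy data, and a row is kept in the current zone as long as the cumulative number of articles has not passed that zone's equal share of the total.

```python
def bradford_zones(rows, n_zones=3):
    """Split ranked journal rows into zones of roughly equal article counts.

    rows: list of (number_of_journals, articles_per_journal), sorted by
    decreasing productivity. Returns [journals, articles] per zone.
    """
    total = sum(j * a for j, a in rows)
    zones = [[0, 0] for _ in range(n_zones)]
    cumulative = 0
    zone = 0
    for journals, per_journal in rows:
        cumulative += journals * per_journal
        # Move to the next zone once the cumulative total passes the boundary.
        while zone < n_zones - 1 and cumulative > total * (zone + 1) / n_zones:
            zone += 1
        zones[zone][0] += journals
        zones[zone][1] += journals * per_journal
    return zones

def multipliers(zones):
    """Ratio of journal counts between successive zones."""
    return [zones[i + 1][0] / zones[i][0] for i in range(len(zones) - 1)]

# Toy data (hypothetical): 1 journal with 30 articles, 2 with 15, 6 with 5.
toy = [(1, 30), (2, 15), (6, 5)]
zones = bradford_zones(toy)
print(zones)
print(multipliers(zones))
```

If the multipliers between successive zones are roughly equal, the data follow the 1 : n : n² pattern; unequal multipliers, such as 4.60 and 5.83 in the table above, indicate a departure from it.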

LOTKA’S LAW OF AUTHOR PRODUCTIVITY

Lotka’s law is one of the three major laws of bibliometrics and describes the
distribution of author productivity in a given field (Lotka 1926). It finds that most
articles are contributed by a few researchers, while a large proportion of researchers
contribute just one publication. Lotka thus summarizes the logarithmic relation between
researchers and publication quantities. It states that “the number of authors making n
contributions is about 1/n² of those making one publication and the proportion of all
contributors that make a single contribution is about 60 percent” (Lotka 1926), as cited
by Potter (1988). The general formula is $X^n Y = C$, where X is the number of
publications, Y is the relative frequency of authors with X publications, and n and C
are constants, depending on the specific field. In brief, the number of authors who
publish two articles is about 1/4 of the number who publish one, the number who publish
three articles is about 1/9, and so on, while authors who publish only one article
account for about 60% of all authors. That is to say, the number of authors who publish
n articles is about 1/n² of the number who publish one. This formula is also called the
Inverse Square Law (Tsay 2003).
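
The inverse square relation can be illustrated with a short Python sketch. The function names are illustrative assumptions; the value 10645 is the number of single-contribution authors in the first row of Table 23, and n = 2 is the classical exponent rather than a value fitted to these data.

```python
def lotka_expected_authors(authors_with_one, x, n=2.0):
    """Expected number of authors with x publications under Lotka's law."""
    return authors_with_one / x ** n

def single_author_share(n=2.0, max_x=100000):
    """Share of all authors expected to have exactly one publication.

    Computed as 1 / sum(1 / x**n); the sum is cut off at max_x.
    """
    return 1.0 / sum(1.0 / x ** n for x in range(1, max_x + 1))

authors_with_one = 10645
for x in (2, 3, 4):
    print(x, round(lotka_expected_authors(authors_with_one, x), 1))

print(round(single_author_share(), 3))
```

With n = 2 the share of single-publication authors comes out at roughly 0.61, which is the origin of the "about 60 percent" figure quoted above. Comparing the expected counts with the observed numbers of contributors shows how closely a data set follows the law.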

Table 23: Showing Lotka’s Law of Author Productivity

No. of contributions (X)   No. of contributors   Y   ∑X = log x   ∑Y = log y   ∑X*Y   ∑X*X
1 10645 10645 0 9.272 0 0
2 3240 6480 0.693 8.776 6.08 0.48
3 2411 7233 1.0986 8.886 9.76 1.21
4 1339 5356 1.386 8.585 11.90 1.92
5 654 3270 1.609 8.092 13.02 2.59
6 405 2430 1.791 7.795 13.96 3.21
7 312 2184 1.945 7.688 14.95 3.78
8 237 1896 2.079 7.547 15.69 4.32
9 174 1566 2.197 7.356 16.16 4.83
10 158 1580 2.302 7.365 16.95 5.30
11 116 1276 2.397 7.151 17.14 5.75
12 105 1260 2.484 7.138 17.73 6.17
13 88 1144 2.564 7.042 18.06 6.57
14 70 980 2.639 6.887 18.17 6.96
15 57 855 2.708 6.751 18.28 7.33
16 53 848 2.772 6.742 18.69 7.68
17 46 782 2.833 6.661 18.87 8.03
18 38 684 2.89 6.527 18.86 8.35
19 31 589 2.944 6.378 18.78 8.67
20 32 640 2.995 6.461 19.35 8.97
21 28 588 3.044 6.376 19.41 9.27
22 23 506 3.091 6.226 19.24 9.55
23 26 598 3.135 6.393 20.04 9.83
24 25 600 3.178 6.396 20.33 10.10
25 25 625 3.218 6.437 20.71 10.36
26 16 416 3.258 6.03 19.65 10.61
27 10 270 3.295 5.598 18.45 10.86
28 15 420 3.332 6.04 20.13 11.10
29 10 290 3.367 5.669 19.09 11.34
30 11 330 3.401 5.799 19.72 11.57
31 11 341 3.433 5.831 20.02 11.79
32 14 448 3.465 6.104 21.15 12.01
33 12 396 3.496 5.981 20.91 12.22
34 5 170 3.526 5.135 18.11 12.43
35 11 385 3.555 5.953 21.16 12.64
36 9 324 3.583 5.78 20.71 12.84
37 10 370 3.61 5.913 21.35 13.03
38 3 114 3.637 4.736 17.22 13.23
39 6 234 3.663 5.455 19.98 13.42
40 7 280 3.688 5.634 20.78 13.60
41 4 164 3.713 5.099 18.93 13.79
42 8 336 3.737 5.817 21.74 13.97
43 7 301 3.761 5.707 21.46 14.15
44 2 88 3.784 4.477 16.94 14.32
45 4 180 3.806 5.192 19.76 14.49
46 8 368 3.828 5.908 22.62 14.65
47 1 47 3.85 3.85 14.82 14.82
48 4 192 3.871 5.257 20.35 14.98
49 7 343 3.891 5.837 22.71 15.14
50 1 50 3.912 3.912 15.30 15.30
51 4 204 3.931 5.318 20.91 15.45
52 2 104 3.951 4.644 18.35 15.61
53 1 53 3.97 3.97 15.76 15.76
54 3 162 3.988 5.087 20.29 15.90
55 3 165 4.007 5.105 20.46 16.06
56 3 168 4.025 5.123 20.62 16.20
57 1 57 4.043 4.043 16.35 16.35
58 2 116 4.06 4.753 19.30 16.48

59 2 118 4.077 4.77 19.45 16.62
61 2 122 4.11 4.804 19.74 16.89
62 2 124 4.127 4.82 19.89 17.03
63 3 189 4.143 5.241 21.71 17.16
64 4 256 4.158 5.545 23.06 17.29
65 1 65 4.174 4.174 17.42 17.42
66 1 66 4.189 4.189 17.55 17.55
67 3 201 4.204 5.303 22.29 17.67
68 1 68 4.219 4.219 17.80 17.80
69 4 276 4.234 5.62 23.80 17.93
70 3 210 4.248 5.347 22.71 18.05
71 1 71 4.262 4.262 18.16 18.16
73 3 219 4.29 5.389 23.12 18.40
75 1 75 4.317 4.317 18.64 18.64
76 2 152 4.33 5.023 21.75 18.75
77 4 308 4.343 5.73 24.89 18.86
78 1 78 4.356 4.356 18.97 18.97
79 1 79 4.369 4.369 19.09 19.09
80 1 80 4.382 4.382 19.20 19.20
81 1 81 4.394 4.394 19.31 19.31
82 1 82 4.406 4.406 19.41 19.41
83 2 166 4.418 5.111 22.58 19.52
84 2 168 4.43 5.123 22.69 19.62
85 3 255 4.442 5.541 24.61 19.73
86 1 86 4.454 4.454 19.84 19.84
88 1 88 4.477 4.477 20.04 20.04
90 1 90 4.499 4.499 20.24 20.24
91 1 91 4.51 4.51 20.34 20.34
95 1 95 4.553 4.553 20.73 20.73
96 1 96 4.564 4.564 20.83 20.83
100 2 200 4.605 5.298 24.40 21.21
103 1 103 4.634 4.634 21.47 21.47
104 2 208 4.644 5.337 24.79 21.57

106 1 106 4.663 4.663 21.74 21.74
107 1 107 4.672 4.672 21.83 21.83
108 1 108 4.682 4.682 21.92 21.92
109 1 109 4.691 4.691 22.01 22.01
114 1 114 4.736 4.736 22.43 22.43
115 1 115 4.744 4.744 22.51 22.51
117 1 117 4.762 4.762 22.68 22.68
118 2 236 4.77 5.463 26.06 22.75
125 1 125 4.828 4.828 23.31 23.31
126 1 126 4.836 4.836 23.39 23.39
131 1 131 4.875 4.875 23.77 23.77
133 1 133 4.89 4.89 23.91 23.91
143 1 143 4.962 4.962 24.62 24.62
149 1 149 5.003 5.003 25.03 25.03
150 1 150 5.01 5.01 25.10 25.10
153 1 153 5.03 5.03 25.30 25.30
159 1 159 5.068 5.068 25.68 25.68
174 1 174 5.159 5.159 26.62 26.62
175 1 175 5.164 5.164 26.67 26.67
192 1 192 5.257 5.257 27.64 27.64
194 1 194 5.267 5.267 27.74 27.74
234 1 234 5.455 5.455 29.76 29.76
466 1 466 6.144 6.144 37.75 37.75
7900 20637 70783 438.6596 629.807 2315.204 1806.824

P = number of x items in table= 114

N = maximum number of contributors= 70783

N: Observed value

Pao (1989) proposed a way to calculate the n-value and the C-value of Lotka’s law, as
in (1) and (2).

n = [114 (2315.20) – (438.66) (629.80)] / [114 (438.66) – (438.66) (438.66)]

n = - 5.6730

N is the maximum contribution of an author. X is log(x) and Y is log(y), where y is the
number of authors who have x contributions.

C = 0.38266

Here p is the number of publication groups, that is, groups of authors who contributed the same
number of publications. In addition, Pao used the Kolmogorov–Smirnov (K–S) test to verify
whether Lotka’s law holds, on the condition that the value of p is greater than thirty-five [8].
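As an illustration of how the exponent can be estimated, the following Python sketch assumes that n is obtained as the ordinary least-squares slope of Y = log(y) on X = log(x), so the denominator uses the sum of squared X values (the X*X column total in Table 23). The function name and the sample data are hypothetical; the data are generated to follow an exact inverse square law, so the fitted slope should be close to -2.

```python
import math


def lotka_exponent(pairs):
    """Least-squares slope of ln(y) on ln(x) for (x, y) pairs.

    x is the number of contributions and y the number of authors
    with x contributions. The slope is negative for a decaying
    distribution; its absolute value is the Lotka exponent n.
    """
    count = len(pairs)
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_xx = sum(a * a for a in xs)
    return (count * sum_xy - sum_x * sum_y) / (count * sum_xx - sum_x ** 2)


# Hypothetical data following y = 1000 / x**2 exactly.
sample = [(x, 1000 / x ** 2) for x in range(1, 11)]
slope = lotka_exponent(sample)
print(round(slope, 4))
```

Applying the same function to observed (x, y) pairs gives an estimate that can be compared with the classic value of 2.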

The K-S statistic is used to check whether Lotka’s law holds for the publications of Kerala
state institutions. Since the value of N is greater than 35, the K-S test can be used to verify
whether Lotka’s law holds for the publications of the sample area.

K-S = 0.006 for N = 70783 (266.05)

In total, 70783 authors contributed to the research output of the Kerala state institutions.
Of the contributions, 1682 (8.15 %) were non-collaborative. The mean number of authors per
year is 1726.41, and the mean number of authors per article is 3.42. This emphasizes that
producing a large number of publications in any field requires a high degree of
inquisitiveness, competency, efficiency, persistence, and exposure to the literature. That is
why only a minority of authors have contributed a large number of papers. Further, the nature
of the institutions in which the researchers work, the research area of specialization, and the
availability of infrastructure facilities influence an author’s productivity.

Table 23 shows the application of Lotka's Law to the author productivity of the research
output of Kerala state institutions. The table shows that single-authored contributions
account for 1682 (8.15 %) of all contributions. Further, the chi-square test of Lotka's model
confirms this trend: the tabulated values show that the observed number of authors is higher
than the expected number. Thus the present analysis clearly invalidates Lotka's findings. In
the present analysis, productivity is attributed to several factors. If the complete publication
details of each author were taken, the test of Lotka's Law might present a different picture.
This analysis proves the eighth hypothesis (the scientific productivity of authors conforms to
Lotka’s (n-value) inverse square law of scientific productivity).

Price’s Square Root Law

To check whether the distribution of authors fulfils Price’s Square Root Law, the
following calculation is used:

PSQ = √N = 266.05

N = 70783

Price’s square root law holds that about half of the literature on a subject is contributed
by the square root of the total number of authors. Here the 268 most productive contributors
(cumulated contribution value 7522) account for 15482 (75.98 %) publications, and the
cumulated share at the square-root cut-off is just 23.99 percent of publications. This value
falls well short of 50 % (half of the literature on a subject), so the result does not comply
with Price’s Square Root Law. Table 24 below shows the related values.
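The square-root check can be sketched in Python as follows. This is a minimal illustration, assuming that N is the number of distinct authors and that the input is a frequency distribution of the kind tabulated for this analysis (papers per author mapped to number of authors). The function name and the small sample distribution are hypothetical.

```python
import math


def price_sqrt_share(distribution):
    """Share of contributions made by the top sqrt(N) authors.

    distribution maps papers-per-author to the number of authors
    with that many papers. Returns (k, share), where k is the
    rounded square root of the total number of authors and share
    is the fraction of all contributions made by the k most
    productive authors.
    """
    total_authors = sum(distribution.values())
    total_contributions = sum(p * a for p, a in distribution.items())
    k = round(math.sqrt(total_authors))
    remaining = k
    top_contributions = 0
    # Walk from the most productive class downwards until k authors are taken.
    for papers in sorted(distribution, reverse=True):
        take = min(remaining, distribution[papers])
        top_contributions += take * papers
        remaining -= take
        if remaining == 0:
            break
    return k, top_contributions / total_contributions


# Hypothetical distribution: 1 author with 10 papers, 3 with 5, 12 with 1.
k, share = price_sqrt_share({10: 1, 5: 3, 1: 12})
print(k, round(share, 4))
```

Under Price's law the returned share would be expected to be close to 0.5.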

Table 24: Application of Price’s Square Root Law for research Productivity

No. of contributions (A)    No. of contributors (B)    % of 18695    A*B    % of A*B    Cumulated % of A*B
466 1 0.01 466 0.72 0.72
234 1 0.01 234 0.36 1.08
194 1 0.01 194 0.30 1.38
192 1 0.01 192 0.30 1.68
175 1 0.01 175 0.27 1.95
174 1 0.01 174 0.27 2.22
159 1 0.01 159 0.25 2.47
153 1 0.01 153 0.24 2.70
150 1 0.01 150 0.23 2.94
149 1 0.01 149 0.23 3.17
143 1 0.01 143 0.22 3.39
133 1 0.01 133 0.21 3.60
131 1 0.01 131 0.20 3.80
126 1 0.01 126 0.20 3.99
125 1 0.01 125 0.19 4.19
118 2 0.01 236 0.37 4.55
117 1 0.01 117 0.18 4.73
115 1 0.01 115 0.18 4.91
114 1 0.01 114 0.18 5.09
109 1 0.01 109 0.17 5.26
108 1 0.01 108 0.17 5.43
107 1 0.01 107 0.17 5.59
106 1 0.01 106 0.16 5.76
104 2 0.01 208 0.32 6.08
103 1 0.01 103 0.16 6.24
100 2 0.01 200 0.31 6.55
96 1 0.01 96 0.15 6.70
95 1 0.01 95 0.15 6.84
91 1 0.01 91 0.14 6.98
90 1 0.01 90 0.14 7.12
88 1 0.01 88 0.14 7.26
86 1 0.01 86 0.13 7.39
85 3 0.02 255 0.40 7.79
84 2 0.01 168 0.26 8.05
83 2 0.01 166 0.26 8.31
82 1 0.01 82 0.13 8.43

81 1 0.01 81 0.13 8.56
80 1 0.01 80 0.12 8.68
79 1 0.01 79 0.12 8.81
78 1 0.01 78 0.12 8.93
77 4 0.02 308 0.48 9.40
76 2 0.01 152 0.24 9.64
75 1 0.01 75 0.12 9.76
73 3 0.02 219 0.34 10.09
71 1 0.01 71 0.11 10.20
70 3 0.02 210 0.33 10.53
69 4 0.02 276 0.43 10.96
68 1 0.01 68 0.11 11.06
67 3 0.02 201 0.31 11.37
66 1 0.01 66 0.10 11.48
65 1 0.01 65 0.10 11.58
64 4 0.02 256 0.40 11.97
63 3 0.02 189 0.29 12.27
62 2 0.01 124 0.19 12.46
61 2 0.01 122 0.19 12.65
59 2 0.01 118 0.18 12.83
58 2 0.01 116 0.18 13.01
57 1 0.01 57 0.09 13.10
56 3 0.02 168 0.26 13.36
55 3 0.02 165 0.26 13.61
54 3 0.02 162 0.25 13.87
53 1 0.01 53 0.08 13.95
52 2 0.01 104 0.16 14.11
51 4 0.02 204 0.32 14.43
50 1 0.01 50 0.08 14.50
49 7 0.04 343 0.53 15.03
48 4 0.02 192 0.30 15.33
47 1 0.01 47 0.07 15.40
46 8 0.04 368 0.57 15.97
45 4 0.02 180 0.28 16.25
44 2 0.01 88 0.14 16.39
43 7 0.04 301 0.47 16.86
42 8 0.04 336 0.52 17.38
41 4 0.02 164 0.25 17.63
40 7 0.04 280 0.43 18.07
39 6 0.03 234 0.36 18.43
38 3 0.02 114 0.18 18.60

37 10 0.05 370 0.57 19.18
36 9 0.05 324 0.50 19.68
35 11 0.06 385 0.60 20.28
34 5 0.03 170 0.26 20.54
33 12 0.06 396 0.61 21.15
32 14 0.07 448 0.69 21.85
31 (7435) 11 (237) 0.06 341(14442) 0.53 22.38
30 11 0.06 330 0.51 22.89
29 10 0.05 290 0.45 23.34
28(7522) 15 (268) 0.08 420(15482) 0.65(23.99) 23.99
27 10 0.05 270 0.42 24.41
26 16 0.09 416 0.64 25.05
25 25 0.13 625 0.97 26.02
24 25 0.13 600 0.93 26.95
23 26 0.13 598 0.89 27.84
22 23 0.12 506 0.78 28.62
21 28 0.15 588 0.91 29.53
20 32 0.17 640 0.99 30.53
19 31 0.17 589 0.91 31.44
18 38 0.20 684 1.06 32.50
17 46 0.25 782 1.21 33.71
16 53 0.28 848 1.31 35.02
15 57 0.30 855 1.32 36.35
14 70 0.37 980 1.52 37.87
13 88 0.47 1144 1.77 39.64
12 105 0.56 1260 1.95 41.59
11 116 0.62 1276 1.98 43.57
10 158 0.85 1580 2.45 46.02
9 174 0.93 1566 2.43 48.44
8 237 1.27 1896 2.94 51.38
7 312 1.67 2184 3.38 54.77
6 405 2.17 2430 3.77 58.53
5 654 2.96 3270 4.29 62.82
4 1339 4.33 5356 5.02 67.84
3 2411 7.64 7233 6.64 74.48
2 3240 15.57 6480 9.02 83.50
1 10645 56.94 10645 16.49 100.00
7900 20637 70783

4.20 Pareto Principle (80 X 20 Rule)

For this analysis we used the same values from the table above to validate the Pareto
Principle and test whether 80 percent of contributions come from 20 percent of
contributors. Since the total number of authors is 70783, 20 percent of the total number of
authors is 14156.6.

Total number of authors is 70783

20 percent of authors value is 14156.6

Total number of publications is 20637

80 percent of publications value is 16509.6

Based on the analysis, at the point where the “Accumulated % of A*B” is 22.38 percent and the
“Accumulated Contributors” value is 7435, the authors contributed nearly eighty percent of
contributions. Under the 80/20 rule, the value should be very close to 80 percent. We
can conclude that this result is nearly in compliance with the Pareto Principle. This analysis
proves the ninth hypothesis (the finding of this study does not correspond to Price’s Square
Root Law but fits the Pareto Principle (80/20 Rule) for author contribution).
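The 80/20 test can be expressed as a short Python sketch. It assumes the data are available as a frequency distribution (papers per author mapped to number of authors) and accumulates whole productivity classes from the most productive downwards, in the manner of a cumulated table. The function name and the sample distribution are hypothetical.

```python
def authors_needed_for_share(distribution, target=0.80):
    """Smallest top group of authors reaching `target` of contributions.

    distribution maps papers-per-author to the number of authors.
    Whole productivity classes are accumulated from the most
    productive downwards. Returns (author_share, contribution_share)
    at the first class where the cumulated contributions reach the
    target fraction.
    """
    total_authors = sum(distribution.values())
    total_contributions = sum(p * a for p, a in distribution.items())
    cum_authors = 0
    cum_contributions = 0
    for papers in sorted(distribution, reverse=True):
        cum_authors += distribution[papers]
        cum_contributions += papers * distribution[papers]
        if cum_contributions >= target * total_contributions:
            break
    return cum_authors / total_authors, cum_contributions / total_contributions


# Hypothetical distribution: 2 authors with 20 papers, 4 with 5, 14 with 1.
author_share, contribution_share = authors_needed_for_share({20: 2, 5: 4, 1: 14})
print(round(author_share, 2), round(contribution_share, 2))

# Thresholds used in the text: 20 % of 70783 authors, 80 % of 20637 publications.
print(round(0.20 * 70783, 1), round(0.80 * 20637, 1))
```

If the returned author share is close to 0.20 when the contribution share reaches 0.80, the distribution is consistent with the Pareto Principle.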

H-INDEX

The index is based on the distribution of citations received by a given researcher's
publications. Hirsch writes:

A scientist has index h if h of his/her Np papers have at least h citations each, and the other
(Np − h) papers have no more than h citations each.

In other words, a scholar with an index of h has published h papers each of which has been
cited in other papers at least h times [4]. Thus, the h-index reflects both the number of
publications and the number of citations per publication. The index is designed to improve
upon simpler measures such as the total number of citations or publications. The index
works properly only for comparing scientists working in the same field; citation
conventions differ widely among different fields.
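Hirsch's definition translates directly into a few lines of Python. The sketch below assumes the input is simply a list of citation counts, one per paper; the function name and the sample counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts are sorted in decreasing order, so no later rank can qualify.
            break
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))
```

In this example four papers have at least four citations each, while the fifth has only three, so the index is 4.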

PURPOSE

The h-index serves as an alternative to more traditional journal impact factor metrics
in the evaluation of the impact of the work of a particular researcher. Because only the most
highly cited articles contribute to the h-index, its determination is a simpler process. Hirsch
has demonstrated that h has high predictive value for whether a scientist has won honors
like National Academy membership or the Nobel Prize. The h-index grows as citations
accumulate and thus it depends on the "academic age" of a researcher.

G-INDEX

The g-index is an index for quantifying scientific productivity based on publication record.
It was suggested in 2006 by Leo Egghe [1].

The index is calculated based on the distribution of citations received by a given
researcher's publications:

Given a set of articles ranked in decreasing order of the number of citations that
they received, the g-index is the (unique) largest number such that the top g articles
received (together) at least g² citations.

Just as with the h-index, the g-index is a number which is the same for two
different quantities:

g is (1) the number of highly cited articles, such that each of them has brought (2)
on average g citations.

This is in fact a rewriting of the definition

$$g^2 \le \sum_{i \le g} c_i$$

as

$$g \le \frac{1}{g} \sum_{i \le g} c_i$$

where $c_i$ is the number of citations received by the article ranked i.
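The g-index definition can be sketched in Python as follows. The sketch assumes the input is a list of citation counts, one per paper, and limits g to the number of papers in the list; the function name and the sample counts are hypothetical.

```python
def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    cumulated = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        cumulated += cites
        if cumulated >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for four papers.
print(g_index([3, 2, 1, 0]))
```

Here the top two papers have 5 citations in total (at least 2² = 4), but the top three have only 6 (fewer than 3² = 9), so the index is 2.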

VOSviewer Manual
1 Introduction
VOSviewer is a computer program for creating maps based on network data and
for visualizing and exploring these maps. The main features of VOSviewer can be
summarized as follows:

• Creating maps based on network data. Maps can be created based directly on
the adjacency matrix of a network, but it is also possible to create maps of
publications, authors, or journals based on bibliographic coupling, co-citation,
or co-authorship networks extracted from Web of Science or Scopus data.
Term maps can be created directly based on a text corpus. Maps are created
using the VOS mapping technique and the VOS clustering technique.¹

• Visualizing and exploring maps. Two visualizations are provided, the network
visualization and the density visualization. Zooming and scrolling functionality
allows maps to be explored in full detail, which is essential when working with
large maps containing hundreds or even thousands of items.

Although VOSviewer is intended primarily for analyzing bibliometric networks, the
program can in fact be used to create, visualize, and explore maps based on any
type of network data.

VOSviewer is written in the Java programming language, which means that it runs
on most hardware and operating system platforms. VOSviewer can be obtained
from www.vosviewer.com. The program can be used freely for any purpose.

This manual is concerned with version 1.6.1 of VOSviewer. The manual is
organized as follows. We first introduce the user interface of VOSviewer in Chapter
2, we then explain the file types used by VOSviewer in Chapter 3, and finally we
discuss some advanced topics in Chapter 4. For additional information about
VOSviewer, we refer to a paper that we have written (Van Eck & Waltman, 2010).
In this paper, a general introduction to VOSviewer is provided. Also, the technical
implementation of specific parts of the program is discussed in considerable detail.
Similar information, including a step-by-step tutorial, can also be found in a recent
book chapter (Van Eck & Waltman, 2014).

¹ Together, these two techniques provide a unified framework for mapping and clustering. For more
information about the techniques, we refer to Van Eck, Waltman, Dekker, and Van den Berg (2010) and
Waltman, Van Eck, and Noyons (2010).

2 User interface
The main window of VOSviewer is shown in Figure 1. As can be seen in the figure,
the main window consists of the following five panels:

• Main panel. In this panel, a selected area in the currently active map is shown.
The zoom and scroll functionality of VOSviewer can be used to select the area
in the currently active map that is shown in the main panel.

• Options panel. This panel can be used to change the way in which the
currently active map is shown in the main panel.

• Information panel. In this panel, information about an item in the currently
active map can be shown.

• Overview panel. In this panel, an overview of the currently active map is
shown. A rectangular frame is displayed in the overview panel to indicate the
area in the currently active map that is shown in the main panel.

• Action panel. This panel can be used to undertake all kinds of actions, such as
creating a new map, opening or saving an existing map, making a screenshot,
finding an item, and constructing or transforming a map.

VOSviewer provides two visualizations, which are referred to as the network
visualization and the density visualization. As shown in Figure 1, the Network
Visualization and Density Visualization tabs can be used to switch between
the two visualizations.

In the next sections, we discuss the five panels of VOSviewer in more detail.

To get some hands-on experience with VOSviewer, we encourage the reader to
use the map file journal_map.txt, which is distributed together with VOSviewer.
This file can be used to reproduce the figures in this chapter. The file contains a
map of 232 journals in the fields of economics, management, and operations
research (for more details, see Van Eck & Waltman, 2010). To open the map,
press the Open button on the Action tab in the action panel, select the map file
journal_map.txt, and press the OK button.

Figure 1. The main window of VOSviewer. The numbers indicate
(1) the main panel, (2) the options panel, (3) the information panel,
(4) the overview panel, and (5) the action panel.

2.1 Main panel

As can be seen in Figure 1, the main panel of VOSviewer is used to show a
selected area in the currently active map. The zoom and scroll functionality of
VOSviewer can be used to determine which area in the currently active map is
shown. The way in which the currently active map is shown depends on whether
the network visualization or the density visualization is selected.

We first consider the network visualization. When the network visualization is selected,
items are indicated by their label and, by default, also by a circle. For each item, the
font size of the item’s label and the size of the item’s circle depend on the weight of
the item.² The color of the circle of an item can be determined in a
number of different ways. If items have been assigned to clusters, the color of the
circle of an item can be determined by the cluster to which the item belongs. If
scores have been given to items, the color of the circle of an item can be
determined by the item’s score, where by default colors range from blue (low
score) to green (average score) to red (high score).³ A third possibility is to
determine the color of the circle of an item by the color of the item as specified in
a map file (using the red, green, and blue columns; see Section 3.1). The options
panel can be used to switch between the different ways of coloring items. We note
that for some items the label may not be visible. This is done in order to avoid
overlapping labels. Also, by default, no lines between items are displayed.
However, this can be changed in the options panel. An example of the network
visualization is shown in Figure 2.

Figure 2. The network visualization.

² The weight of an item is determined by the weight or normalized weight column in a map file (see
Section 3.1). When a new map is created without providing a map file with a weight or normalized
weight column, the weight of an item is set equal to the total strength of all links of the item. When an
existing map is opened without providing a map file with a weight or normalized weight column, all
items in the map are given the same weight.

³ If colors are determined by the scores of items, a color bar is shown in the lower right corner of the
main panel. This color bar indicates which colors correspond with which scores.

We now consider the density visualization. There are in fact two variants of the
density visualization. We first discuss the item density visualization, followed by
the cluster density visualization. The options panel can be used to switch between
the two variants of the density visualization. We refer to Van Eck and Waltman
(2010) for a detailed discussion of the technical implementation of the density
visualization.

In the item density visualization, items are indicated by their label in a similar way
as in the network visualization. Each point in a map has a color that depends on
the density of items at that point. By default, this color is somewhere in between
red and blue. The larger the number of items in the neighborhood of a point and
the higher the weights of the neighboring items, the closer the color of the point is
to red. Conversely, the smaller the number of items in the neighborhood of a point
and the lower the weights of the neighboring items, the closer the color of the
point is to blue. An example of the item density visualization is shown in Figure 3.

The cluster density visualization is available only if items have been assigned to
clusters. The cluster density visualization is similar to the item density
visualization except that the density of items is displayed separately for each
cluster of items. In the cluster density visualization, the color of a point in a map
is close to the color of a certain cluster if there are a large number of items
belonging to that cluster in the neighborhood of the point. Like in the item density
visualization, items with high weights count more heavily than items with low
weights. An example of the cluster density visualization is shown in Figure 4.

To facilitate the detailed examination of a map, VOSviewer offers zoom and scroll
functionality. In the main panel, zooming and scrolling can be done in the
following three ways:

• Using the mouse. To zoom in, move the mouse upwards while keeping the
right mouse button pressed. Conversely, to zoom out, move the mouse
downwards while keeping the right mouse button pressed. As an alternative,
the mouse wheel can be used to zoom in and out. To scroll through a map,
move the mouse while keeping the left mouse button pressed.

• Using the navigation buttons in the upper right corner of the main panel (see
Figure 1). Use the plus and minus buttons to zoom in and out. Use the arrow
buttons to scroll through a map.

• Using the keyboard. Use the plus and minus keys to zoom in and out. Use the
arrow keys to scroll through a map.

Figure 3. The item density visualization.

Figure 4. The cluster density visualization.

2.2 Options panel

The options panel can be used to change the way in which the currently active
map is shown in the main panel. Different options are provided for the network
visualization and the density visualization. Some of the options are not always
available. The options panel shows only the options that are relevant given the
currently available map data.

When the network visualization is selected, the following options may be provided:

• Labels.

o Size. This slider determines the size of the font used to display the labels of
items in the main panel.

o Size variation. The higher the weight of an item, the larger the font that is
used to display the item’s label in the main panel. The Size variation
slider determines the strength of this effect.

o Max. length. This text box determines the maximum length of a label
displayed in the main panel. If the length of a label exceeds the maximum
length, the last part of the label is not displayed.

o Font. This drop down list determines the font used to display the labels of
items in the main panel.

• Lines.

o No. of lines. If the adjacency matrix of a network is available, lines
between items can be displayed in the main panel. The No. of lines text
box determines the maximum number of lines to be displayed. If the
number of links in the network exceeds the maximum number of lines to
be displayed, lines are displayed only for the strongest links.

o Without normalization and With normalization. These radio buttons
determine the way in which the strongest links in the network are selected.
If the Without normalization radio button is selected, the links with the
highest unnormalized link strength are selected. If the With
normalization radio button is selected, the links with the highest
normalized link strength are selected.

• Visualization.

o Circles and Frames. These radio buttons determine the way in which items
are indicated in the main panel. If the Circles radio button is selected, items
are indicated both by their label and by a circle. If the Frames radio
button is selected, items are indicated by their label displayed within a
rectangular frame.

• Colors.

o User defined colors, Score colors, Cluster colors, and No colors.
These radio buttons determine the way in which items are colored in the
main panel and the overview panel. If the User defined colors radio
button is selected, each item has its own color as specified in a map file
(using the red, green, and blue columns; see Section 3.1). If the Score
colors radio button is selected, items are colored based on their score. If
the Cluster colors radio button is selected, items are colored based on the
cluster to which they belong. If the No colors radio button is selected, all
items are colored in gray.

o Black background. This check box determines whether the main panel has
a black or a white background color.

o Min./Max. Scores. This button is available only if the Score colors radio
button is selected. The button brings up the Min./Max. Scores dialog box.
This dialog box can be used to change the minimum and the maximum
score that determine how scores are mapped to colors. By default, scores
less than or equal to the minimum score are mapped to blue, scores equal
to the average of the minimum and the maximum score are mapped to
green, and scores greater than or equal to the maximum score are mapped
to red.

o Score Colors. This button is available only if the Score colors radio
button is selected. The button offers three options:

♣ Import. This option is the default choice. Choose this option to import
score colors from a score colors file (see Section 3.4).

♣ Export. Choose this option to export the current score colors to a score
colors file (see Section 3.4).

♣ Restore original. Choose this option to restore the original score colors.

o Cluster Colors. This button is available only if the Cluster colors radio
button is selected. The button offers four options:

♣ Edit. This option is the default choice. Choose this option to edit the
current cluster colors in the Edit Cluster Colors dialog box.

♣ Import. Choose this option to import cluster colors from a cluster
colors file (see Section 3.4).

♣ Export. Choose this option to export the current cluster colors to a
cluster colors file (see Section 3.4).

♣ Restore original. Choose this option to restore the original cluster
colors.

When the density visualization is selected, the following options may be provided:

• Labels. These options are identical to the ones provided when the network
visualization is selected.

• Visualization.

o Item density or Cluster density. These radio buttons determine whether the
item density visualization or the cluster density visualization is selected.

o Kernel width. This slider determines the value of the kernel width
parameter. We refer to Van Eck and Waltman (2010) for more information
about this parameter.

• Colors.

o White background. This check box is available only if the cluster density
visualization is selected. The check box determines whether the main panel
has a black or a white background color.

o Density Colors. This button is available only if the item density
visualization is selected. The button offers three options:

♣ Import. This option is the default choice. Choose this option to import
density colors from a density colors file (see Section 3.4).

♣ Export. Choose this option to export the current density colors to a
density colors file (see Section 3.4).

♣ Restore original. Choose this option to restore the original density
colors.

o Cluster Colors. This button is available only if the cluster density
visualization is selected. The button offers four options:

♣ Edit. This option is the default choice. Choose this option to edit the
current cluster colors in the Edit Cluster Colors dialog box.

♣ Import. Choose this option to import cluster colors from a cluster
colors file (see Section 3.4).

♣ Export. Choose this option to export the current cluster colors to a
cluster colors file (see Section 3.4).

♣ Restore original. Choose this option to restore the original cluster
colors.

2.3 Information panel

In the information panel of VOSviewer, information about an item in the currently
active map can be shown. For example, when the mouse cursor is moved over the
label of an item in the main panel, information about the item will be shown in the
information panel. If item descriptions have been provided (using the description
column in a map file; see Section 3.1), the description of the item will be shown.
If no item descriptions have been provided, the label of the item will be shown
along with the number of the cluster to which the item belongs and the score of
the item (assuming that items have been assigned to clusters and that scores
have been given to items).

Information about a link between two items can also be shown in the information
panel. When the mouse cursor is moved over a line between two items,
information about the link between the items will be shown in the information
panel. The labels of the items and the strength of the link will be shown.

2.4 Overview panel

In the overview panel of VOSviewer, an overview of the currently active map is shown.
Each item in the map is indicated by a small colored dot. A rectangular frame is
displayed in the overview panel to indicate the area in the currently active map that is
shown in the main panel. By left-clicking in the overview panel, the area in the
currently active map that is shown in the main panel can be changed.

2.5 Action panel

The action panel of VOSviewer can be used to undertake all kinds of actions. The
panel consists of three tabs: the Action tab, the Items tab, and the Map tab.
These tabs are discussed in Subsections 2.5.1, 2.5.2, and 2.5.3.

2.5.1 Action tab

The Action tab can be used to perform a number of basic actions. The following
buttons are available on the Action tab:

• Map.

o Create. Use this button to create a new map. The button brings up the
Create Map wizard. There are three ways in which a new map can be
created using this wizard:

♣ Create a map based on a network. This option requires the
adjacency matrix of a network. The adjacency matrix indicates which
pairs of items in the network are linked to each other, and for each pair
of linked items it indicates the strength of their link. The matrix can be
read from a network file. In addition to a network file, a map file may
also be provided. The map file may for example contain labels and
descriptions of items. We refer to Sections 3.1 and 3.2 for a detailed
discussion of map files and network files. Instead of a map file and a
network file, it is also possible to use Pajek files or a GML file.⁴

♣ Create a map based on bibliographic data. This option requires
bibliographic data. The data can be read from Web of Science, Scopus,
or PubMed files. Using this option, it is possible to create maps of
scientific publications, scientific journals, researchers, or research
organizations based on bibliographic coupling relations (i.e., multiple
items citing the same publication), co-citation relations (i.e., multiple
items being cited by the same publication), or co-authorship relations
(i.e., multiple items co-authoring the same publication).

♣ Create a map based on a text corpus.⁵ This option requires a text
corpus. The corpus is stored in a corpus file. A corpus file is a text file that
contains on each line the text of a document. This text is assumed to be in
English. Using natural language processing techniques, VOSviewer extracts
terms from the corpus file, where a term is defined as a sequence of nouns
and adjectives (ending with a noun). Based on the extracted terms,
VOSviewer creates a term map. This is a map in which terms are located in
such a way that the distance between two terms provides an indication of
the number of co-occurrences of the terms. In general, the smaller the
distance between two terms, the larger the number of co-occurrences of
the terms. Two terms are said to co-occur if they both occur on the same
line in the corpus file. In addition to a corpus file, a scores file may also
be provided. A scores file is a text file that contains on each line the score
of a document. Based on the scores in a scores file, VOSviewer calculates a
score for each term in the term map. The score of a term equals the average
score of the documents in which the term occurs. In the network visualization,
colors can be used to indicate the scores of terms. It is also possible to
create a term map based on Web of Science, Scopus, or PubMed files
instead of a corpus file. In that case, terms are extracted from the titles
and abstracts of scientific publications.

⁴ Pajek is a well-known computer program for social network analysis (De Nooy, Mrvar, & Batagelj,
2011). It is available at http://pajek.imfm.si/doku.php. VOSviewer supports Pajek matrix, network,
partition, and vector files.

⁵ We refer to Van Eck and Waltman (2011) for more information about this option.

o Open. Use this button to open an existing map. The button brings up the
Open Map dialog box. Map data can be read from a map file. The map file
must contain coordinates of items. The file may also contain, for example,
labels, descriptions, weights, scores, and cluster numbers of items. In
addition to a map file, a network file may also be provided. The network file
contains the adjacency matrix of a network. Based on this matrix, lines
between the items in a map can be displayed. Instead of a map file and a
network file, it is also possible to use Pajek and GML files. For a detailed
discussion of map files and network files, we refer to Sections 3.1 and 3.2.

o Save. This button offers three options:

♣ Save map. This option is the default choice. Choose this option to save the
currently active map in a map file. Map files are discussed in Section 3.1.
Instead of a map file, it is also possible to use a Pajek or GML file.

♣ Save network. This option can be chosen only if the adjacency matrix
of a network is available. Choose this option to save the adjacency
matrix in a network file. Network files are discussed in Section 3.2.
Instead of a network file, it is also possible to use a Pajek or GML file.

♣ Save normalized network. This option can be chosen only if the
adjacency matrix of a network is available. Choose this option to save
the normalized adjacency matrix in a network file. Normalization of an
adjacency matrix is discussed in Subsection 2.5.3. Network files are
discussed in Section 3.2. Instead of a network file, it is also possible to
use a Pajek or GML file.

o Print. Use this button to print a screenshot of the main panel.

o Screenshot. This button offers three options:

♣ Save to file. This option is the default choice. Choose this option to
save a screenshot of the main panel. The screenshot resembles the
main panel as closely as possible. However, the navigation buttons in
the upper right corner of the main panel are not shown in the
screenshot. If the Optimize labeling check box in the Screenshot
Options dialog box (see below) is checked, the number of labels visible
in the screenshot is optimized. This means that some labels not visible
in the main panel may be visible in the screenshot. Screenshots can be
saved in a number of graphic file formats. For most applications, we
recommend the PNG format. Some formats, such as EPS, PDF, and
SVG, use vector graphics when saving a screenshot. This has the
advantage that the screenshot can be resized without loss of quality.
We note that Figures 2 to 4 were obtained using the screenshot
functionality of VOSviewer.

♣ Copy to clipboard. Choose this option to copy a screenshot of the
main panel to the clipboard. The screenshot can for example be pasted
into a Word document or a PowerPoint presentation.

♣ Screenshot options. Choose this option to bring up the Screenshot
Options dialog box. This dialog box can be used to change some
screenshot-related settings of VOSviewer. The Scaling drop down list
determines the resolution (i.e., the number of pixels) of a screenshot.
The resolution is calculated relative to the resolution of the main panel.
Using a scaling of 100%, screenshots have the same resolution as the
main panel. Using the default scaling of 200%, screenshots have a
resolution that is twice as high (i.e., twice as many pixels horizontally
and vertically) as the resolution of the main panel. The Scaling drop
down list has no effect on screenshots that are saved in a file format
that uses vector graphics. The Optimize labeling check box
determines whether the number of labels visible in a screenshot is
optimized. Optimizing the number of labels visible in a screenshot
means that some labels not visible in the main panel itself may be
visible in a screenshot of the main panel. The Include border check
box determines whether a border is included around a screenshot.

• Info.

o Manual. Use this button to open the VOSviewer manual. This requires an
internet connection. 

o About VOSviewer. This button brings up the About VOSviewer dialog
box. This dialog box provides some general information about VOSviewer,
such as the version number, a copyright notice, a license text, the address
of the VOSviewer web site, and a list of software libraries used by
VOSviewer.

o Update VOSviewer. This button is available only if a new version of
VOSviewer is available. Use this button to open the VOSviewer website.
The new version of VOSviewer can then be downloaded from the website.
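
The text-corpus option of the Create button, described earlier in this list, counts co-occurrences per line of the corpus file and gives each term the average score of the documents in which it occurs. The following is only a minimal Python sketch of those two calculations. The function name, the sample corpus, the sample scores, and the fixed term list are hypothetical, and simple substring matching stands in for the natural language processing that VOSviewer uses to extract terms.

```python
from collections import defaultdict
from itertools import combinations


def term_statistics(corpus_lines, scores, terms):
    """Count term co-occurrences per line and average document scores per term.

    `terms` is a hypothetical, pre-selected list of terms. VOSviewer extracts
    terms with natural language processing, which is not reproduced here.
    """
    cooccurrences = defaultdict(int)
    score_sums = defaultdict(float)
    doc_counts = defaultdict(int)
    for line, score in zip(corpus_lines, scores):
        text = line.lower()
        # Terms present in this document (one document per line).
        present = sorted(t for t in terms if t in text)
        for term in present:
            score_sums[term] += score
            doc_counts[term] += 1
        # Two terms co-occur if both occur on the same line.
        for pair in combinations(present, 2):
            cooccurrences[pair] += 1
    # The score of a term is the average score of the documents containing it.
    term_scores = {t: score_sums[t] / doc_counts[t] for t in doc_counts}
    return dict(cooccurrences), term_scores


corpus = [
    "malaria vaccine trial in children",
    "malaria vector control with bed nets",
    "vaccine safety in children",
]
scores = [2014.0, 2012.0, 2010.0]  # hypothetical scores, one per document
cooc, term_scores = term_statistics(corpus, scores, ["malaria", "vaccine", "children"])
print(cooc)
print(term_scores)
```

In this sketch each document contributes at most one occurrence of a term, whatever the number of times the term appears on the line.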

2.5.2 Items tab

The Items tab provides a list of items in the currently active map. By default, a
list of all items in the map is provided. However, a filter can be used to restrict the
list to a subset of the items in the map. To do so, enter a filter string in the Filter
text box. This yields a list of all items whose label contains the filter string.

The Group items by cluster check box determines the way in which items are
listed. If the check box is unchecked, items are simply listed alphabetically. If the
check box is checked, items are first grouped by cluster and then listed
alphabetically within each cluster.

Double-clicking an item on the Items tab shows that item in the main
panel.
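
The filtering and grouping behavior of the Items tab can be illustrated with a short Python sketch. The function name and the sample items are hypothetical, and case-insensitive matching is an assumption, since the text above only states that the label must contain the filter string.

```python
def list_items(items, filter_string="", group_by_cluster=False):
    """Return item labels in the order an items list like the one above might use.

    `items` is a list of (label, cluster) tuples. Case-insensitive matching
    and sorting are assumptions made for this illustration.
    """
    needle = filter_string.lower()
    selected = [(label, cluster) for label, cluster in items
                if needle in label.lower()]
    if group_by_cluster:
        # Group by cluster first, then sort alphabetically within each cluster.
        selected.sort(key=lambda item: (item[1], item[0].lower()))
    else:
        # Plain alphabetical listing.
        selected.sort(key=lambda item: item[0].lower())
    return [label for label, _ in selected]


items = [
    ("scientometrics", 2),
    ("bibliometrics", 1),
    ("citation analysis", 2),
    ("altmetrics", 1),
]
print(list_items(items, filter_string="metrics"))
print(list_items(items, group_by_cluster=True))
```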

2.5.3 Map tab

The Map tab can be used to construct or transform a map. Maps are constructed
using the VOS mapping technique and the VOS clustering technique (see
footnote 6). The Map tab can also be used to change the parameters of these
techniques. The following parameters and controls are available on the Map tab:

• Parameters.

o Mapping attraction and Mapping repulsion. These parameters influence the
way in which items are located in a map by the VOS mapping technique. The
Mapping attraction parameter must have an integer value between -9 and
+10. The Mapping repulsion parameter must have an integer value between
-10 and +9. The value of the Mapping repulsion parameter must be lower
than the value of the Mapping attraction parameter. For most purposes, our
recommendation is to work with the default parameter values.

Footnote 6. We refer to Van Eck et al. (2010), Waltman and Van Eck (2013), and
Waltman et al. (2010) for more information about these techniques.

o Clustering resolution. This parameter determines the level of detail
provided by the VOS clustering technique. The parameter must have a
non-negative value. The larger the value of the parameter, the larger the
number of clusters that will be obtained. Our recommendation is to try out
different values for the parameter and to use the value that yields the level
of detail considered most satisfactory for the particular application at hand.

o Min. cluster size. This parameter determines the minimum cluster size
used by the VOS clustering technique. Each cluster produced by the VOS
clustering technique must consist of at least the number of items
specified by this parameter. The Min. cluster size parameter can be
useful to simplify the clustering solutions obtained from the VOS clustering
technique by getting rid of small and uninteresting clusters.

o Advanced Parameters. This button brings up the Advanced
Parameters dialog box. This dialog box can be used to change a number
of more advanced parameters of the VOS mapping and clustering
techniques. These parameters are discussed below.

• Run. The Run button is available only after a new map has been created
(using the Create button on the Action tab in the action panel). The button
can be used to run the VOS mapping and clustering techniques. The radio
buttons determine whether both techniques are run or only one of them. 

• Rotate/flip.

o Rotate. Use this button to rotate the currently active map. The Degrees
to rotate parameter determines the number of degrees by which the map
will be rotated.

o Flip Horizontally. Use this button to flip the currently active map in the
horizontal direction.

o Flip Vertically. Use this button to flip the currently active map in the
vertical direction.
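
The constraints on the Mapping attraction, Mapping repulsion, and Clustering resolution parameters listed above can be summarized as a small check. This is only an illustrative Python sketch: the function name and the example values are hypothetical and are not taken from VOSviewer.

```python
def check_mapping_parameters(attraction, repulsion, resolution):
    """Return a list of violated constraints (an empty list means all are met)."""
    problems = []
    if not isinstance(attraction, int) or not -9 <= attraction <= 10:
        problems.append("Mapping attraction must be an integer between -9 and +10")
    if not isinstance(repulsion, int) or not -10 <= repulsion <= 9:
        problems.append("Mapping repulsion must be an integer between -10 and +9")
    if repulsion >= attraction:
        problems.append("Mapping repulsion must be lower than Mapping attraction")
    if resolution < 0:
        problems.append("Clustering resolution must be non-negative")
    return problems


# Example values chosen only for illustration.
print(check_mapping_parameters(2, 1, 1.0))
print(check_mapping_parameters(2, 2, 1.0))
```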

As mentioned above, the Advanced Parameters dialog box can be used to
change a number of more advanced parameters of the VOS mapping and
clustering techniques. The following parameters are available:

• Normalization.

o No normalization. If this radio button is selected, no normalization of the
adjacency matrix of a network is performed. In general, we do not
recommend this option.

o Normalization method 1. If this radio button is selected, normalization
method 1 is used for normalizing the adjacency matrix of a network. This is
the default normalization method. This method uses the so-called
association strength measure (Van Eck & Waltman, 2009).

o Normalization method 2. If this radio button is selected, normalization
method 2 is used for normalizing the adjacency matrix of a network. This is
an alternative normalization method. The difference between normalization
method 1 and normalization method 2 is somewhat technical, and we will
therefore not discuss it here.

o Use LinLog/modularity normalization. If this check box is checked,
normalization is performed in the same way as in the LinLog mapping
technique and the modularity clustering technique. For more information
about these techniques, we refer to Newman (2004) and Noack (2007,
2009).

• Random number generator.

o Fixed seed. This text box determines the seed of the random number
generator used by the optimization algorithms of the VOS mapping and
clustering techniques. The seed must be a non-negative integer.

o Do not use fixed seed. If this check box is checked, the random number
generator used by the optimization algorithms of the VOS mapping and
clustering techniques does not have a fixed seed. Instead, each time the
VOS mapping and clustering techniques are run, a different seed will be
used, possibly leading to different results.

• Mapping.

o Random starts. This parameter determines the number of times the
optimization algorithm of the VOS mapping technique is run. Each time the
optimization algorithm is run, a different mapping solution may be
obtained. The best mapping solution obtained in all runs of the optimization
algorithm will be used as the final mapping solution. The larger the value of
the Random starts parameter, the more accurate the final mapping
solution that will be obtained.

o Max. iterations. This parameter determines the maximum number of
iterations performed by the optimization algorithm of the VOS mapping
technique. The larger the value of the parameter, the more accurate the
mapping solution that will be obtained. In general, the default value of the
parameter works well and does not need to be changed.

o Initial step size, Step size reduction, and Step size convergence. These
are technical parameters of the optimization algorithm of the VOS mapping
technique. The parameters must have values between 0.000001 and 1. The
smaller the value of the Step size convergence parameter, the more accurate
the mapping solution that will be obtained. In general, the default values of the
parameters work well and do not need to be changed.

• Clustering.

o Random starts. This parameter determines the number of times the
optimization algorithm of the VOS clustering technique is run. Each time
the optimization algorithm is run, a different clustering solution may be
obtained. The best clustering solution obtained in all runs of the
optimization algorithm will be used as the final clustering solution. The
larger the value of the Random starts parameter, the more accurate the
final clustering solution that will be obtained.

o Iterations. This parameter determines the number of iterations performed
by the optimization algorithm of the VOS clustering technique. The larger
the value of the parameter, the more accurate the clustering solution that
will be obtained. In general, the default value of the parameter works well
and does not need to be changed.
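
Normalization method 1 above refers to the association strength measure. As a rough illustration, the Python sketch below divides each link strength by the product of the total link strengths of the two items involved. Taking the totals as the row sums of the adjacency matrix, and omitting any constant scaling factor, are assumptions made here. The function name and the sample matrix are hypothetical, and the exact computation performed by VOSviewer may differ.

```python
def association_strength(matrix):
    """Normalize a symmetric adjacency matrix by the association strength.

    Each link strength a_ij is divided by (t_i * t_j), where t_i is the row
    sum of item i. Any constant scaling factor is omitted (assumption).
    """
    n = len(matrix)
    totals = [sum(row) for row in matrix]
    normalized = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if matrix[i][j] and totals[i] and totals[j]:
                normalized[i][j] = matrix[i][j] / (totals[i] * totals[j])
    return normalized


links = [
    [0, 4, 2],
    [4, 0, 1],
    [2, 1, 0],
]
normalized_links = association_strength(links)
for row in normalized_links:
    print([round(value, 4) for value in row])
```

The point of such a normalization is that items with many links overall do not dominate the map simply because of their size.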

3 File types
The two primary file types used by VOSviewer are the map file and the network
file. Map files and network files are simple text files that can easily be viewed and
edited using a text editor (e.g., Notepad) or a spreadsheet program (e.g., Excel).
Map files and network files consist of multiple columns. Hence, each line in a map
file or a network file contains multiple pieces of information. Different pieces of
information on the same line are separated from each other by a comma, a
semicolon, or a tab. If a piece of information (e.g., the label of an item) itself
contains a comma or a semicolon, the whole piece of information needs to be
enclosed within double quotes.
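
The quoting rule above can be illustrated with a few lines of Python. This is a minimal sketch: the function name is hypothetical, the tab separator is one of the three allowed choices, and the handling of labels that themselves contain double quotes is not described in this section and is therefore not addressed.

```python
def format_line(fields, separator="\t"):
    """Join fields into one line of a map file or network file.

    A field containing a comma or a semicolon is enclosed in double quotes,
    as required by the rule described above.
    """
    formatted = []
    for field in fields:
        text = str(field)
        if "," in text or ";" in text:
            text = '"' + text + '"'
        formatted.append(text)
    return separator.join(formatted)


print(format_line(["id", "label", "weight"]))
print(format_line([1, "Smith, J.", 0.5]))
```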

We discuss map files and network files in more detail in Sections 3.1 and 3.2. Four
additional file types, the thesaurus file, the cluster colors file, the score colors file,
and the density colors file, are discussed in Sections 3.3 and 3.4. We note that all
example files mentioned in this chapter are distributed together with VOSviewer.

3.1 Map file

A map file is a text file that contains information about the items in a map. Each
line in a map file corresponds with an item. The only exception is the first line.
This is a header line that indicates what is contained in each of the columns of a
map file. Below, we list the columns that can be used in a map file. For each
column, we provide the column header and we specify what the column contains.

id: The ID of an item. Items need to have an ID only if a map file is used in
combination with a network file.

label: The label of an item.

sublabel: The sublabel of an item. In the main panel, the sublabel of an item is
displayed below the item’s ordinary label. Sublabels are displayed in a smaller font.

description: The description of an item. The description of an item is used to
provide information about the item in the information panel. HTML formatting can
be used in this column.

url: The URL of an item. This column can be used to associate a web page with an
item. Clicking on the label of an item in the main panel will cause the web page
associated with the item to be opened in a web browser.

x: The horizontal coordinate of an item.

y: The vertical coordinate of an item.

weight: The weight of an item. Only non-negative numbers are allowed in this
column. The higher the weight of an item, the more prominent the item is presented
in the main panel.

normalized weight: The normalized weight of an item. Only non-negative numbers
are allowed in this column. The higher the normalized weight of an item, the more
prominent the item is presented in the main panel. The default presentation of an
item is obtained by setting the item’s normalized weight to 1.

score: The score of an item. In the main panel and the overview panel, items can be
colored based on their scores, with by default colors ranging from blue to green to
red.

cluster: The number of the cluster to which an item belongs. Only integers between
1 and 1000 are allowed in this column.

red: The red component of the color of an item. Only integers between 0 and 255
are allowed in this column.

green: The green component of the color of an item. Only integers between 0 and
255 are allowed in this column.

blue: The blue component of the color of an item. Only integers between 0 and 255
are allowed in this column.

In a map file, one always uses only a subset of the above columns. The order in
which the columns are used is not important.

There are a number of restrictions on the columns that are used in a map file:

• There must be an id column or a label column. (If there is no label column, the
ID of an item is used as the item’s label.)

• If there is a sublabel column, there must be a label column as well.

• The x and y columns must be used together.

• The red, green, and blue columns must be used together.

• The weight column and the normalized weight column cannot be used
together.

For an example of a map file, see the file journal_map.txt.
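
The restrictions above lend themselves to a simple check of a map file's header line. The Python sketch below is illustrative only. The function name is hypothetical, and it checks only the restrictions listed in this section, not the allowed value ranges of the individual columns.

```python
def check_map_columns(columns):
    """Return the restrictions violated by a map file header (empty if none)."""
    cols = set(columns)
    problems = []
    if "id" not in cols and "label" not in cols:
        problems.append("there must be an id column or a label column")
    if "sublabel" in cols and "label" not in cols:
        problems.append("a sublabel column requires a label column")
    if ("x" in cols) != ("y" in cols):
        problems.append("the x and y columns must be used together")
    rgb = {"red", "green", "blue"} & cols
    if rgb and len(rgb) != 3:
        problems.append("the red, green, and blue columns must be used together")
    if "weight" in cols and "normalized weight" in cols:
        problems.append("weight and normalized weight cannot be used together")
    return problems


print(check_map_columns(["id", "label", "x", "y", "weight", "cluster"]))
print(check_map_columns(["sublabel", "x", "weight", "normalized weight", "red"]))
```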

3.2 Network file

A network file is a text file that contains the adjacency matrix of a network. The
adjacency matrix of a network is a square matrix that indicates for each pair of
items in the network the strength of the link between the items. The strength of a
link is given by a non-negative number. If there is no link between two items, the
strength of the link between the items equals zero. VOSviewer works with
symmetric adjacency matrices. If a network file contains an asymmetric adjacency
matrix, VOSviewer will make the matrix symmetric by averaging corresponding
elements on both sides of the main diagonal.
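
The averaging step described above can be written out in a few lines of Python. This is a sketch of the stated rule only. The function name and the sample matrix are hypothetical.

```python
def symmetrize(matrix):
    """Make a square matrix symmetric by averaging a_ij and a_ji."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
            for i in range(n)]


asymmetric = [
    [0, 3, 0],
    [1, 0, 4],
    [0, 0, 0],
]
print(symmetrize(asymmetric))
```

For instance, the link strengths 3 (from item 1 to item 2) and 1 (from item 2 to item 1) are both replaced by their average, 2.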

A network file has either a full format or a sparse format:

• Full format. The entire adjacency matrix, including elements that are equal to
zero, is stored in the network file. The file consists of n lines and n + 1
columns, where n denotes the number of items in the network. The element in
the ith row and the jth column of the adjacency matrix is stored on the ith line
and in the (j + 1)th column of the network file. The first column of the network
file contains IDs of items. This column indicates for each row and column of the
adjacency matrix the ID of the corresponding item. For an example of a
network file in full format, see the file journal_network_full.txt. 

• Sparse format. Only the non-zero elements of the adjacency matrix are stored
in the network file. The file consists of two or three columns. The first two
columns contain IDs of items. The third column contains the non-zero
elements of the adjacency matrix. This column indicates the strength of the
link between the items referred to in the first two columns. If there is no third
column, the strength of the link between the items referred to in the first two
columns always equals one. In the case of a symmetric adjacency matrix, it is
sufficient to store only half of the matrix (either the upper triangular part or
the lower triangular part) in the network file. For an example of a network file
in sparse format, see the file journal_network_sparse.txt. 

A network file is usually used in combination with a map file. For each item ID in
the network file, there must then be a corresponding item ID in the map file.
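
To make the relation between the two formats concrete, the following Python sketch reads a small network in full format and lists it in sparse format, keeping only the upper triangular part of a symmetric matrix. The function names and the sample data are hypothetical, a comma is assumed as the separator, and elements on the main diagonal are skipped, which is an assumption of this sketch.

```python
def parse_full_format(text, separator=","):
    """Read a full-format network: one line per item, ID in the first column."""
    ids, matrix = [], []
    for line in text.strip().splitlines():
        fields = line.split(separator)
        ids.append(fields[0])
        matrix.append([float(value) for value in fields[1:]])
    return ids, matrix


def full_to_sparse(ids, matrix):
    """List the non-zero elements above the main diagonal as (id, id, strength)."""
    rows = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if matrix[i][j] != 0:
                rows.append((ids[i], ids[j], matrix[i][j]))
    return rows


full_text = "1,0,3,0\n2,3,0,5\n3,0,5,0"
ids, matrix = parse_full_format(full_text)
sparse_rows = full_to_sparse(ids, matrix)
for row in sparse_rows:
    print(",".join(str(field) for field in row))
```

Here three lines of four columns each (n lines and n + 1 columns) reduce to two lines of three columns.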

3.3 Thesaurus file


A thesaurus file is a text file that can be used in the following situations:

• When creating a map based on Web of Science or Scopus files (see Subsection
2.5.1), a thesaurus file can be used to merge different variants of a source
title, an author name, an organization name, or a cited reference. This may for
example be useful when the name of a researcher is written in different ways
in different publications (e.g., with first initial only and with all initials). A
thesaurus file can then be used to indicate that different names in fact refer to
the same researcher. 

• When creating a map based on a corpus file (see Subsection 2.5.1), a
thesaurus file can be used to merge synonyms into a single term. This may be
useful not only for merging different terms referring to the same concept, but
also for merging different spellings of the same term (e.g., behavior and
behaviour). It may also be useful for merging an abbreviation of a term with
the term itself.

Each line in a thesaurus file contains a label and indicates an alternative label that
replaces the original label. The only exception is the first line. This is a header line
that indicates what is contained in each of the columns of a thesaurus file. A
thesaurus file must have two columns: a label column and a replace by column.
The label column contains a label. Depending on the situation (see above), a label
may represent a source title, an author name, an organization name, a cited
reference, or a term. The replace by column contains an alternative label that
replaces the original label. The replace by column may also be empty. In that
case, the original label is not replaced by an alternative one, but instead the
original label is simply ignored. When creating a map based on a corpus file, this
allows a thesaurus file to be used as a kind of stop word list. A thesaurus file may
for example indicate that certain uninteresting terms (e.g., copyright and Elsevier
in the case of a corpus of abstracts of scientific publications) should be ignored.
For examples of thesaurus files, see the files thesaurus_authors.txt and
thesaurus_terms.txt.
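
The effect of a thesaurus file can be sketched in Python as follows. The sample thesaurus, the tab separator, and the function names are hypothetical. The sketch only mirrors the two rules stated above: a label is replaced by its alternative, and a label with an empty replace by column is ignored.

```python
import csv
import io


def load_thesaurus(text, delimiter="\t"):
    """Read a thesaurus with a header line containing 'label' and 'replace by'."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["label"]: (row["replace by"] or "") for row in reader}


def apply_thesaurus(labels, thesaurus):
    """Replace labels by their alternatives and drop labels mapped to nothing."""
    result = []
    for label in labels:
        replacement = thesaurus.get(label, label)
        if replacement == "":
            continue  # empty 'replace by' column: the label is ignored
        result.append(replacement)
    return result


thesaurus_text = (
    "label\treplace by\n"
    "behaviour\tbehavior\n"
    "www\tworld wide web\n"
    "copyright\t\n"
)
thesaurus = load_thesaurus(thesaurus_text)
cleaned = apply_thesaurus(["behaviour", "www", "copyright", "malaria"], thesaurus)
print(cleaned)
```

Labels that do not appear in the thesaurus, such as malaria in this example, are left unchanged.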

3.4 Cluster colors file, score colors file, and density colors file

A cluster colors file is a text file that contains the colors of clusters. Each line in a
cluster colors file corresponds with a cluster. The only exception is the first line.
This is a header line that indicates what is contained in each of the columns of a
cluster colors file. A cluster colors file must have four columns: a cluster column, a
red column, a green column, and a blue column. The cluster column contains
cluster numbers. Only integers between 1 and 1000 are allowed in this column.
The red, green, and blue columns contain the red, green, and blue components of
the colors of clusters. Only integers between 0 and 255 are allowed in these
columns. For an example of a cluster colors file, see the file cluster_colors.txt.

A score colors file is similar to a cluster colors file except that instead of a cluster
column it has a color value column, with values between 0 and 1. If scores have
been given to items, items can be colored by transforming their scores into color
values and by matching these color values with the color values in the color value
column of the score colors file. Exact matching of color values is usually not
possible, and in that case the colors in the score colors file are interpolated. For an
example of a score colors file, see the file score_colors.txt.
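
The interpolation step can be sketched as follows in Python. The text above does not say how VOSviewer interpolates or how scores are transformed into color values, so linear interpolation between the two nearest entries is an assumption, and the function name and the three-entry color table (blue, green, red) are purely illustrative.

```python
def interpolate_color(value, color_table):
    """Return an (r, g, b) tuple for a color value between 0 and 1.

    `color_table` holds (color_value, red, green, blue) tuples. Linear
    interpolation between neighboring entries is an assumption.
    """
    table = sorted(color_table)
    if value <= table[0][0]:
        return tuple(table[0][1:])
    if value >= table[-1][0]:
        return tuple(table[-1][1:])
    for (v0, *c0), (v1, *c1) in zip(table, table[1:]):
        if v0 <= value <= v1:
            if v1 == v0:
                return tuple(c0)
            t = (value - v0) / (v1 - v0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


score_colors = [
    (0.0, 0, 0, 255),    # blue
    (0.5, 0, 255, 0),    # green
    (1.0, 255, 0, 0),    # red
]
print(interpolate_color(0.5, score_colors))
print(interpolate_color(0.25, score_colors))
```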

A density colors file has the same format as a score colors file. In the item density
visualization, the color of a point in a map is determined by transforming the
density of items at that point into a color value and by matching this color value
with the color values in the color value column of the density colors file. Exact
matching of color values is usually not possible, and in that case the colors in the
density colors file are interpolated. For an example of a density colors file, see the
file density_colors.txt.

4 Advanced topics
In this chapter, some advanced topics are addressed. We first consider the use of
command line parameters (Section 4.1). We then discuss how a map can be made
available on the internet (Section 4.2) and how the amount of memory available
to VOSviewer can be increased (Section 4.3).

4.1 Using command line parameters

VOSviewer supports a number of command line parameters. These parameters
can for example be used to automatically open a map when VOSviewer is started
or to override some of the default settings of VOSviewer. The command line
parameters supported by VOSviewer are listed below.

Command line parameters for opening or creating a map

gml: Use this parameter to specify a GML file. Based on this file, a map will be
opened or created when VOSviewer is started.

map: Use this parameter to specify a map file (see Section 3.1). The map in this
file will be opened when VOSviewer is started.

network: Use this parameter to specify a network file (see Section 3.2). An
adjacency matrix will be read from this file when VOSviewer is started. The
adjacency matrix can be used to display lines between the items in a map or to
create a new map.

pajek_network: Use this parameter to specify a Pajek network (or matrix) file.
Based on this file, a map will be opened or created when VOSviewer is started.

pajek_partition: Use this parameter to specify a Pajek partition file. Cluster
numbers of items will be read from this file when VOSviewer is started.

pajek_vector: Use this parameter to specify a Pajek vector file. Weights of items
will be read from this file when VOSviewer is started.

Command line parameters for creating a term map based on a corpus file

corpus: Use this parameter to specify a corpus file. Based on this file, a term map
will be created when VOSviewer is started.

counting_method: Use this parameter to specify the counting method to be used in
the creation of the term map (1 for binary counting and 2 for full counting).

min_n_occurrences: Use this parameter to specify the minimum number of
occurrences a term must have to be included in the term map.

n_terms: Use this parameter to specify the number of terms to be included in the
term map. VOSviewer will select the terms that seem most relevant.

scores: Use this parameter to specify a scores file. In the creation of the term map,
this file will be used to calculate a score for each term.

thesaurus: Use this parameter to specify a thesaurus file (see Section 3.3). In the
creation of the term map, this file can be used to merge synonyms into a single term.

Visualization-related command line parameters

black_background: Use this parameter to have a black background color in the
network visualization.

cluster_colors: Use this parameter to specify a cluster colors file (see Section 3.4).
Cluster colors will be imported from this file when VOSviewer is started.

density_colors: Use this parameter to specify a density colors file (see Section 3.4).
Density colors will be imported from this file when VOSviewer is started.

label_size: Use this parameter to specify the initial value of the Size slider in the
options panel.

label_size_variation: Use this parameter to specify the initial value of the Size
variation slider in the options panel.

min_score: Use this parameter to specify the initial value of the Min. score text box
in the Min./Max. Scores dialog box.

max_score: Use this parameter to specify the initial value of the Max. score text box
in the Min./Max. Scores dialog box.

n_lines: Use this parameter to specify the initial value of the No. of lines text box in
the options panel.

score_colors: Use this parameter to specify a score colors file (see Section 3.4).
Score colors will be imported from this file when VOSviewer is started.

visualization: Use this parameter to specify which visualization will be selected when
VOSviewer is started (1 for the network visualization and 2 for the density
visualization).

white_background: Use this parameter to have a white background color in the
cluster density visualization.

zoom_level: Use this parameter to specify the initial zoom level in the main panel.
The zoom level must have a value of at least 1. The higher the zoom level, the more
the main panel will be zoomed in on the center of a map.

Miscellaneous command line parameters

encoding: Use this parameter to specify the character encoding that is used by
VOSviewer to read and write text files. For a list of the available encodings, see
http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html. If
this parameter is not used, VOSviewer will attempt to automatically determine
the correct encoding when reading a text file (which in some cases may result in
the use of an incorrect encoding), while it will use the default encoding when
writing a text file.

file_location: Use this parameter to specify the folder that is used by VOSviewer as
the default file location.

no_clustering: Use this parameter to indicate that no clustering must be performed
when VOSviewer is started. This parameter can for example be used together with
the pajek_network parameter. A map will then be opened or created based on a
Pajek network file, but no clustering of the items in the map will be performed.

view_only: Use this parameter to start VOSviewer in the view only mode. In this
mode, the Run button on the Map tab in the action panel is not available.

To use the above command line parameters, VOSviewer needs to be run from the
command line. When using the Windows executable of VOSviewer, this can for
example be done as follows:

VOSviewer -map map.txt -visualization 2 -zoom_level 2.5

In this way, VOSviewer will be started and the map in the map file map.txt will be
opened. Also, the density visualization will be selected, and the main panel will be
zoomed in on the center of the map. When instead of the Windows executable of
VOSviewer the VOSviewer JAR file is used, VOSviewer can for example be run in
the following way:

java -jar VOSviewer.jar -map map.txt -visualization 2 -zoom_level 2.5

We note that some command line parameters cannot be used together. For
example, the map and pajek_network parameters and the map and corpus
parameters cannot be used together. On the other hand, some parameters can
only be used in combination with another parameter. The pajek_partition
parameter for example can only be used in combination with the pajek_network
parameter. Similarly, the counting_method parameter can only be used in
combination with the corpus parameter.
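
When VOSviewer is started from a script, the argument list can be assembled and checked before the program is launched. The Python sketch below is a hypothetical helper, not part of VOSviewer. It checks only the combinations mentioned above, and it assumes that parameters such as view_only are passed without a value, which this section does not state explicitly.

```python
def build_vosviewer_args(executable="VOSviewer", **parameters):
    """Build an argument list such as ['VOSviewer', '-map', 'map.txt', ...].

    Pass True as the value of a parameter that takes no value (assumption).
    Only the combination rules mentioned in the text above are checked.
    """
    names = set(parameters)
    if "map" in names and ("pajek_network" in names or "corpus" in names):
        raise ValueError("map cannot be combined with pajek_network or corpus")
    if "pajek_partition" in names and "pajek_network" not in names:
        raise ValueError("pajek_partition requires pajek_network")
    if "counting_method" in names and "corpus" not in names:
        raise ValueError("counting_method requires corpus")
    args = [executable]
    for name, value in parameters.items():
        args.append("-" + name)
        if value is not True:
            args.append(str(value))
    return args


args = build_vosviewer_args(map="map.txt", visualization=2, zoom_level=2.5)
print(" ".join(args))
```

The resulting list could then be handed to subprocess.run from the Python standard library, provided that the VOSviewer executable is on the search path.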

4.2 Making a map available on the internet

Suppose one wants to make a map available on the internet. One way to do this is
simply by making a map file and a network file (or only a map file) available on
the internet. To open the map, one then needs to take two steps. In the first step,
the map file and the network file are downloaded from the internet. In the second
step, VOSviewer is started and the downloaded map file and network file are used
to open the map.

The above two-step approach is not very convenient, and therefore an alternative
approach is available as well. In this alternative approach, a map can be opened in
VOSviewer directly from a web page. The alternative approach works as follows. A
map file and a network file (or only a map file) need to be made available on the
internet. Suppose these files are located at

http://www.example.com/map.txt

and

http://www.example.com/network.txt

The map can then be made available on a web page by creating a hyperlink that
points to

http://www.vosviewer.com/vosviewer.php?map=http://www.example.com/map.txt&network=http://www.example.com/network.txt

Using this hyperlink, the map can be opened directly in VOSviewer.

We note that all command line parameters discussed in Section 4.1 can also be used in
a hyperlink. For example, to open a map, to select the density visualization, and to
zoom in on the center of the map, the following hyperlink can be used:

http://www.vosviewer.com/vosviewer.php?map=http://www.example.com/map.txt&network=http://www.example.com/network.txt&visualization=2&zoom_level=2.5
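
Hyperlinks of this kind can also be generated programmatically, for example when many maps are published at once. The following Python sketch uses the standard urllib.parse module. The function name is hypothetical, and leaving the colon and slash characters unescaped (so that the result matches the form shown above) is a choice made for this illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.vosviewer.com/vosviewer.php"


def vosviewer_link(map_url, network_url=None, **parameters):
    """Build a hyperlink that opens a map file (and optionally a network file)."""
    query = {"map": map_url}
    if network_url:
        query["network"] = network_url
    query.update(parameters)
    # safe=":/" keeps the nested URLs readable, as in the examples above.
    return BASE_URL + "?" + urlencode(query, safe=":/")


link = vosviewer_link(
    "http://www.example.com/map.txt",
    "http://www.example.com/network.txt",
    visualization=2,
    zoom_level=2.5,
)
print(link)
```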

4.3 Increasing the availability of memory

When using VOSviewer to work with large maps (e.g., maps containing several
thousand items), the memory requirements can be quite substantial.
VOSviewer will produce an out of memory error if there is not enough memory
available. The availability of memory can be increased by running the VOSviewer
JAR file from the command line and specifying the amount of memory that needs
to be available. For example, if 1000 MB of memory needs to be available, the
VOSviewer JAR file can be run as follows:

java -Xmx1000m -jar VOSviewer.jar

When working with large maps, VOSviewer may also produce a stack overflow
error. To prevent this error from occurring, the stack size needs to be increased.
This can be done by running the VOSviewer JAR file from the command line in the
following way:

java -Xss1000k -jar VOSviewer.jar

In this case, the stack size is set to 1000 KB, but other values are possible as well.
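
The two Java options can also be given together in a single command. The following is a minimal sketch in Python that composes such a command. The function name vosviewer_command is hypothetical, and the sketch assumes that java is on the system path and that VOSviewer.jar is in the current directory.

```python
import shlex


def vosviewer_command(max_heap_mb=None, stack_kb=None, jar="VOSviewer.jar"):
    """Return the command for starting VOSviewer as a list of arguments.

    The helper name is hypothetical. -Xmx sets the maximum amount of
    memory and -Xss sets the stack size, as in the two commands above.
    """
    cmd = ["java"]
    if max_heap_mb is not None:
        cmd.append(f"-Xmx{max_heap_mb}m")
    if stack_kb is not None:
        cmd.append(f"-Xss{stack_kb}k")
    cmd += ["-jar", jar]
    return cmd


cmd = vosviewer_command(max_heap_mb=1000, stack_kb=1000)
print(shlex.join(cmd))
# To actually start VOSviewer, the list can be passed to subprocess.run(cmd).
```

The values of 1000 MB and 1000 KB are simply those used in the examples above. Suitable values depend on the size of the map and on the memory available on the computer.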

References
De Nooy, W., Mrvar, A., & Batagelj, V. (2011). Exploratory social network analysis
with Pajek (2nd ed.). Cambridge University Press.

Newman, M.E.J. (2004). Fast algorithm for detecting community structure in
networks. Physical Review E, 69, 066133.

Noack, A. (2007). Energy models for graph clustering. Journal of Graph
Algorithms and Applications, 11(2), 453–480.

Noack, A. (2009). Modularity clustering is force-directed layout. Physical Review E,
79, 026102.

Van Eck, N.J., & Waltman, L. (2009). How to normalize cooccurrence data? An
analysis of some well-known similarity measures. Journal of the American
Society for Information Science and Technology, 60(8), 1635–1651.

Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer
program for bibliometric mapping. Scientometrics, 84(2), 523–538.

Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using
VOSviewer. ISSI Newsletter, 7(3), 50–54.

Van Eck, N.J., & Waltman, L. (2014). Visualizing bibliometric networks. In Y. Ding,
R. Rousseau, & D. Wolfram (Eds.), Measuring scholarly impact: Methods and
practice (pp. 285–320). Springer.

Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison
of two techniques for bibliometric mapping: Multidimensional scaling and VOS.
Journal of the American Society for Information Science and Technology,
61(12), 2405–2416.

Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-
scale modularity-based community detection. European Physical Journal B,
86(11), 471.

Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to
mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4),
629–635.
