
Quality & Quantity

https://doi.org/10.1007/s11135-020-01018-1

Distinctive author ranking using DEA indexing

Avick Kumar Dey1 · Pijush Kanti Dutta Pramanik1 · Prasenjit Choudhury1 · Goutam Bandopadhyay2

© Springer Nature B.V. 2020

Abstract
The productivity and impact of a researcher can be measured by considering the total number of articles authored by him/her and the corresponding citations. Several techniques exist to evaluate the cumulative impact of an author's scholarly output and performance by comparing publications to citations. However, all of them fail to rank each author uniquely, assigning the same index value to two or more authors even though they have diverse citation patterns. In some indices, beyond a certain number of citations of a particular article, subsequent citations add no value to the overall index. In this paper, a new indexing scheme based on data envelopment analysis is proposed, which ensures a unique ranking by assigning different index values to authors who differ even minimally in their citation patterns. Furthermore, the proposed scheme ensures that every citation has an impact, without any ceiling. The index is applied to a consistent data set comprising publication data of the last 40 years in the field of Computer Science. The outcome, when compared with the existing metrics, confirms that the proposed index provides more effective results by ranking authors distinctively.

Keywords  DEA · Author ranking · Indexing · Unique ranking · Citation metric

* Prasenjit Choudhury
prasenjit0007@yahoo.co.in
Avick Kumar Dey
iavickdey@gmail.com
Pijush Kanti Dutta Pramanik
pijushjld@yahoo.co.in
Goutam Bandopadhyay
math_gb@yahoo.co.in
1 Department of Computer Science and Engineering, National Institute of Technology, Durgapur, West Bengal, India
2 Department of Management Studies, National Institute of Technology, Durgapur, West Bengal, India


1 Introduction

The research publication is a decisive factor in assessing a researcher's contribution and output. The number of publications of a researcher is considered a direct reflection of his/her research efforts. Furthermore, the importance and the quality of a research paper are typically assessed by the attention it receives in the form of citation counts. To assess the research output and academic achievement of an individual researcher, different metrics have been proposed, all of which consider (directly or indirectly) the number of publications and the citations of these publications. Based on these metrics, researchers are indexed or ranked, which reflects their research activity, acceptance, and reputation. There are many well-known techniques for ranking researchers. Among them, the most popular is the h-index, proposed by J. E. Hirsch in 2005 (Hirsch 2005), which takes the number of publications and their citations into account to measure the productivity of a researcher. The h-index (Waltman and van Eck 2012) considers both the number of publications and the number of citations per publication in a balanced way, condensing the impact of an author into a single value. But the h-index suffers from a serious drawback: once an article gets enough citations to gain inclusion into the 'h-core', additional citations become irrelevant. Further citations of such a paper do not affect the h-index value (Hirsch 2005). As a result, two researchers whose most highly cited works differ widely may still present the same h-index, because citations beyond the h-core are not taken into account. To overcome this limitation of the h-index, Egghe introduced the g-index (Egghe 2006) as a new ranking approach, which considers the citations of higher-cited papers and takes into account all h-core publication citations. It assigns higher weight to highly cited papers, but it does not consider all the publications or the research period of the author.
Both the h- and g-indices consider just the h-core values, ignoring all other published works as well as the author's duration of research. Hence, the R- and AR-indices were proposed as an improvement. These two indices can indicate both the productivity and its effect on the research community (BiHui et al. 2007).
Waltman and van Eck (2012), along with Li et al. (2015), proposed the m-index, which resolves the difference between new and established researchers by dividing the h-index by the number of years of an author's research work. Furthermore, Farooq et al. (2017) highlighted the significant limitations of using the h-index, g-index, and R-index in the ranking of authors. They proposed the DS-index, which ranks authors distinctively and differentiates them when publication counts are the same but citation patterns are different.
Table 1 summarises the advantages and limitations of the above-mentioned indexing techniques.
In this paper, we propose a new ranking scheme based on a unique index of authors using their temporal citation information. The objective of this study is to propose a new indicator, called the DEA (data envelopment analysis) index (Charnes et al. 1978), for assessing the performance of individual researchers.
The rest of the paper is organised as follows. Section 2 describes the earlier relevant work. Section 3 discusses the requirement for a new index and the proposed solution. Section 4 presents the research methodology. Sections 5 and 6 describe and discuss the results, respectively. Section 7 concludes the paper.

Table 1 Advantages and limitations of different ranking methods

| Indexing | Advantages | Limitations |
|---|---|---|
| h-index (Hirsch 2005) | Simplicity; relies on paper citations, not journals | Once a paper belongs to the top h papers, its subsequent citations no longer count; limits authors by the total number of publications, which penalises shorter careers |
| g-index (Egghe 2006) | Considers both citations and publications; accounts for the performance of the author's top publications; shows author impact for low h-index and citations | Considers only h-core values and not all the publications; considers an average number of citations; can be highly influenced by a single successful paper |
| i10-index (Cornell University 2019a, b) | Constant evaluation without time-frame; can compare authors working in a common field | Cannot work for authors with less than 10 citations; once a paper receives 10 citations, the subsequent citations have no value |
| DS-index (Farooq et al. 2017) | Can rank authors with small differences in their research productivity within an academic network; works well when co-authors are present | Fails to resolve ties among medium-cited and low-cited authors |


2 Related work

Lotka proposed a law of 'scientific productivity' in 1926 (Lotka 1926). If n_a denotes the number of researchers who have published p papers in an area of research, then n_a varies roughly inversely with p^2 and directly with N (the number of researchers having just one paper within the data set). This law, however, merely provides a theoretical approximation of productivity, as the p^{-2} dependence seldom holds good (Pao 1986). The inflation in the number of researchers, in truth, breaks down Lotka's law of inverse squares (Egghe 2006; Kretschmer and Rousseau 2001), even more so in the present times. There are cases, say in medicine, where over 2000 authors have collaborated to publish an article. The impact of co-authors in bibliometrics thus requires further study.
The h-index is among the most popular indices for evaluating authors. Its biggest advantage is that it simultaneously evaluates the productiveness as well as the impact of a researcher, since it considers both the quality and quantity of their publications (Hirsch 2005; Bornmann and Daniel 2009). It can even evaluate the publications of research teams, including those of nations (Bornmann and Daniel 2009). Hence, a researcher having an h-index of 50 has published 50 papers where every paper has garnered a minimum of 50 citations, considering only the most-cited publications. But then, as in the case of any academic index, this index, too, has certain shortcomings:

• The h-index depends on the number of publications that fulfil a minimum citation threshold.
• Variations exist in the h-index among research areas (Glänzel and Persson 2005); therefore, one cannot compare the impacts of researchers across different fields (Hirsch 2005).
• It does not reflect the actual number of citations of the publications (Costas and Bordons 2007).
• The index varies with the citation manager utilised and the researcher's career span (Costas and Bordons 2007).
• The h-index varies with the database covered, e.g. Scopus, Google Scholar, WoS, etc. (Halevi et al. 2017; Bar-Ilan 2008).

A new formula for calculating bibliographic coupling strength was proposed by Shen et al. (2019), based on a formula derived from TF-IDF for calculating the similarity between documents. Park and Wolfram (2019) studied research software citation in the Data Citation Index, analysing the prevalence of research software citation in Clarivate Analytics' Data Citation Index (DCI). The limitation is that the reuse of research software is not well reflected in the DCI; the use of persistent identifiers, which can help to identify research software, did not always result in higher citation rates.
Giuffrida et al. (2019) explored the possibilities of metrics that value citations by the value of the citing items. They proposed and experimented with a model to value citations by the impact of the citing articles. The proposed impact indicator is highly correlated with the traditional citation count, but the new indicator showed greater sensitivity when used to identify top-cited papers.
An authorship analysis of specialised versus diversified research output, also carried out by Giuffrida et al. (2019), investigated the relations between the amplitude and type of collaboration (intramural, extramural, domestic, or international) and the output of specialised versus diversified research, i.e., research within or beyond the author's dominant research topic.


Kousha and Thelwall (2019) observed that citations to doctoral dissertations can be semi-automatically extracted from Google Scholar. A fifth of the doctoral dissertations had at least one citation in Google Scholar. There are more Google Scholar citations than Mendeley readers for older dissertations, whereas Mendeley reader counts are higher than Google Scholar citation counts for recently published dissertations.
The study by Chen et al. (2019) examines the distribution of citations to different citable objects related to lme4. The reassignment of the citation format of lme4 catalysed its changing citation behaviour.
Brito and Rodríguez-Navarro (2019) examined the journal impact factor, which is used as a surrogate of citations for paper evaluation. They compared the evaluations of papers by the journal impact factor and by the number of citations, and found that journal-impact-factor-based evaluation fails when it contradicts citation-based evaluation.
Abrishami and Aliakbary (2019) put forward a citation count prediction methodology based on deep neural network learning techniques. They suggested a novel method for estimating the long-term citations of a paper based on the number of citations in the first few years after publication. In order to train the citation prediction model, they employed artificial neural networks, a powerful machine learning tool used to model complex patterns in datasets using multiple hidden layers and non-linear activation functions, with growing applications in many domains, including image and text document processing.
For faculty hiring within the information school community, Zuo et al. (2019) used a data-driven approach to quantify and rank institutional attractiveness. They also revealed the importance of weak-tie collaborations for job placement quality.

3 Requirement for a new index and proposed solution

The ranking indices discussed above have minimised the limitations of the h-index to some extent. However, a common drawback of all these existing indices is that they assign the same index value to two different authors with diverse citation patterns, and thus cannot provide a unique ranking of authors. Table 2 depicts a scenario where the existing indexing schemes assign the same rank to more than one author despite differences in the number of publications and the number of citations of each publication.
Based on the rankings of Table 2, the following points need to be pondered:

a) When two researchers have the same index value, it suggests that their research performance is also precisely the same.
b) After contributing to a certain index value, the subsequent citations of a particular publication are not considered.

The existing ranking systems cannot comprehensively address the two points mentioned above. These limitations of the existing indexing schemes indicate that further improvement in author indexing is needed.

Table 2 A snapshot of different index values and ranking of researchers with different publication and citation counts

| Author | No. of publications | Citations of each paper | Total citations | h-index | Mean | SD | i10-index | g-index | DS-index | Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 7 | 15,10,5,2,1,1,0 | 34 | 3 | 5.66 | 5.71 | 2 | 5 | 12.68 | 1 |
| B | 7 | 15,5,2,10,0,1,1 | 34 | 3 | 4.85 | 5.63 | 2 | 5 | 12.68 | 1 |
| C | 6 | 15,5,2,1,1,10 | 34 | 3 | 5.66 | 5.71 | 2 | 5 | 12.68 | 1 |
| D | 6 | 5,2,10,1,1,15 | 34 | 3 | 5.66 | 5.71 | 2 | 5 | 12.68 | 1 |
| E | 8 | 10,5,15,1,1,1,0,0 | 34 | 3 | 5.5 | 5.85 | 2 | 5 | 12.68 | 1 |

4 Research methodology

The following sections present the solution methodology of the proposed method, which comprises the evaluation of the mean and the SD, followed by DEA.

4.1 Standard deviation and mean

The mean and standard deviation (SD) are established statistical metrics for comparative performance measurement, including academic performance. The relevant citation data are heterogeneous and increase dynamically. Characteristically, the mean and SD are well suited for performance ranking. To solve the problems mentioned in Sect. 3, the following approach may be considered: when two researchers have the same index value and ranking, an aggregated solution may be obtained by using the mean and SD.
But in practice, it is observed that both the mean and SD fail to provide a unique ranking of the authors based on the citations earned by their research papers, as shown in Table 3. In the scenario depicted in Table 2, when the mean and SD are the same, obtaining a feasible solution is difficult. Hence, there is a need for other metrics.
In the available literature, generally, mean-centric results are provided for citations. In the proposed DEA-based solution, however, a stochastic approach is used to generate results using both the mean and SD.
Figure 1 presents the perceptual graph using the combined mean and combined SD. After plotting the combined values, it generates one intersection point and four quadrants. Here, the actual data of 20 authors have been used. As the mean and SD of the majority of the authors lie on the negative side of the perceptual graph, no rank can be derived from the mean and SD alone; that is the reason we turn to DEA. The perceptual graph shows that the mean and SD possess a negative correlation in most of the cases. Therefore, based on the citation mean and SD, we cannot conclude any author's superiority, because the correlation between SD and mean may be negative or positive: there is no fixed correlation, and their natures are independent, as they are based on observation. When we refer to the coefficient of variation, which is the ratio of SD to mean, we find that a lower coefficient of variation indicates a good citation record; likewise, when the ratio of the positive semi-variance (citation mean > combined mean) to the negative semi-variance (citation mean < combined mean) is favourable, the author's citation record is also good. Thus, these two are complementary outputs.

Table 3 A snapshot of various authors and citation counts, ranked using mean and SD

| Author | Publications | Citations | Total citations | Mean | Mean rank | SD | SD rank |
|---|---|---|---|---|---|---|---|
| A | 7 | 15,10,5,2,1,1,0 | 34 | 5.66 | 1 | 5.71 | 2 |
| B | 7 | 15,5,2,10,0,1,1 | 34 | 4.85 | 3 | 5.63 | 3 |
| C | 6 | 15,5,2,1,1,10 | 34 | 5.66 | 1 | 5.71 | 2 |
| D | 6 | 5,2,10,1,1,15 | 34 | 5.66 | 1 | 5.71 | 2 |
| E | 8 | 10,5,15,1,1,1,0,0 | 34 | 5.5 | 2 | 5.85 | 1 |


Fig. 1  The perceptual graph to calculate the correlation between mean & SD

Therefore, it can be concluded that based on one output an author may be ranked high, while on the basis of the other output the same author may get a lower rank. So, we need a method that assumes neither a functional relation nor a probability distribution; an additional parameter is required along with the mean and SD values.

4.2 DEA indexing

DEA is a non-parametric, linear programming tool for calculating the performance of a set of comparable entities, called decision-making units (DMUs), which may produce products or services. The DEA approach became prominent due to its ability to measure efficiency; it is one of the most powerful quantitative, non-parametric tools for measuring performance. In this research, the input-oriented DEA technique was applied. Input-oriented DEA minimises the inputs while keeping the outputs constant, under variable returns to scale (VRS). Under VRS, outputs rise or fall disproportionately when inputs are increased: as a DMU grows, its efficiency may either increase or decrease. The VRS technique is used in this research because the citations of the authors are not stable (Raab and Lichty 2002).
The efficiency score in the DEA model varies from 0 to 1. One is the highest score, indicating maximum efficiency, and any score less than one denotes an entity's relative inefficiency. The analysis of efficiency serves either of the following aims: to produce a greater quantity of outputs with the same level of inputs, or to use lower levels of inputs to produce the same amount of outputs (Guha et al. 2014).
As the efficiency score varies from 0 to 1, a problem arises when more than one DMU attains an efficiency score of 1; the complete ranking of all the DMUs then becomes impossible. Andersen and Petersen (1993) addressed this problem and proposed a super-efficiency model. Many researchers have since come up with different concepts and formulations for super-efficiency analysis.
In this study, the focus was first on calculating the mean and SD from the citations of each author and treating them as inputs: the mean of an author's citations indicates their average performance, while the SD helps measure the performance of each author over the time period. Using the mean value of each author, the positive and negative deviations were evaluated. Then, the sums of squares (SOS) of the positive values and negative values of each year were ascertained. Next, the SD of the positive deviations, as well as of the negative deviations, was obtained. Finally, R1 and R2 were evaluated, which, in this research, were treated as outputs.
But for some of the DMUs (authors), identical scores were obtained, as shown in Table 3, which would create an issue for correctly ranking DMUs according to efficiency. Hence, super-efficiency analysis was used in this research, which excludes the efficient DMU under evaluation and compares it with the linear combinations of the other DMUs.
This study proposes that the rank ambiguity problem can be solved using the DEA-index mechanism (El-Mahgary and Lahdelma 1995) built on the model proposed by Charnes et al. (1978), which is generally used to extract a unique rank for the various DMUs (Raab and Lichty 2002). In our study, the same mechanism is followed, where the DMUs represent authors, and a unique rank is derived for each author using DEA.
Efficiency and super-efficiency are used to supplement the mean and SD. To derive the efficiency and the super-efficiency (in case of equal ranks), the DEA technique is used, which takes the mean and SD as inputs and evaluates the positive semi-variance and the negative semi-variance, along with SD/mean, as outputs. The process is depicted in Fig. 2.
Considering the DEA and super-efficiency results, the DMUs were ranked accordingly. An input-oriented DEA-VRS model with a single input and a single output can be solved using Eq. 1; in the case of multiple weighted inputs and outputs, the relative efficiency is measured using Eq. 2.
$$\text{Efficiency} = \frac{\text{Output}}{\text{Input}} \tag{1}$$

$$\text{Relative efficiency} = \frac{\sum \text{Weighted outputs}}{\sum \text{Weighted inputs}} \tag{2}$$

The input-oriented VRS technique requires the solution of the following linear program-
ming problem, proposed by Charnes et al. (1978), given in Eq. 3.

Fig. 2  Model to determine rank using DEA


$$
\begin{aligned}
\min\ & \theta \\
\text{subject to}\quad & \sum_{j=1}^{n} w_j x_i^j \le \theta x_i^t, \quad i = 1, 2, 3, \dots, m \\
& \sum_{j=1}^{n} w_j y_r^j \ge y_r^t, \quad r = 1, 2, 3, \dots, s \\
& \sum_{j=1}^{n} w_j = 1, \\
& w_j \ge 0 \quad (j = 1, 2, 3, \dots, n)
\end{aligned}
\tag{3}
$$

If a unit is efficient, i.e. its θ equals 1 (one), then its super-efficiency is obtained from Eq. 4, in which the unit under evaluation is excluded from the reference set (j ≠ t) (El-Mahgary and Lahdelma 1995).

$$
\begin{aligned}
\min\ & \theta \\
\text{subject to}\quad & \sum_{j=1,\, j \ne t}^{n} w_j x_i^j \le \theta x_i^t, \quad i = 1, 2, 3, \dots, m \\
& \sum_{j=1,\, j \ne t}^{n} w_j y_r^j \ge y_r^t, \quad r = 1, 2, 3, \dots, s \\
& \sum_{j=1,\, j \ne t}^{n} w_j = 1, \quad w_j \ge 0 \quad (j = 1, 2, 3, \dots, n,\ j \ne t)
\end{aligned}
\tag{4}
$$
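As an illustration of how Eqs. 3 and 4 can be solved in practice, the following is a minimal sketch using scipy's linear programming solver. The function name and data layout are ours, not the paper's; it assumes the inputs X and outputs Y are given as NumPy arrays with one row per DMU.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input(X, Y, t, vrs=True, super_eff=False):
    """Input-oriented DEA score of DMU t (Eq. 3); with super_eff=True,
    DMU t is excluded from the reference set (Eq. 4). X is an (n, m)
    array of inputs and Y an (n, s) array of outputs, one row per DMU.
    vrs=False drops the convexity constraint, giving the CRS model."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                    # variables: [theta, w_1..w_n]
    # inputs:  sum_j w_j x_i^j - theta * x_i^t <= 0
    A_in = np.hstack([-X[t].reshape(m, 1), X.T])
    # outputs: -sum_j w_j y_r^j <= -y_r^t   (i.e. sum_j w_j y_r^j >= y_r^t)
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[t]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1) if vrs else None
    b_eq = [1.0] if vrs else None                  # convexity: sum_j w_j = 1
    bounds = [(None, None)] + [(0, None)] * n      # theta free, w_j >= 0
    if super_eff:
        bounds[1 + t] = (0, 0)                     # force w_t = 0 (j != t)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[0]                                # the efficiency score theta
```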

The algorithm for evaluating the unique rank of authors is presented in Algorithm 1, which represents the method for author ranking using DEA described in Sect. 4.2. The pseudocode depicts the calculation of the inputs and outputs from the data set: the mean and SD are calculated from the yearly citations, and the outputs are calculated using the positive semi-variance (psv), the negative semi-variance (nsv), and the ratio of SD to mean. A sketch of this step is given below.
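Since Algorithm 1 is described here only in prose, the following is a minimal sketch of our reading of it, assuming yearly citation counts as input. The function name and the use of the root mean square of the positive and negative deviations as their SDs are our assumptions.

```python
import numpy as np

def author_inputs_outputs(yearly_citations):
    """Derive the DEA inputs (mean, SD) and outputs (R1, R2) of one
    author from his/her yearly citation counts (cf. Table 7)."""
    c = np.asarray(yearly_citations, dtype=float)
    mean, sd = c.mean(), c.std()          # Input 1 and Input 2
    dev = c - mean
    psv = dev[dev > 0]                    # positive deviations from the mean
    nsv = dev[dev < 0]                    # negative deviations from the mean
    # R1: SD of the positive deviations over SD of the negative deviations
    r1 = np.sqrt(np.mean(psv ** 2)) / np.sqrt(np.mean(nsv ** 2))
    r2 = sd / mean                        # R2: coefficient of variation
    return mean, sd, r1, r2
```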
Applying the above analysis to the data in Table 2 yields the DEA ranking depicted in Table 4.

Table 4 Ranking using the DEA-index and its comparison with other ranking methods

| Author | Publications | Citations | Total citations | h-index | i10-index | g-index | DS-index | Rank | Mean rank | SD rank | DEA rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 7 | 15,10,5,2,1,1,0 | 34 | 3 | 2 | 5 | 12.68 | 1 | 1 | 2 | 2 |
| B | 7 | 15,5,2,10,0,1,1 | 34 | 3 | 2 | 5 | 12.68 | 1 | 3 | 3 | 1 |
| C | 6 | 15,5,2,1,1,10 | 34 | 3 | 2 | 5 | 12.68 | 1 | 1 | 2 | 3 |
| D | 6 | 5,2,10,1,1,15 | 34 | 3 | 2 | 5 | 12.68 | 1 | 1 | 2 | 4 |
| E | 8 | 10,5,15,1,1,1,0,0 | 34 | 3 | 2 | 5 | 12.68 | 1 | 2 | 1 | 5 |

5 Experiment, results, and analysis

5.1 Data collection

The massive publication citation data (Guide2Research 2019; Cornell University 2019a, b) was extracted from Google Scholar with the help of a web crawler. Attempts were made to gather only relevant data from the target website by following a predefined sequence. As one web page contains URLs of other web pages, these URLs were retrieved from the current page, and all of these affiliated URLs were added to the crawling queue. Then, the next page was crawled, and the same process was repeated recursively. Websites were crawled for gathering data for as long as the Internet could be accessed and the web pages analysed. The crawler supports exporting the extracted data into structured formats, such as CSV and Excel. Figure 3 lists the steps of the data collection procedure.
The collected dataset contains information on various publications from different fields of the Computer Science domain, published from 1978 to 2017. The dataset includes the paper title, its author(s), the publication year, the field of the paper, and the year-wise citation list, along with other information. A random exponential back-off time was used whenever the server returned an error, and the request was sent again (as sketched below). The robot restrictions imposed by the servers were followed to ensure efficient crawling of data. It took around six weeks to completely gather all the information related to 200,000 papers. All the inconsistencies that were prevalent in the data were removed, and all papers that did not have the complete attributes required for our study (e.g., the index of the paper, the year of publication, etc.) were filtered out. The detailed data statistics are given in Table 5.
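The following is an illustrative sketch of such a crawling loop with randomised exponential back-off, assuming the requests and BeautifulSoup libraries. The function, the CSV fields, and the parsing details are placeholders, since the actual crawler and the site structure are not published here, and robots.txt handling is omitted for brevity.

```python
import csv
import random
import time
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(seed_url, max_pages=1000):
    """Breadth-first crawl: fetch a page, add its affiliated URLs to
    the queue, and export the extracted records to CSV."""
    queue, seen, rows = deque([seed_url]), {seed_url}, []
    while queue and len(rows) < max_pages:
        url = queue.popleft()
        for attempt in range(5):               # randomised exponential back-off
            resp = requests.get(url, timeout=30)
            if resp.ok:
                break
            time.sleep(2 ** attempt + random.random())
        else:
            continue                           # give up on this URL
        page = BeautifulSoup(resp.text, "html.parser")
        rows.append({"url": url,
                     "title": page.title.string if page.title else ""})
        for a in page.find_all("a", href=True):   # enqueue affiliated URLs
            link = urljoin(url, a["href"])
            if link not in seen:
                seen.add(link)
                queue.append(link)
    with open("papers.csv", "w", newline="") as f:  # structured export
        writer = csv.DictWriter(f, fieldnames=["url", "title"])
        writer.writeheader()
        writer.writerows(rows)
```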


Fig. 3  The workflow of the data collection procedure

Table 5 The statistics of the dataset

| Attribute | Description |
|---|---|
| Domain | Computer Science and Electronics |
| Publication years | 1978–2017 |
| Number of authors | 1000 (top 1000 according to h-index and citations) |
| Filtered authors' areas of interest | Machine learning, artificial intelligence, data analytics and mining |
| Final experimental data | Top 20 authors according to h-index and i10-index |
| Final experimental publication years | 2007–2017 |

5.2 Experiment and results

According to Raab and Lichty (2002), the minimum number of DMUs needs to be at least the product of the numbers of inputs and outputs, or three times the sum of the numbers of inputs and outputs, whichever is greater. With two inputs and two outputs, this gives max{2 × 2, 3 × (2 + 2)} = 12, so, according to this criterion of DMU selection, a DMU set of size 20 was taken, and the dataset was prepared for the experiment. Table 6 presents the ranking of the 20 authors using the existing indexing methods.
As per the DEA requirements for the experiment, two inputs and two outputs were considered, as defined in Table 7. The inputs and outputs were calculated for all authors and

Table 6 Ranking using the available indexing methods (without DEA)

| Author | Abbreviation | h-index rank | i10-index rank | DS rank | G rank |
|---|---|---|---|---|---|
| Anil K. Jain | AKJ | 1 | 1 | 1 | 5 |
| Michael I. Jordan | MJ | 2 | 4 | 6 | 5 |
| Geoffrey Hinton | GH | 3 | 14 | 3 | 3 |
| Tomaso Poggio | TP | 4 | 8 | 9 | 1 |
| Georgios B. Giannakis | GG | 5 | 2 | 13 | 6 |
| Vapnik | VA | 6 | 6 | 2 | 6 |
| Daphne Koller | DK | 6 | 17 | 14 | 9 |
| Luc Van Gool | LG | 7 | 3 | 12 | 10 |
| Trevor Hastie | TH | 8 | 16 | 4 | 7 |
| Bernhard | BH | 9 | 9 | 7 | 11 |
| Jitendra Malik | JM | 10 | 19 | 5 | 8 |
| Kalyanmoy Deb | KD | 11 | 7 | 8 | 7 |
| Yoshua Bengio | YB | 12 | 10 | 10 | 11 |
| Alex Smola | AS | 13 | 18 | 11 | 11 |
| Shih-Fu Chang | SC | 13 | 5 | 18 | 8 |
| Yann LeCun | YL | 14 | 20 | 15 | 4 |
| Klaus-Robert Müller | KM | 14 | 15 | 16 | 11 |
| Richard Baraniuk | RB | 15 | 12 | 17 | 8 |
| Alan S. Willsky | AW | 16 | 11 | 19 | 2 |
| Wei-Ying Ma | WM | 16 | 13 | 20 | 10 |

Table 7 Considered inputs and expected outputs of our experiment

| Input | Output |
|---|---|
| Mean | R1 = SD based on positive deviations from the mean / SD based on negative deviations from the mean |
| SD | R2 = coefficient of variation = SD/mean |

tabulated in Table 8, as per the requirement of the proposed DEA solution.
The VRS-based method gave the same rank to some authors. To ensure uniqueness, super-efficiency in the form of the constant returns to scale (CRS) model was applied, which, by removing the ties, gave a unique rank to each author, as shown in Table 9 and depicted in Fig. 4; a sketch of this tie-breaking step follows.
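The following is a hedged sketch of this ranking step, reusing the dea_input function sketched in Sect. 4.2. It assumes one plausible reading of Table 9: the authors are ordered by scale efficiency (the CRS score over the VRS score), and the CRS-efficient authors at the top are ordered by their CRS super-efficiency.

```python
import numpy as np

def dea_rank(X, Y):
    """Unique DEA ranking of n DMUs: ties among efficient units are
    broken by CRS super-efficiency, the rest ordered by scale
    efficiency (a reconstruction of the procedure behind Table 9)."""
    n = X.shape[0]
    vrs = np.array([dea_input(X, Y, t, vrs=True) for t in range(n)])
    crs = np.array([dea_input(X, Y, t, vrs=False) for t in range(n)])
    key = crs / vrs                                # scale efficiency (SE)
    for t in np.where(np.isclose(crs, 1.0))[0]:    # CRS-efficient units
        key[t] = dea_input(X, Y, t, vrs=False, super_eff=True)
    order = np.argsort(-key)                       # higher key = better rank
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)
    return ranks
```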

6 Discussion

The ranks generated by all indices are presented in Table 10. Figure 5 depicts the comparison of all four indices with the proposed DEA-index in terms of the degree of rank distinctiveness for authors. Ranks are assigned in order of index values: the author with the maximum calculated index value is given the first rank, and so on, while authors with identical index values are assigned the same rank.

Table 8 Experimental dataset showing the inputs and outputs

| Author | Input 1 | Input 2 | Output 1 | Output 2 |
|---|---|---|---|---|
| AKJ | 11466.272 | 1612.840 | 4.564 | 0.140 |
| MJ | 9326.454 | 2876.639 | 1.978 | 0.308 |
| GH | 11500.181 | 9967.673 | 2.536 | 0.866 |
| TP | 4717.545 | 346.877 | 6.758 | 0.073 |
| GG | 4231.363 | 290.444 | 4.505 | 0.068 |
| VA | 15603.545 | 2041.548 | 4.772 | 0.130 |
| DK | 4629.272 | 1089.374 | 3.266 | 0.235 |
| LG | 6618.181 | 2845.795 | 1.208 | 0.429 |
| TH | 12391.727 | 3891.752 | 2.341 | 0.314 |
| BH | 8176.545 | 1438.142 | 3.049 | 0.175 |
| JM | 9452.363 | 2141.499 | 2.852 | 0.226 |
| KD | 8080.454 | 1898.372 | 2.171 | 0.234 |
| YB | 7729.454 | 10081.317 | 4.146 | 1.304 |
| AS | 6767.000 | 1548.452 | 2.824 | 0.228 |
| SC | 2648.818 | 469.251 | 8.006 | 0.177 |
| YL | 4740.636 | 5009.427 | 3.196 | 1.056 |
| KM | 3892.363 | 826.322 | 2.958 | 0.212 |
| RB | 3689.545 | 1251.300 | 1.362 | 0.339 |
| AW | 2383.545 | 349.662 | 6.445 | 0.146 |
| WM | 2849.272 | 481.919 | 2.956 | 0.169 |

Table 9 Ranking information used in the DEA-index with super-efficiency calculations

| Author | VRS | CRS | SE (scale efficiency) | SUP (CRS) | DEA rank |
|---|---|---|---|---|---|
| AKJ | 0.215143 | 0.207889 | 0.966285199 | | 8 |
| MJ | 0.382964 | 0.366541 | 0.957116312 | | 16 |
| GH | 0.402657 | 0.398266 | 0.989096168 | | 7 |
| TP | 1 | 1 | 1 | 1.056983 | 4 |
| GG | 1 | 0.812572 | 0.812571700 | | 19 |
| VA | 0.16763 | 0.152503 | 0.909761922 | | 18 |
| DK | 0.687775 | 0.65812 | 0.956882973 | | 17 |
| LG | 0.605466 | 0.580587 | 0.958910148 | | 13 |
| TH | 0.290515 | 0.278169 | 0.957500689 | | 15 |
| BH | 0.321436 | 0.321408 | 0.999915069 | | 5 |
| JM | 0.329581 | 0.316342 | 0.959832903 | | 11 |
| KD | 0.392134 | 0.375767 | 0.958261217 | | 14 |
| YB | 1 | 0.763763 | 0.763762500 | | 20 |
| AS | 0.462043 | 0.44343 | 0.959714797 | | 12 |
| SC | 1 | 1 | 1 | 1.11780 | 2 |
| YL | 1 | 1 | 1 | 1.495015 | 1 |
| KM | 0.772353 | 0.745575 | 0.965329320 | | 9 |
| RB | 1 | 0.961688 | 0.961688300 | | 10 |
| AW | 1 | 1 | 1 | 1.106971 | 3 |
| WM | 0.909674 | 0.907539 | 0.997652676 | | 6 |


Fig. 4  Model of rank tie-break using DEA

Table 10 Comparison of the ranks generated by all indices

| Author | h-index rank | i10-index rank | DS rank | G rank | DEA rank |
|---|---|---|---|---|---|
| AKJ | 1 | 1 | 1 | 5 | 8 |
| MJ | 2 | 4 | 6 | 5 | 16 |
| GH | 3 | 14 | 3 | 3 | 7 |
| TP | 4 | 8 | 9 | 1 | 4 |
| GG | 5 | 2 | 13 | 6 | 19 |
| VA | 6 | 6 | 2 | 6 | 18 |
| DK | 6 | 17 | 14 | 9 | 17 |
| LG | 7 | 3 | 12 | 10 | 13 |
| TH | 8 | 16 | 4 | 7 | 15 |
| BH | 9 | 9 | 7 | 11 | 5 |
| JM | 10 | 19 | 5 | 8 | 11 |
| KD | 11 | 7 | 8 | 7 | 14 |
| YB | 12 | 10 | 10 | 11 | 20 |
| AS | 13 | 18 | 11 | 11 | 12 |
| SC | 13 | 5 | 18 | 8 | 2 |
| YL | 14 | 20 | 15 | 4 | 1 |
| KM | 14 | 15 | 16 | 11 | 9 |
| RB | 15 | 12 | 17 | 8 | 10 |
| AW | 16 | 11 | 19 | 2 | 3 |
| WM | 16 | 13 | 20 | 10 | 6 |

It is seen that the corresponding h, i10, DS, and G index ranks of the author who ranked first in the proposed DEA-index are 14, 20, 15, and 4, respectively. Updating an author's rank in the existing indices needs, on average, one or more citations; with the proposed DEA indexing, however, even a single citation can influence the ranking.


Fig. 5  A comparison of indices to the degree of rank distinctiveness for authors using the proposed DEA-
index

Also, it can be seen from Table 10 that the DEA-index is quite close to the g-index. The g-index is defined as follows: a set of publications has the value g if g is the largest rank such that the top g publications together have at least g² citations. This also implies that the top (g + 1) publications together have fewer than (g + 1)² citations (Egghe 2006).
However, the h-index and g-index have certain disadvantages. The h-index depends on the duration of a researcher's career, and hence newcomers are at a disadvantage: it does not realistically evaluate the achievements of newcomers against researchers who began publishing ten or more years earlier. It also cannot make clear distinctions between researchers who are inactive or active. The g-index has narrow discriminatory ability since it is evaluated as an integer. Thus, many authors would get similar g-index values even when their numbers of citations vary considerably, making it problematic to compare performances (Ahmed AbdulHassan 2015).
The DEA-index is compared with the other existing performance measurement indices using the Spearman, Pearson, and Kendall correlation coefficients.

6.1 Spearman correlation

The Spearman correlation is used to evaluate the association between two ranked variables. In this paper, it shows the association between the ranking orders of the proposed and the existing indices, where R1 and R2 denote the two rank lists under comparison and k the number of authors. It is calculated using Eq. 5, and the results are shown in Table 11.
$$
r_s = \frac{k\sum R_1 R_2 - \left(\sum R_1\right)\left(\sum R_2\right)}{\sqrt{\left[k\sum R_1^2 - \left(\sum R_1\right)^2\right]\left[k\sum R_2^2 - \left(\sum R_2\right)^2\right]}}
\tag{5}
$$


Table 11 Correlation derived from the result set using the Spearman correlation method

| Index | h | i10 | DS | G | DEA |
|---|---|---|---|---|---|
| h | 1.000 | 0.472 | 0.758 | 0.384 | −0.367 |
| i10 | 0.472 | 1.000 | 0.187 | 0.200 | −0.188 |
| DS | 0.758 | 0.187 | 1.000 | 0.250 | −0.347 |
| G | 0.384 | 0.201 | 0.245 | 1.000 | 0.264 |
| DEA | −0.367 | 0.092 | −0.040 | 0.500 | 1.000 |

6.2 Pearson correlation

The Pearson correlation is used to evaluate the linear association between two variables. In this paper, it shows the association between the ranking orders of the existing indices and the proposed index. The symbols R1 and R2 represent the two lists of ranking results under discussion, and k the number of authors. It is calculated using Eq. 6, and the results are shown in Table 12.

$$
r = \frac{\sum R_1 R_2 - \frac{\left(\sum R_1\right)\left(\sum R_2\right)}{k}}{\sqrt{\left[\sum R_1^2 - \frac{\left(\sum R_1\right)^2}{k}\right]\left[\sum R_2^2 - \frac{\left(\sum R_2\right)^2}{k}\right]}}
\tag{6}
$$

6.3 Kendall correlation

The Kendall rank correlation coefficient is used to measure the agreement between two ranking results (Farooq et al. 2017). Here, the Kendall coefficient is calculated using Eq. 7, and the results are shown in Table 13.

$$
\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\tfrac{1}{2}k(k-1)}
\tag{7}
$$
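For reproducibility, all three coefficients can be computed directly with scipy. The following minimal sketch uses the h-index and DEA ranks of the first ten authors of Table 10 as example inputs and is illustrative only.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

# Example rank lists: h-index ranks and DEA ranks of the first ten
# authors in Table 10 (AKJ ... BH).
R1 = np.array([1, 2, 3, 4, 5, 6, 6, 7, 8, 9])        # h-index ranks
R2 = np.array([8, 16, 7, 4, 19, 18, 17, 13, 15, 5])  # DEA ranks

rho, _ = spearmanr(R1, R2)      # Eq. 5
r, _ = pearsonr(R1, R2)         # Eq. 6
tau, _ = kendalltau(R1, R2)     # Eq. 7
print(f"Spearman = {rho:.3f}, Pearson = {r:.3f}, Kendall = {tau:.3f}")
```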

Table 12 Correlation derived from the result set using the Pearson correlation method

| Index | h | i10 | DS | G | DEA |
|---|---|---|---|---|---|
| h | 1 | 0.476 | 0.744 | 0.379 | −0.340 |
| i10 | 0.476 | 1 | 0.188 | 0.164 | −0.188 |
| DS | 0.744 | 0.188 | 1 | 0.209 | −0.347 |
| G | 0.379 | 0.164 | 0.209 | 1 | 0.318 |
| DEA | −0.340 | −0.188 | −0.347 | 0.318 | 1 |

Table 13 Correlation derived from the result set using the Kendall correlation method

| Index | h | i10 | DS | G | DEA |
|---|---|---|---|---|---|
| h | 1 | 0.3404 | 0.6171 | 0.3141 | −0.2873 |
| i10 | 0.3404 | 1 | 0.1368 | 0.1581 | −0.0842 |
| DS | 0.6171 | 0.1368 | 1 | 0.1581 | −0.2316 |
| G | 0.3141 | 0.1581 | 0.1581 | 1 | 0.1472 |
| DEA | −0.2873 | −0.0842 | −0.2316 | 0.1472 | 1 |


The results indicate that in some cases a negative correlation is generated, because of the inconsistent citation years considered across the DMUs. Subsequent and average citations are not considered in the h, i10, and DS indices, so these indices are inversely correlated with the number of citations; that is why the correlations for these indices are negative. The G index, in contrast, considers the average number of citations but only the h-core value, not all the publications, and hence it is influenced by the most successful papers only; whereas the DEA-index considers each publication and each citation, which helps in generating a unique ranking.

7 Conclusions

The academic index is a metric to evaluate the performance of a researcher, generally based on the citations of the author's/researcher's publications. The existing indexing schemes, such as the h-index, DS-index, G-index, and i10-index, fail to rank authors distinctively, although considering the total citations of all the scholarly articles of an author is a more rational approach to measuring the research impact of a researcher. The existing approaches available in the literature do not consider the variable citation patterns of researchers when evaluating their research impact. The existing indexing schemes are also unable to update the rank of an author unless a set of papers gets the required number of citations. We attempt to address these limitations of the existing approaches through the proposed DEA-index: whereas the existing index schemes require a sufficient number of citations for a change in the rank, in the case of the DEA-index even a single citation can influence the ranking.
The proposed method was applied to a real-life dataset comprising the citation information of the top 20 authors (among 1000 authors). It guarantees a unique ranking of each author and is able to update the rank of an author even for a single increase in the citation count. The primary ranking is done with a stochastic approach using the mean and SD, and the concept of super-efficiency is used to break ties between authors. The DEA-based ranking is compared with the other existing indices using the Spearman, Pearson, and Kendall correlation coefficients.

Acknowledgements  Not Applicable.

Funding  Not Applicable.

Data availability  The datasets used during the study are available from the corresponding author on reason-
able request.

References
Abrishami, A., Aliakbary, S.: Predicting citation counts based on deep neural network learning techniques.
J. Informetr. 13(2), 485–499 (2019)
Ahmed AbdulHassan, A.-F.: Development G-Index and H-Index: Dgh-Index. Comput. Eng. Intell. Syst.
6(10), 22–35 (2015)
Andersen, P., Petersen, N.C.: A procedure for ranking units in data envelopment analysis. Manag. Sci. 39,
1261–1264 (1993)
Bar-Ilan, J.: Which h-index? A comparison of WoS, Scopus and Google Scholar. Scientometrics 74(2), 257–
271 (2008)


BiHui, J., LiMing, L., Rousseau, R., Egghe, L.: The R- and AR-indices: complementing the h-index. Chin.
Sci. Bull. 52(6), 855–863 (2007)
Bornmann, L., Daniel, H.-D.: The state of h index research. Is the h index the ideal way to measure research
performance? EMBO Rep. 10, 2–6 (2009)
Brito, R., Rodríguez-Navarro, A.: Evaluating research and researchers by the journal impact factor: Is it bet-
ter than coin flipping? J. Informetr. 13(2), 314–324 (2019)
Charnes, A., Cooper, W., Rhodes, E.: Measuring the efficiency of decision making units. Eur. J. Oper. Res. 2(6), 429–444 (1978)
Chen, K.L., Pei-Ying, Yan, E.: Challenges of measuring software impact through citations: an examination
of the lme4 R package. J. Informetr. 13(2), 449–461 (2019)
Cornell University: i10 Index. Cornell University Library. http://guides.library.cornell.edu/c.php?g=32272&p=203393 (2019a). Accessed 26 Sep 2019
Cornell University: Measuring your research impact: getting started. http://guides.library.cornell.edu/c.php?g=32272&p=203393 (2019b). Accessed 15 Sep 2018
Costas, R., Bordons, M.: The h-index: advantages, limitations and its relation with other bibliometric indi-
cators at the micro level. J. Informetr. 1(3), 193–203 (2007)
Egghe, L.: Theory and practise of the g-index. Scientometrics 69(1), 131–152 (2006)
El-Mahgary, S., Lahdelma, R.: Data envelopment analysis: visualizing the results. Eur. J. Oper. Res. 83(3),
700–710 (1995)
Farooq, M., Khan, H.U., Iqbal, S., Munir, E.U., Shahzad, A.: DS-Index: Ranking authors distinctively in an
academic network. IEEE Access 5, 19588–19596 (2017)
Giuffrida, C., Abramo, G., D’Angelo, C.A.: Are all citations worth the same? Valuing citations by the value
of the citing items. J. Informetr. 13(2), 500–514 (2019)
Glänzel, W., Persson, O.: H-index for Price medalists. ISSI Newsl. 1(4), 15–18 (2005)
Guha, B., Bandyopadhyay, G., Upadhyay, A.: Efficiency ranking of Indian Oil Companies (DMUs) using
DEA techniques. In: International Conference On Business And Information Management (ICBIM),
pp. 113–118. IEEE, Durgapur (2014)
Guide2Research: Top 1000 authors in computer science. www.guide2research.com (2019). Accessed 26 Sep 2019
Halevi, G., Moed, H., Bar-Ilan, J.: Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation: review of the literature. J. Informetr. 11(3), 823–834 (2017)
Hirsch, J.E.: An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. U. S. A. 102(46), 16569–16572 (2005)
Kousha, K., Thelwall, M.: Can Google Scholar and Mendeley help to assess the scholarly impacts of disser-
tations? J. Informetr. 13(2), 467–484 (2019)
Kretschmer, H., Rousseau, R.: Author inflation leads to a breakdown of Lotka’s law. J. Assoc. Inf. Sci. Tech-
nol. 52(8), 610–614 (2001)
Li, R., Li, H., Liu, B.-W.: Decipher the hidden secrets behind networks: analyze the influence based on the co-author network. In: International Conference on Computer Science and Applications (CSA), Wuhan, China (2015)
Lotka, A.J.: The frequency distribution of scientific productivity. J. Wash. Acad. Sci. 16(12), 317–324 (1926)
Pao, M.L.: An empirical examination of Lotka's law. J. Am. Soc. Inf. Sci. 37(1), 26–33 (1986)
Park, H., Wolfram, D.: Research software citation in the Data Citation Index: current practices and implica-
tions for research software sharing and reuse. J. Informetr. 13(2), 574–582 (2019)
Raab, R., Lichty, R.: Identifying subareas that comprise a greater metropolitan area: the criterion of county
relative efficiency. J. Region. Sci. 42, 579–594 (2002)
Shen, S., Zhu, D., Rousseau, R., Su, X., Wang, D.: A refined method for computing bibliographic coupling
strengths. J. Informetr. 13(2), 605–615 (2019)
Waltman, L., van Eck, N.J.: The inconsistency of the H-index. J. Assoc. Inf. Sci. Technol. 63(2), 406–415
(2012)
Zuo, Z., Zhao, K., Ni, C.: Standing on the shoulders of giants? Faculty hiring in information schools. J.
Informetr. 13(2), 341–353 (2019)

Publisher’s Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
