
Fuzzy vs. Likert Scale in Statistics

María Ángeles Gil and Gil González-Rodríguez

Abstract. Likert scales or associated codings are often used in connection with opinions/valuations/ratings, and especially with questionnaires with a pre-specified response format. A guideline is given to design questionnaires allowing a free fuzzy-numbered response format; the fuzzy number scale is very rich and expressive, and it makes it possible to describe the usual answers in this context in a user-friendly way. A review of some techniques for the statistical analysis of the obtained responses is included, and a real-life example illustrates the application.

1 Introduction
Likert scales are widely used to measure attributes often associated with opin-
ions/valuations/ratings, and so on, leading to ordinal/categorical data from a set of
pre-fixed labels/categories/names.
To facilitate statistical data analysis in this setting, the usual way to proceed is to code each response category by an integer (often using either the scale 1-5 or 1-7). More recently, some authors (see, for instance, Lalla et al. [9], Lazim and Osman [10], Bharadwaj [2]) have suggested identifying each Likert response category with a fuzzy subset from a class of operational and flexible fuzzy sets stated by 'experts', either individually or by consensus.
María Ángeles Gil · Gil González-Rodríguez
University of Oviedo, 33071 Oviedo, Spain
e-mail: {magil,gil}@uniovi.es
Gil González-Rodríguez
European Centre for Soft Computing, 33600 Mieres, Spain
e-mail: gil.gonzalez@softcomputing.es
This paper has been written as a tribute to Professor Ebrahim Mamdani. We have had the great opportunity of meeting a unique and outstanding person in recent years, mainly because of his being a member of the Scientific Committee of the European Centre for Soft Computing. We have learned a lot from his lectures and conversations, and have enjoyed the fruitful discussions around them, so we will always feel indebted to him.

E. Trillas et al. (Eds.): Combining Experimentation and Theory, STUDFUZZ 271, pp. 407–420.
springerlink.com 
© Springer-Verlag Berlin Heidelberg 2012

One of the key concerns with these approaches is that the number of different potential values of the attribute is small, so many statistical analyses may be limited or even unfeasible, and the conclusions drawn from them may sometimes be inaccurate.
In this respect, an alternative way to proceed (see, for instance, van Laerhoven et al. [8]) is the so-called simple visual analogue scale (or line response option), in which the extreme answers of the Likert scale mark the ends of a line and respondents are asked to place a cross on the line, somewhere between both extremes, at the point that best reflects their answer. Therefore, there is no pre-specified list of possible answers, and the questionnaire has a free format.
This paper explains another, more general alternative approach. It takes into account that, for the sake of realism, most attributes concerning opinions/ratings/judgements involve subjectiveness and a certain imprecision. Accordingly, the value of such an attribute for an individual is assumed to be described by a fuzzy set (usually a fuzzy number) fitting the perception of the researcher/respondent, without considering a pre-fixed list of answers. This freedom in assigning values leads to a free (fuzzy-valued) response format, enabling a variability and accuracy that would not be captured by either a Likert scale or an associated real- or fuzzy-valued coding.
Whereas Likert scales or associated codings discretize the attributes concerned into a small number of potential values, the free response format allows attributes to take either a large finite or an infinite number of potential values. The spirit of Statistics, as the science of variation, randomness and chance, would be better captured by this free response format than by a Likert-like (or coded Likert-like) one.
Furthermore, the fuzzy scale is rich and expressive enough to contain a value that appropriately fits the valuation/opinion/rating involving subjective perceptions in most real-life situations, even if we restrict ourselves to some operational classes of fuzzy sets, like trapezoidal, S- and Π-curves (see Eshragh and Mamdani [3]).
On the other hand, to facilitate statistical data analysis, Likert response categories are usually coded by consecutive integer values. This assignment has frequently been criticized as unrealistic (cf. Wu [22]) because integer codes often cannot reflect the real differences between scale categories.
In this paper, after presenting the preliminaries on the fuzzy scale that will be used most commonly, a guideline is given to design questionnaires allowing a free fuzzy-numbered response format, and to explain to non-expert users how to employ this friendly and accurate approach.
A real-life example will illustrate the approach, and a methodology that is being developed to analyze fuzzy data in this setting will be reviewed. This methodology is based on a versatile and intuitive distance whose meaning is similar to that of the usual distance between real numbers. As a consequence, by combining the fuzzy scale with this distance, the usual concerns about the integer coding of Likert scale categories are avoided, since distances between fuzzy numbers properly reflect the real differences between the corresponding perceptions.

2 Preliminaries on Fuzzy Numbers and Motivating Example


In real life one finds valuations/ratings/perceptions/classifications associated with random experiments that lead to data which cannot be properly expressed within the scales of integer or real numbers, but which can be quite suitably expressed by employing the richer scale of fuzzy numbers.
The space of fuzzy numbers used to model data in this paper is the class $\mathcal{F}_c(\mathbb{R})$ of normal convex fuzzy sets of $\mathbb{R}$; more precisely, the class of mappings $U : \mathbb{R} \to [0,1]$ such that for each $\alpha \in (0,1]$ the $\alpha$-level set $U_\alpha = \{x \in \mathbb{R} : U(x) \ge \alpha\}$ is a nonempty compact interval, and the 0-level $U_0$, the closure of the support $\{x \in \mathbb{R} : U(x) > 0\}$, is also a compact interval. A fuzzy number $U \in \mathcal{F}_c(\mathbb{R})$ models an ill-defined property on (or subset of) $\mathbb{R}$, so that for each $x \in \mathbb{R}$ the value $U(x)$ is usually interpreted as the 'degree of compatibility' of $x$ with the property 'defining' $U$, or the 'degree of possibility' of the assertion "$x$ is $U$".
The following real-life example motivates the use of a free fuzzy-numbered response format.

Example 1. Institutions traditionally conduct surveys of students' opinions about the courses they offer. To learn what students think about the courses, questionnaires are designed to elicit their opinions, most of them being based on Likert-like sets of possible answers.
In this way, many questionnaires designed for this purpose include a set of questions about different aspects of each course, and respondents choose their judgements/opinions from a list of pre-established answers (for instance, AGREE STRONGLY or AGREE TO A VERY HIGH DEGREE, TEND TO AGREE or AGREE TO A CONSIDERABLE DEGREE, NEUTRAL or AGREE TO A MODERATE DEGREE, TEND TO DISAGREE or AGREE TO A SMALL DEGREE, and DISAGREE STRONGLY or AGREE HARDLY AT ALL). These answers are often identified with, or coded by, integers (say 1 to 5), and some statistics are then computed.
A fuzzy-valued questionnaire involving a free response format was applied on the occasion of the II Summer School of the European Centre for Soft Computing (held in July 2008 in Mieres, Asturias, Spain). The survey aimed to record the students' opinions/valuations about different key aspects of each of the 9 courses delivered.
After a short explanation of the formalization and meaning of fuzzy sets (some of the courses taught at the School actually deal with Fuzzy Set Theory), students were asked to reply using fuzzy numbers on the [0, 100] X-scale (i.e., [0, 100] was the general support, with 0% indicating the minimum degree of agreement and 100% the maximum degree). Figure 1 displays the form each student filled in for each course attended.
To ease drawing and to simplify the subsequent statistical computations, students were asked to use trapezoidal fuzzy numbers, although other fuzzy numbers could be considered without substantially increasing the computational complexity.

[Figure: questionnaire form with five items, each answered by drawing a fuzzy number over a 0% to 100% axis graduated in steps of 10%: Q1. Motivation of the course; Q2. Intellectual challenge provided by the course; Q3. Lecturer performance; Q4. Quality of the course material; Q5. Overall rating]

Fig. 1 Questionnaire applied to students in the II Summer School of the ECSC

Figure 2 shows the answer supplied by one of the students to the question concerning the motivation of a particular course. This answer indicates that, for this student, the motivation of the course was not lower than 75% (so the 0-level is [75, 100]), and that the values from 80% to 90% are fully compatible with his/her opinion (so the 1-level is [80, 90]); these two levels are finally 'interpolated' linearly.
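The level-set reading of this answer can be made concrete with a small sketch. The Python class below is our own illustration (the name `Trapezoid` and its methods are hypothetical, not taken from the paper or from any package): it stores a trapezoidal fuzzy number by its 0-level [a, d] and its 1-level [b, c], and recovers the membership degree and the α-level sets implied by linear interpolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Tra(a, b, c, d): 0-level [a, d], 1-level [b, c], linear in between."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self) -> None:
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("need a <= b <= c <= d")

    def membership(self, x: float) -> float:
        """Degree of compatibility U(x) of the value x with the answer."""
        if x < self.a or x > self.d:
            return 0.0
        if self.b <= x <= self.c:
            return 1.0
        if x < self.b:  # rising edge: a <= x < b, hence b > a
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)  # falling edge: c < x <= d

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Level set U_alpha = [inf U_alpha, sup U_alpha] for alpha in [0, 1]."""
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        return (self.a + alpha * (self.b - self.a),
                self.d - alpha * (self.d - self.c))


# The answer described above: 0-level [75, 100], 1-level [80, 90].
answer = Trapezoid(75, 80, 90, 100)
print(answer.alpha_cut(0.5))    # (77.5, 95.0)
print(answer.membership(77.5))  # 0.5
```

Storing only the four corner values is what makes trapezoidal answers cheap to collect and to compute with: every α-level follows from them.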

[Figure: Q1. Motivation of the course, with the student's trapezoidal answer drawn over the 0% to 100% axis]

Fig. 2 Answer supplied by one of the students of a course to a given question

This questionnaire was designed for individual description. Since no pre-specified list of categories was considered in advance, students had a great deal of freedom to express their valuations/ratings accurately. Furthermore, as pointed out in the Introduction, the variability of the collected data is much better captured in this way. In support of this assertion, for the first course and question examined, only 4 coincidences were detected among the answers of the 29 students who attended the course.
Broadly speaking, this fuzzy-valued questionnaire provides investigators with richer information than traditional ones, and the freedom in the response format leads to more interesting and more powerful statistical analyses.

3 Guidelines to a Free Fuzzy-Numbered Response Format Questionnaire
Inspired by the motivating example above, one can design and reply to a questionnaire involving subjective valuations/opinions/ratings in accordance with a free fuzzy-numbered response format, by considering a form like the one in Figure 1 and the following easy-to-explain and user-friendly hints:
1. Practitioners (researchers/respondents) first state the 'general support' as the set of values which could potentially be compatible with the possible answers, or the set of values which could be considered as a 'reference range' (in Example 1 this was [0, 100]).
2. Practitioners (researchers/respondents) state the 0-level as the set of values which can be considered compatible with their valuation to a greater or lesser extent.
3. Practitioners (researchers/respondents) state the 1-level as the set of values which are considered fully compatible with their valuation.
4. These two level sets are finally 'interpolated' to get a fuzzy value (in the motivating example, this interpolation is linear).
To ensure full freedom for practitioners, the suggested hints are not constrained to a particular shape of the fuzzy values. Nevertheless, to limit the computational complexity, the type of interpolation in step 4 is often chosen from a list of manageable functions (e.g., linear, S-curves, etc.).
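The four hints can be mirrored in a short sketch. The function below is hypothetical (its name, its arguments and the particular S-curve, the "smoothstep" polynomial 3t² − 2t³, are our own choices, not prescribed by the guideline): it checks that the 1-level lies inside the 0-level, which in turn lies inside the general support, and then interpolates between them with a user-chosen increasing shape.

```python
from typing import Callable

Shape = Callable[[float], float]  # increasing map of [0, 1] onto [0, 1]


def linear(t: float) -> float:
    """Linear interpolation, as in the motivating example."""
    return t


def s_curve(t: float) -> float:
    """One possible S-shaped transition (smoothstep); an illustrative choice."""
    return t * t * (3.0 - 2.0 * t)


def make_fuzzy_value(support, level0, level1, shape: Shape = linear):
    """Build a membership function from steps 1-4 of the guideline.

    support: general support (step 1); level0: 0-level (step 2);
    level1: 1-level (step 3); shape: interpolation between them (step 4).
    """
    lo, hi = support
    a, d = level0
    b, c = level1
    if not (lo <= a <= b <= c <= d <= hi):
        raise ValueError("the 1-level must lie in the 0-level, "
                         "and the 0-level in the general support")

    def membership(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:                        # left transition, a <= x < b
            return shape((x - a) / (b - a))
        return shape((d - x) / (d - c))  # right transition, c < x <= d

    return membership


answer = make_fuzzy_value((0, 100), (75, 100), (80, 90))
smooth = make_fuzzy_value((0, 100), (75, 100), (80, 90), shape=s_curve)
print(answer(77.5), smooth(77.5))  # both 0.5 at the midpoint of the left edge
```

Both shapes share the same 0- and 1-levels; only the transition between them changes, which is exactly the freedom left open in step 4.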

4 Distance-Based Statistical Analysis of Fuzzy-Numbered Responses
To manage fuzzy answers in the subsequent statistical analysis, some key tools should be considered, namely the arithmetic of, and a distance between, data; these should extend those for real and interval-valued data (special types of fuzzy data) and should be defined so that they take the fuzzy meaning into account.

4.1 Arithmetic and Metric for Fuzzy Numbers


The elementary arithmetic operations required for the statistical analysis of fuzzy data are the sum and the product by a scalar. These two operations can be defined either by directly applying Zadeh's extension principle (Zadeh [23]) or, equivalently and based on the results by Nguyen [17], as the level-wise extension of the usual interval arithmetic.
In this way, given two fuzzy numbers $U, V \in \mathcal{F}_c(\mathbb{R})$, the sum of $U$ and $V$ is defined as the fuzzy number $U + V$ such that for each $\alpha \in [0,1]$

$$(U + V)_\alpha = \{y + z : y \in U_\alpha,\ z \in V_\alpha\},$$

and, given a real number $\gamma$, the product of $U$ by the scalar $\gamma$ is defined as the fuzzy number $\gamma \cdot U$ such that for each $\alpha \in [0,1]$

$$(\gamma \cdot U)_\alpha = \gamma \cdot U_\alpha = \{\gamma \cdot y : y \in U_\alpha\}.$$
The space $(\mathcal{F}_c(\mathbb{R}), +, \cdot)$ does not have a linear structure (only a semilinear, conical one), since $U + (-1) \cdot U$ does not in general coincide with the indicator function of the singleton $\{0\}$, but is a fuzzy number symmetric with respect to 0.
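Because both operations act level-wise on intervals, they reduce to endpoint arithmetic. The following Python sketch is an illustration under our own representation choice (a fuzzy number is encoded as a function returning the endpoints of its α-level; the helper names are hypothetical). It also shows why the structure is only semilinear: for the trapezoidal fuzzy number $U$ with 0-level [1, 5] and 1-level [2, 3], the levels of $U + (-1) \cdot U$ are intervals symmetric around 0 rather than the singleton {0}.

```python
from typing import Callable

Interval = tuple[float, float]
FuzzyNumber = Callable[[float], Interval]  # alpha -> (inf, sup) of the alpha-level


def tra(a: float, b: float, c: float, d: float) -> FuzzyNumber:
    """Trapezoidal fuzzy number with 0-level [a, d] and 1-level [b, c]."""
    return lambda alpha: (a + alpha * (b - a), d - alpha * (d - c))


def add(U: FuzzyNumber, V: FuzzyNumber) -> FuzzyNumber:
    """(U + V)_alpha = {y + z : y in U_alpha, z in V_alpha}."""
    def level(alpha: float) -> Interval:
        (u0, u1), (v0, v1) = U(alpha), V(alpha)
        return (u0 + v0, u1 + v1)
    return level


def scale(gamma: float, U: FuzzyNumber) -> FuzzyNumber:
    """(gamma * U)_alpha = {gamma * y : y in U_alpha}; gamma < 0 swaps endpoints."""
    def level(alpha: float) -> Interval:
        lo, hi = (gamma * e for e in U(alpha))
        return (min(lo, hi), max(lo, hi))
    return level


U = tra(1, 2, 3, 5)
W = add(U, scale(-1, U))   # U + (-1) * U
print(W(0.0), W(1.0))      # (-4.0, 4.0) (-1.0, 1.0): symmetric, not the crisp 0
```

The same helpers show that sums of trapezoids are again trapezoids whose corner values add, which is why trapezoidal answers keep later computations simple.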
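Let $\theta \in (0, +\infty)$ and let $\varphi$ be an absolutely continuous probability measure on $([0,1], \mathcal{B}_{[0,1]})$ whose density function is positive on $(0,1)$, $\mathcal{B}_{[0,1]}$ being the Borel $\sigma$-field on $[0,1]$. From now on, to guarantee the existence of the involved distances, we restrict ourselves to the wide subclass $\mathcal{F}_c^2(\mathbb{R})$ of fuzzy numbers $U$ for which both $\int_{[0,1]} [\inf U_\alpha]^2 \, d\lambda(\alpha) < \infty$ and $\int_{[0,1]} [\sup U_\alpha]^2 \, d\lambda(\alpha) < \infty$, where $\lambda$ is the Lebesgue measure on $([0,1], \mathcal{B}_{[0,1]})$. For an interval $U_\alpha$, let $\mathrm{mid}\, U_\alpha = (\inf U_\alpha + \sup U_\alpha)/2$ denote its center and $\mathrm{spr}\, U_\alpha = (\sup U_\alpha - \inf U_\alpha)/2$ its radius. Then, the mapping $D_\theta^\varphi : \mathcal{F}_c^2(\mathbb{R}) \times \mathcal{F}_c^2(\mathbb{R}) \to [0, +\infty)$ such that for $U, V \in \mathcal{F}_c^2(\mathbb{R})$

$$D_\theta^\varphi(U, V) = \sqrt{\int_{[0,1]} \left( \left[\mathrm{mid}\, U_\alpha - \mathrm{mid}\, V_\alpha\right]^2 + \theta \cdot \left[\mathrm{spr}\, U_\alpha - \mathrm{spr}\, V_\alpha\right]^2 \right) d\varphi(\alpha)}$$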
defines a metric between fuzzy numbers, the so-called $(\theta, \varphi)$-distance (see Trutschnig et al. [21]). It allows weighting the importance of the different levels and, for each level, the relevance of the squared Euclidean distance between the centers (mids), which measures a kind of deviation in 'location', relative to that of the squared Euclidean distance between the radii (spreads), which measures a kind of deviation in 'shape' or 'imprecision'.
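For fuzzy numbers given through their level sets, $D_\theta^\varphi$ can be approximated by numerical integration over $\alpha$. The sketch below is a minimal illustration, assuming a midpoint rule with a fixed number of nodes and a $\varphi$ specified through its density with respect to the Lebesgue measure; the function names and defaults are our own hypothetical choices, not an established API.

```python
import math
from typing import Callable

Interval = tuple[float, float]
FuzzyNumber = Callable[[float], Interval]  # alpha -> (inf, sup) of the alpha-level


def tra(a: float, b: float, c: float, d: float) -> FuzzyNumber:
    """Trapezoidal fuzzy number with 0-level [a, d] and 1-level [b, c]."""
    return lambda alpha: (a + alpha * (b - a), d - alpha * (d - c))


def mid_spr(level: Interval) -> Interval:
    lo, hi = level
    return (lo + hi) / 2.0, (hi - lo) / 2.0


def d_theta_phi(U: FuzzyNumber, V: FuzzyNumber, theta: float = 1 / 3,
                phi_density: Callable[[float], float] = lambda alpha: 1.0,
                n: int = 1000) -> float:
    """Midpoint-rule approximation of D_theta^phi(U, V).

    phi_density is the density of phi w.r.t. Lebesgue measure on [0, 1];
    the default (constant 1) takes phi to be the Lebesgue measure itself.
    """
    total = 0.0
    for k in range(n):
        alpha = (k + 0.5) / n
        mu, su = mid_spr(U(alpha))
        mv, sv = mid_spr(V(alpha))
        total += ((mu - mv) ** 2 + theta * (su - sv) ** 2) * phi_density(alpha)
    return math.sqrt(total / n)


U = tra(0, 1, 2, 4)
print(d_theta_phi(U, tra(5, 6, 7, 9)))  # pure shift by 5: distance close to 5
print(d_theta_phi(U, tra(0, 2, 2, 4)))  # same 0-level, different 1-level: about 1/3
```

A pure translation changes only the mids, so the distance equals the shift whatever $\theta$ is; changes in imprecision are weighted by $\theta$, which is the intended separation between 'location' and 'shape'.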
In particular, if $\theta = 1$, $D_\theta^\varphi$ is equivalent to weighting uniformly, and exclusively, the two squared Euclidean distances between the extreme points of the level sets, so that

$$D_1^\varphi(U, V) = \sqrt{\int_{[0,1]} \left( \frac{1}{2}\left[\inf U_\alpha - \inf V_\alpha\right]^2 + \frac{1}{2}\left[\sup U_\alpha - \sup V_\alpha\right]^2 \right) d\varphi(\alpha)},$$

and if $\theta = 1/3$, $D_\theta^\varphi$ is equivalent to weighting uniformly all the squared Euclidean distances between the convex linear combinations of the extreme points of the level sets, so that

$$D_{1/3}^\varphi(U, V) = \sqrt{\int_{[0,1]} \int_{[0,1]} \left[U_\alpha^{[\nu]} - V_\alpha^{[\nu]}\right]^2 d\lambda(\nu)\, d\varphi(\alpha)}$$

with $U_\alpha^{[\nu]} = \nu \sup U_\alpha + (1 - \nu) \inf U_\alpha$ and $\lambda$ being the Lebesgue measure on $([0,1], \mathcal{B}_{[0,1]})$.
More generally, if $\theta \in (0, 1]$, then (see Gil et al. [4], Trutschnig et al. [21]) there exists a weighting measure $W$, formalized as a nondegenerate probability measure on $([0,1], \mathcal{B}_{[0,1]})$ with $\int_{[0,1]} \nu \, dW(\nu) = 1/2$ and $\theta = \int_{[0,1]} (2\nu - 1)^2 \, dW(\nu)$, such that

$$D_\theta^\varphi(U, V) = \sqrt{\int_{[0,1]} \int_{[0,1]} \left[U_\alpha^{[\nu]} - V_\alpha^{[\nu]}\right]^2 dW(\nu)\, d\varphi(\alpha)},$$

which coincides with Bertoluzza et al.'s metric [1].
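The three expressions are linked by a short computation, which we add as a worked check (it is our own derivation, not reproduced from [1] or [21]). Fix $\alpha$ and write $m$ and $s$ for the differences of the mids and of the spreads of $U_\alpha$ and $V_\alpha$:

```latex
\begin{align*}
m &= \mathrm{mid}\,U_\alpha - \mathrm{mid}\,V_\alpha, \qquad
s = \mathrm{spr}\,U_\alpha - \mathrm{spr}\,V_\alpha,\\
\inf U_\alpha - \inf V_\alpha &= m - s, \qquad
\sup U_\alpha - \sup V_\alpha = m + s,\\
U_\alpha^{[\nu]} - V_\alpha^{[\nu]} &= \nu(m + s) + (1 - \nu)(m - s) = m + (2\nu - 1)\,s,\\
\int_{[0,1]} \left[U_\alpha^{[\nu]} - V_\alpha^{[\nu]}\right]^2 dW(\nu)
  &= m^2 + 2ms \int_{[0,1]} (2\nu - 1)\, dW(\nu)
   + s^2 \int_{[0,1]} (2\nu - 1)^2\, dW(\nu)
   = m^2 + \theta\, s^2 .
\end{align*}
```

The middle integral vanishes because $\int_{[0,1]} \nu \, dW(\nu) = 1/2$, and integrating over $\alpha$ with respect to $\varphi$ gives $[D_\theta^\varphi(U,V)]^2$. Taking $W$ as the two-point measure with mass 1/2 at $\nu = 0$ and at $\nu = 1$ yields $\theta = 1$ and the integrand $\frac{1}{2}(m - s)^2 + \frac{1}{2}(m + s)^2 = m^2 + s^2$; taking $W = \lambda$ yields $\theta = \int_{[0,1]} (2\nu - 1)^2 \, d\nu = 1/3$.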


In [6], the possibility of identifying $\mathcal{F}_c^2(\mathbb{R})$ with a special subset of a functional Hilbert space has been discussed in depth. This identification has been formalized through an isometric embedding based on the support function of fuzzy sets, which, in the case of fuzzy numbers, characterizes each of them by means of the extremes of its level intervals. The isometry establishes a one-to-one correspondence not only between fuzzy numbers and functions, but also between the fuzzy and functional arithmetics, and between the metric above and a distance between functions.
A crucial implication of this fact is that fuzzy answers or data can be treated as functional data by identifying them with their support functions, and many ideas, concepts, results and developments in Functional Data Analysis can be applied to fuzzy answers/data through the appropriate identifications and correspondences. However, the translation is not immediate, and care should be taken due to the nonlinearity of the space of fuzzy numbers.
In this respect, analyzing fuzzy answers/data statistically requires a formal probabilistic framework: a rigorous model for the mechanism producing these answers/data, and definitions for the relevant parameters. For this purpose, we will consider the model and definitions introduced by Puri and Ralescu [19], which can also be viewed as induced from the functional ones through the isometric embedding (see [6]).

4.2 Random Mechanisms Producing Fuzzy Data


In analyzing fuzzy answers/data from a statistical perspective the model formalizing
the underlying random setting and mechanism producing them should be given.
Random fuzzy sets (often referred to in the literature as fuzzy random variables in
Puri and Ralescu’s sense) fit many real-life attributes and classification/qualification
processes associated with valuations/ratings/opinions leading to data which can be
suitably described by means of fuzzy values.

Random fuzzy sets (RFSs for short) were introduced by Puri and Ralescu [19] as a mathematical model for mechanisms associating a fuzzy value with each experimental outcome, extending random variables and random sets. The notion of random fuzzy set (in the 1-dimensional case) can be introduced in several equivalent ways:

Definition 1. Given a probability space $(\Omega, \mathcal{A}, P)$, a mapping $X : \Omega \to \mathcal{F}_c^2(\mathbb{R})$ is said to be a random fuzzy set (RFS for short) if, and only if, any of the following conditions is satisfied:
• $X$ is a fuzzy random variable in Puri and Ralescu's sense, that is, for all $\alpha \in (0,1]$ the $\alpha$-level set-valued mapping
$X_\alpha : \Omega \to \mathcal{K}_c(\mathbb{R}) = \{\text{nonempty compact intervals}\}$, $\omega \mapsto (X(\omega))_\alpha$,
is a compact random interval (that is, $X_\alpha$ is a Borel measurable mapping w.r.t. $\mathcal{A}$ and the Borel $\sigma$-field generated by the topology induced by the Hausdorff metric on $\mathcal{K}_c(\mathbb{R})$).
• $X$ is a Borel measurable mapping w.r.t. $\mathcal{A}$ and the Borel $\sigma$-field generated by the topology induced by the metric $D_\theta^\varphi$ on $\mathcal{F}_c^2(\mathbb{R})$.
• For all $\alpha \in (0,1]$ the functions
$\mathrm{mid}\, X_\alpha : \Omega \to \mathbb{R}$, $\mathrm{spr}\, X_\alpha : \Omega \to [0, +\infty)$
are real-valued random variables.

It should be pointed out that the Borel measurability of RFSs ensures that one can properly refer in this setting to notions like the distribution induced by an RFS, the stochastic independence of RFSs, and others required in statistical developments. As a consequence, most of the key ideas of statistical developments can be preserved.
In the statistical analysis of fuzzy data, two main types of summary measures/parameters may be distinguished:
• fuzzy-valued summary measures, like the mean value of an RFS as a measure of the central tendency of its values;
• real-valued summary measures, like the Fréchet variance of an RFS as a measure of the mean error/dispersion of the values of the RFS.
The mean value of an RFS can be presented in two equivalent ways, either as
an extension of the set-valued Aumann expectation (see Puri and Ralescu [19]) or
level-wise in terms of the mids and spreads (as well as induced from the expectation
of a Hilbert space-valued random element). Thus,

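Definition 2. Given a probability space $(\Omega, \mathcal{A}, P)$ and an associated RFS $X$ such that $\max\{|\inf X_\alpha|, |\sup X_\alpha|\} \in L^1(\Omega, \mathcal{A}, P)$ for all $\alpha \in (0,1]$, the (Aumann-type) mean value of $X$ is the fuzzy number $\widetilde{E}(X)$ such that for all $\alpha \in (0,1]$ any of the following equivalent conditions is satisfied:
• $(\widetilde{E}(X))_\alpha$ = Aumann integral of $X_\alpha$ $= [E(\inf X_\alpha), E(\sup X_\alpha)]$;
• $\mathrm{mid}\,(\widetilde{E}(X))_\alpha = E(\mathrm{mid}\, X_\alpha)$ and $\mathrm{spr}\,(\widetilde{E}(X))_\alpha = E(\mathrm{spr}\, X_\alpha)$.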

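The mean value of an RFS satisfies the usual linearity properties. Thus,
Proposition 1. If $\gamma \in \mathbb{R}$, $U \in \mathcal{F}_c^2(\mathbb{R})$ and $X, Y$ are RFSs associated with $(\Omega, \mathcal{A}, P)$ such that $\max\{|\inf X_\alpha|, |\sup X_\alpha|\}, \max\{|\inf Y_\alpha|, |\sup Y_\alpha|\} \in L^1(\Omega, \mathcal{A}, P)$ for all $\alpha \in (0,1]$, then
i) $\widetilde{E}(\gamma \cdot X + U) = \gamma \cdot \widetilde{E}(X) + U$;
ii) $\widetilde{E}(X + Y) = \widetilde{E}(X) + \widetilde{E}(Y)$.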
Furthermore, the mean value of an RFS is coherent with the fuzzy arithmetic, and it is the Fréchet expectation w.r.t. $D_\theta^\varphi$, which confirms that it is a central tendency measure. Thus,
Proposition 2. The mean value of an RFS satisfies that
i) if $X$ is an RFS associated with $(\Omega, \mathcal{A}, P)$ such that the set of its values is finite or countable, that is, $X(\Omega) = \{x_1, \ldots, x_m, \ldots\} \subset \mathcal{F}_c^2(\mathbb{R})$, then

$$\widetilde{E}(X) = P(\{\omega \in \Omega : X(\omega) = x_1\}) \cdot x_1 + \ldots + P(\{\omega \in \Omega : X(\omega) = x_m\}) \cdot x_m + \ldots;$$

ii) it is the fuzzy number leading to the lowest mean squared $D_\theta^\varphi$-distance (or error) w.r.t. the RFS values, i.e.,

$$E\left(\left[D_\theta^\varphi(X, \widetilde{E}(X))\right]^2\right) = \min_{U \in \mathcal{F}_c^2(\mathbb{R})} E\left(\left[D_\theta^\varphi(X, U)\right]^2\right).$$

On the other hand, Fréchet's approach has been followed in formalizing the variance of an RFS (see Lubiano et al. [12], Körner and Näther [14], Ramos-Guajardo et al. [20]). In this approach, the variance is conceived as a measure of the 'error' in approximating the values of the RFS by the corresponding mean value, this error being quantified in terms of a squared metric. In this way,
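Definition 3. Given a probability space $(\Omega, \mathcal{A}, P)$ and an associated RFS $X$ such that $\max\{|\inf X_\alpha|, |\sup X_\alpha|\} \in L^2(\Omega, \mathcal{A}, P)$ for all $\alpha \in (0,1]$, the $(\theta, \varphi)$-Fréchet variance of $X$ is the real number given by either of the following equivalent expressions:
• $\sigma_X^2 = E\left(\left[D_\theta^\varphi(X, \widetilde{E}(X))\right]^2\right)$,
• $\sigma_X^2 = \int_{[0,1]} \left[\mathrm{Var}(\mathrm{mid}\, X_\alpha) + \theta \, \mathrm{Var}(\mathrm{spr}\, X_\alpha)\right] d\varphi(\alpha)$.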
The $(\theta, \varphi)$-Fréchet variance of an RFS satisfies the usual properties of this concept. In this way,
Proposition 3. $\sigma_X^2 \ge 0$, with $\sigma_X^2 = 0$ if, and only if, there exists $U \in \mathcal{F}_c^2(\mathbb{R})$ such that $X = U$ almost surely.
Proposition 4. If $\gamma \in \mathbb{R}$, $U \in \mathcal{F}_c^2(\mathbb{R})$ and $X, Y$ are two independent RFSs associated with the probability space $(\Omega, \mathcal{A}, P)$ such that $\max\{|\inf X_\alpha|, |\sup X_\alpha|\}, \max\{|\inf Y_\alpha|, |\sup Y_\alpha|\} \in L^2(\Omega, \mathcal{A}, P)$, then
i) $\sigma^2_{\gamma \cdot X + U} = \gamma^2 \cdot \sigma_X^2$;
ii) $\sigma^2_{X + Y} = \sigma_X^2 + \sigma_Y^2$.
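The two expressions in Definition 3 can be compared numerically on a small sample, treated as a finite population with equal weights. The sketch below is our own illustration (names are hypothetical), assuming trapezoidal data and $\varphi$ equal to the Lebesgue measure, with a midpoint rule over $\alpha$; it also checks Proposition 3 and item i) of Proposition 4 on the same data.

```python
from statistics import fmean, pvariance

Trap = tuple[float, float, float, float]  # Tra(a, b, c, d): 0-level [a, d], 1-level [b, c]


def mid_spr(t: Trap, alpha: float) -> tuple[float, float]:
    """(mid, spr) of the alpha-level [a + alpha(b - a), d - alpha(d - c)]."""
    a, b, c, d = t
    lo, hi = a + alpha * (b - a), d - alpha * (d - c)
    return (lo + hi) / 2, (hi - lo) / 2


def var_by_distance(sample: list[Trap], theta: float = 1 / 3, n: int = 500) -> float:
    """First expression: mean squared D-distance to the sample mean (phi = Lebesgue)."""
    xbar = tuple(fmean(x[k] for x in sample) for k in range(4))
    total = 0.0
    for k in range(n):                      # midpoint rule over alpha
        alpha = (k + 0.5) / n
        mb, sb = mid_spr(xbar, alpha)
        total += fmean((m - mb) ** 2 + theta * (s - sb) ** 2
                       for m, s in (mid_spr(x, alpha) for x in sample))
    return total / n


def var_by_mid_spr(sample: list[Trap], theta: float = 1 / 3, n: int = 500) -> float:
    """Second expression: integral of Var(mid X_alpha) + theta * Var(spr X_alpha)."""
    total = 0.0
    for k in range(n):
        alpha = (k + 0.5) / n
        mids, sprs = zip(*(mid_spr(x, alpha) for x in sample))
        total += pvariance(mids) + theta * pvariance(sprs)
    return total / n


def affine(t: Trap, gamma: float, u: Trap) -> Trap:
    """Parameters of gamma * t + u; a negative gamma reverses the parameter order."""
    scaled = [gamma * p for p in t]
    if gamma < 0:
        scaled.reverse()
    return tuple(p + q for p, q in zip(scaled, u))


sample = [(50, 60, 70, 80), (34, 40, 41, 46), (75, 80, 90, 100), (10, 30, 40, 60)]
print(var_by_distance(sample), var_by_mid_spr(sample))   # agree up to rounding
shifted = [affine(x, -2, (1, 2, 3, 4)) for x in sample]
print(var_by_mid_spr(shifted) / var_by_mid_spr(sample))  # gamma**2 = 4, up to rounding
```

The agreement is exact up to rounding because the mid and spread of the mean's $\alpha$-level are the averages of the individual mids and spreads.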

4.3 A Statistical Methodology to Deal with Fuzzy Data


In performing statistical analysis with fuzzy answers/data, some key distinctive features should be pointed out, namely:
• The lack of linearity of $(\mathcal{F}_c(\mathbb{R}), +, \cdot)$, which rules out a general extension of the difference of fuzzy numbers preserving its key properties.
• The lack of realistic and operational 'parametric' families of probability distribution models for RFSs.
• The lack of Central Limit Theorems for RFSs directly applicable for inferential purposes; some such results exist for RFSs, according to which the normalized distance between the sample and population fuzzy means converges in law to the norm of a Gaussian random element, but this element often takes values outside the cone of fuzzy numbers.
To overcome these drawbacks, crucial roles are played by the distance $D_\theta^\varphi$ and by the existence of Central Limit Theorems for Hilbert space-valued random elements and of bootstrapped CLTs (see Giné and Zinn [5]).
As for the case of real-valued data, the aim of Statistics with fuzzy-numbered
ones is (from an inferential position) to draw conclusions about the distribution
of RFSs over populations, on the basis of the information supplied by samples of
observations from them.
In the last decade, several developments have been carried out using the ideas in Subsection 4.2. More concretely, a statistical methodology based on the concept of RFS and the generalized metric in Subsection 4.1 has been introduced. On the one hand, this methodology addresses the role and suitable properties of the sample fuzzy mean as a fuzzy-valued estimator of the population one (see, for instance, Lubiano and Gil [11]).
On the other hand, this methodology has been mostly devoted to testing hypotheses about the means of RFSs. Tests about the fuzzy means of RFSs have been formalized for
– the one-sample case (see, for instance, [16] and [7]),
– the two-sample case for independent and dependent samples (see, for instance, [15]),
– and the k-sample case (i.e., the ANOVA test) for independent and dependent samples (see, for instance, [6]).
Inferences have also been developed in connection with variances and other real-valued 'parameters' of the distribution of RFSs, fuzzy arithmetic-based linear regression models for RFSs, classification analysis of fuzzy data, and some other statistical problems. References can be found on the webpage of the SMIRE research group
http://bellman.ciencias.uniovi.es/SMIRE/Publications.html.
As an illustration of this methodology, we now explain in some depth the problem of testing about the fuzzy mean in the one-sample case, together with a suggested method. Assuming that the available sample information is a realization of a simple random sample $(X_1, \ldots, X_n)$ from the RFS $X$ (i.e., independent RFSs identically distributed as the one to be analyzed), methods have been suggested to test the 'two-sided' null hypothesis

$$H_0: \widetilde{E}(X) = U \in \mathcal{F}_c^2(\mathbb{R}) \quad \text{(equality of fuzzy numbers)},$$

which is equivalent to

$$H_0: D_\theta^\varphi\left(\widetilde{E}(X), U\right) = 0 \quad \text{(equality of real numbers)}.$$

An exact test for 'normal' RFSs (in Puri and Ralescu's sense [18]) has been developed (Montenegro et al. [16]). Although the method is exact and easy to apply, requiring $X$ to be normal in Puri and Ralescu's sense (i.e., $X = V + \mathcal{N}(0,1)$ with $V \in \mathcal{F}_c^2(\mathbb{R})$) is quite restrictive and unrealistic.
On the other hand, asymptotic tests for general RFSs have also been introduced (see Körner [13], Montenegro et al. [16]). This approach is general, being based on the Central Limit Theorem for Banach space-valued random elements, and rather easy to apply when $X$ takes on a finite number of different values; however, the asymptotic distribution of the statistic usually involves unknown parameters, and large sample sizes are required. Moreover, simulation studies have shown that estimating either the eigenvalues or the covariance operator entails a substantial loss of precision w.r.t. the nominal significance level.
Taking these concerns into account, the use of $D_\theta^\varphi$ has been combined with the Generalized Bootstrapped Central Limit Theorem by Giné and Zinn, which allows bootstrap techniques to be used in this context. Thus, in González-Rodríguez et al. [7] a bootstrap approximation to the asymptotic test has been presented. The steps of this test are summarized in the following algorithm:

Algorithm for the one-sample bootstrap test of the null hypothesis $H_0: \widetilde{E}(X) = U$

Assume that the realization of the simple random sample is given by $(x_1, \ldots, x_n)$.
S1. Compute the value of the statistic
$$T_n(\text{sample}) = \frac{\left[D_\theta^\varphi\left(\frac{1}{n} \cdot [x_1 + \ldots + x_n],\ U\right)\right]^2}{S_n^2(\text{sample})},$$
where
$$S_n^2(\text{sample}) = \frac{1}{n-1} \sum_{i=1}^{n} \left[D_\theta^\varphi\left(x_i,\ \frac{1}{n} \cdot [x_1 + \ldots + x_n]\right)\right]^2.$$
S2. Fix the bootstrap population to be the above realization of the simple random sample.
S3. Obtain a realization of the simple random sample $(X_1^*, \ldots, X_n^*)$ from the bootstrap population.
S4. Compute the value of the bootstrap statistic
$$T_n^*(\text{bootstrap sample}) = \frac{\left[D_\theta^\varphi\left(\overline{X^*}_n(\text{bootstrap sample}),\ \frac{1}{n} \cdot [x_1 + \ldots + x_n]\right)\right]^2}{S_n^{*2}(\text{bootstrap sample})},$$
where $\overline{X^*}_n = \frac{1}{n} \cdot [X_1^* + \ldots + X_n^*]$ and $S_n^{*2}$ is computed as $S_n^2$ in S1, with the bootstrap sample in place of the original one.
S5. Repeat steps S3 and S4 a large number $B$ of times to get a set of $B$ values, denoted by $\{T_n^{*(1)}, \ldots, T_n^{*(B)}\}$.
S6. Compute the bootstrap p-value as the proportion of values in $\{T_n^{*(1)}, \ldots, T_n^{*(B)}\}$ greater than $T_n(\text{sample})$.
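Steps S1 to S6 can be sketched in Python for trapezoidal data with $\varphi$ equal to the Lebesgue measure, the setting used later in Example 2. This is a minimal illustration under our own assumptions, not the implementation of [7] or of the SAFD package: the function names are hypothetical, resampling uses Python's `random.Random.choices`, and the squared distance uses the closed form $\int_0^1 (f_0 + \alpha(f_1 - f_0))^2 \, d\alpha = (f_0^2 + f_0 f_1 + f_1^2)/3$, which applies because mids and spreads of trapezoids are linear in $\alpha$.

```python
import math
import random
from statistics import fmean
from typing import Optional

Trap = tuple[float, float, float, float]  # Tra(a, b, c, d): 0-level [a, d], 1-level [b, c]


def d2_tra(u: Trap, v: Trap, theta: float) -> float:
    """Squared D_theta^phi between trapezoids, phi = Lebesgue (exact formula)."""
    m0 = (u[0] + u[3] - v[0] - v[3]) / 2   # mid difference at alpha = 0
    m1 = (u[1] + u[2] - v[1] - v[2]) / 2   # mid difference at alpha = 1
    s0 = (u[3] - u[0] - v[3] + v[0]) / 2   # spr difference at alpha = 0
    s1 = (u[2] - u[1] - v[2] + v[1]) / 2   # spr difference at alpha = 1
    return (m0 * m0 + m0 * m1 + m1 * m1 + theta * (s0 * s0 + s0 * s1 + s1 * s1)) / 3


def fuzzy_mean(sample: list[Trap]) -> Trap:
    """(1/n) * [x_1 + ... + x_n], computed parameter-wise for trapezoids."""
    return tuple(fmean(x[k] for x in sample) for k in range(4))


def statistic(sample: list[Trap], center: Trap, theta: float) -> float:
    """[D(sample mean, center)]^2 / S_n^2, as in steps S1 and S4."""
    xbar = fuzzy_mean(sample)
    num = d2_tra(xbar, center, theta)
    s2 = sum(d2_tra(x, xbar, theta) for x in sample) / (len(sample) - 1)
    if s2 == 0.0:  # degenerate (re)sample: all values identical
        return math.inf if num > 0 else 0.0
    return num / s2


def bootstrap_test(sample: list[Trap], u0: Trap, theta: float = 1 / 3,
                   B: int = 1000, seed: Optional[int] = None) -> float:
    """Bootstrap p-value for H0: E(X) = u0 (needs at least two observations)."""
    rng = random.Random(seed)
    t_obs = statistic(sample, u0, theta)            # S1
    xbar = fuzzy_mean(sample)                       # S2: the sample is the population
    exceed = 0
    for _ in range(B):                              # S5: repeat S3 and S4 B times
        boot = rng.choices(sample, k=len(sample))   # S3: resample with replacement
        if statistic(boot, xbar, theta) > t_obs:    # S4, counted for S6
            exceed += 1
    return exceed / B                               # S6


# Synthetic check: ten translated copies of Tra(10, 20, 30, 40) whose mean is exactly it.
offsets = (-3, -1, 0, 2, 4, -2, 1, -1, 3, -3)
data = [(10 + e, 20 + e, 30 + e, 40 + e) for e in offsets]
print(bootstrap_test(data, (10, 20, 30, 40), B=500, seed=1))  # H0 true: large p-value
print(bootstrap_test(data, (30, 40, 50, 60), B=500, seed=1))  # H0 false: small p-value
```

Since the same normalization appears in the observed statistic and in every bootstrap statistic, rescaling $T_n$ (for instance, multiplying it by $n$) would not change the resulting p-value.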

Comparative simulation studies have shown that, for small/medium samples, the bootstrap method usually performs much better than the asymptotic one; for large sample sizes (over 300), the improvement is less remarkable, but the bootstrap approach still provides the best approximation to the nominal significance level. It should also be emphasized that the probability of rejecting the null hypothesis under alternatives converges to 1 as $n \to \infty$ (i.e., both the asymptotic and the bootstrap tests are consistent).
The application of the bootstrapped one-sample test is now illustrated.

Example 2. Consider the RFS $X$ of Example 1 associated with the motivation of the course, and assume that the students attending the course form a sample of size $n = 29$, providing the realization $(x_1, \ldots, x_{29})$ of the simple random sample in Table 1

Table 1 Fuzzy answers of 29 students attending the II Summer School of the ECSC to the question on the motivation of a given course (column i gives the trapezoidal answer $x_i$, i = 1, ..., 29)

$\inf(x_i)_0$: 50 34 21 70 50 75 70 52 50 60 80 10 65 20 60 44 60 50 60 90 56 30 10 60 70 80 55 70 69
$\inf(x_i)_1$: 60 40 23 80 60 80 74 60 55 70 90 30 70 30 70 47 70 60 67 100 60 40 20 65 76 90 65 80 100
$\sup(x_i)_1$: 70 41 34 90 70 90 86 60 60 80 90 40 70 30 70 53 80 70 72 100 64 40 20 75 84 90 74 100 100
$\sup(x_i)_0$: 80 46 40 100 80 100 90 64 70 90 100 60 75 40 80 71 90 80 80 100 70 50 30 80 90 100 80 100 100

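and one wishes to test the null hypothesis

$$H_0: \widetilde{E}(X) = \mathrm{Tra}(50, 60, 70, 80),$$

where $\mathrm{Tra}(a, b, c, d)$ denotes the trapezoidal fuzzy number with 0-level $[a, d]$ and 1-level $[b, c]$. This hypothetical mean is displayed in Figure 3.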


Then, the bootstrap algorithm (with $\theta = 1/3$, $\varphi$ = Lebesgue measure on $[0,1]$, and $B = 10000$) yields a p-value of .648, which means that $H_0$ should not be rejected at any of the usually considered significance levels.
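As a reproducibility aid, the sketch below encodes Table 1 and computes step S1 for $H_0: \widetilde{E}(X) = \mathrm{Tra}(50, 60, 70, 80)$ with $\theta = 1/3$ and $\varphi$ the Lebesgue measure. It is our own illustration (all names are hypothetical) and covers S1 only; the resampling steps S2 to S6 would then be applied to these data, and with a different random number generator the resulting p-value is not expected to coincide exactly with the reported .648.

```python
from statistics import fmean

# Rows of Table 1; column i holds the trapezoidal answer x_i = Tra(a, b, c, d).
INF0 = [50, 34, 21, 70, 50, 75, 70, 52, 50, 60, 80, 10, 65, 20, 60, 44, 60, 50, 60, 90,
        56, 30, 10, 60, 70, 80, 55, 70, 69]
INF1 = [60, 40, 23, 80, 60, 80, 74, 60, 55, 70, 90, 30, 70, 30, 70, 47, 70, 60, 67, 100,
        60, 40, 20, 65, 76, 90, 65, 80, 100]
SUP1 = [70, 41, 34, 90, 70, 90, 86, 60, 60, 80, 90, 40, 70, 30, 70, 53, 80, 70, 72, 100,
        64, 40, 20, 75, 84, 90, 74, 100, 100]
SUP0 = [80, 46, 40, 100, 80, 100, 90, 64, 70, 90, 100, 60, 75, 40, 80, 71, 90, 80, 80, 100,
        70, 50, 30, 80, 90, 100, 80, 100, 100]
ANSWERS = list(zip(INF0, INF1, SUP1, SUP0))  # (a, b, c, d) = (inf_0, inf_1, sup_1, sup_0)
H0_MEAN = (50, 60, 70, 80)                   # Tra(50, 60, 70, 80)
THETA = 1 / 3


def d2_tra(u, v, theta=THETA):
    """Squared D_theta^phi between trapezoids, phi = Lebesgue (exact formula)."""
    m0 = (u[0] + u[3] - v[0] - v[3]) / 2   # mid difference at alpha = 0
    m1 = (u[1] + u[2] - v[1] - v[2]) / 2   # mid difference at alpha = 1
    s0 = (u[3] - u[0] - v[3] + v[0]) / 2   # spr difference at alpha = 0
    s1 = (u[2] - u[1] - v[2] + v[1]) / 2   # spr difference at alpha = 1
    return (m0 * m0 + m0 * m1 + m1 * m1 + theta * (s0 * s0 + s0 * s1 + s1 * s1)) / 3


def fuzzy_mean(sample):
    """Sample fuzzy mean of trapezoids: parameter-wise average."""
    return tuple(fmean(x[k] for x in sample) for k in range(4))


xbar = fuzzy_mean(ANSWERS)
s2 = sum(d2_tra(x, xbar) for x in ANSWERS) / (len(ANSWERS) - 1)
t_obs = d2_tra(xbar, H0_MEAN) / s2           # T_n(sample) from step S1
print("sample fuzzy mean:", [round(p, 2) for p in xbar])
print("T_n(sample):", round(t_obs, 4))
```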

[Figure: the trapezoidal fuzzy number Tra(50, 60, 70, 80) drawn over the 0% to 100% axis of Q1. Motivation of the course]

Fig. 3 Hypothetical population mean fuzzy answer

In a similar way, two-sample and multi-sample tests for both independent and linked samples have been developed, the bootstrap approach being in general the most appropriate one. An R package called SAFD (Statistical Analysis of Fuzzy Data) has recently been designed by Lubiano and Trutschnig to perform computations with RFSs. This package includes most of the procedures discussed in Subsection 4.3 and is updated periodically. It can be found at
http://bellman.ciencias.uniovi.es/SMIRE/SAFDpackage.html.

5 Concluding Remarks
The use of free fuzzy-numbered response formats instead of Likert ones (or alternative real- or fuzzy-valued codings) to answer questions related to valuations/opinions/ratings involving some subjectiveness has been discussed in this paper. A relevant advantage of this approach is that the suggested format captures the accuracy, subjectiveness and variability of answers much better, which makes their statistical analysis more interesting. This analysis can be carried out through recent inferential developments, which have been briefly reviewed.
An open direction is to combine the free response format and the summary answers with the ideas of Eshragh and Mamdani [3] when a linguistic interpretation is needed, although the suggested format does not in itself force users to take this combination into account.

Acknowledgements. This research has been partially supported by/benefited from the Spanish Ministry of Science and Innovation Grants MTM2009-09440-C02-01 and MTM2009-09440-C02-02, the Principality of Asturias Grants IB09-042C1 and IB09-042C2, and the COST Action IC0702. Their financial support is gratefully acknowledged.

References
1. Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers. Math. & Soft Comput. 2, 71–84 (1995)
2. Bharadwaj, B.: Development of a fuzzy Likert scale for the WHO ICF to include categorical definitions on the basis of a continuum. ETD Collection for Wayne State University. Paper AAI1442894 (2007), http://digitalcommons.wayne.edu/dissertations/AAI1442894
3. Eshragh, F., Mamdani, E.H.: A general approach to linguistic approximation. Int. J. Man-Machine Studies 11, 501–519 (1979)

4. Gil, M.A., Lubiano, M.A., Montenegro, M., López-García, M.T.: Least squares fitting of an affine function and strength of association for interval data. Metrika 56, 97–111 (2002)
5. Giné, E., Zinn, J.: Bootstrapping general empirical measures. Ann. Probab. 18, 851–869 (1990)
6. González-Rodríguez, G., Colubi, A., Gil, M.A.: Fuzzy data treated as functional data. A one-way ANOVA test approach. Comput. Statist. Data Anal. (2011) (in press), doi:10.1016/j.csda.2010.06.013
7. González-Rodríguez, G., Montenegro, M., Colubi, A., Gil, M.A.: Bootstrap techniques and fuzzy random variables: Synergy in hypothesis testing with fuzzy data. Fuzzy Sets and Systems 157, 2608–2613 (2006)
8. van Laerhoven, H., van der Zaag-Loonen, H.J., Derkx, B.H.F.: A comparison of Likert scale and visual analogue scales as response options in children's questionnaires. Acta Pædiatr. 93, 830–835 (2004)
9. Lalla, M., Facchinetti, G., Mastroleo, G.: Ordinal scales and fuzzy set systems to measure agreement: an application to the evaluation of teaching activity. Quality & Quantity 38, 577–601 (2004)
10. Lazim, M.A., Osman, M.T.A.: Measuring teachers' beliefs about Mathematics: a fuzzy set approach. Int. J. Soc. Sci. 4(1), 39–43 (2009)
11. Lubiano, M.A., Gil, M.A.: Estimating the expected value of fuzzy random variables in random samplings from finite populations. Statistical Papers 40(3), 277–295 (1999)
12. Lubiano, M.A., Gil, M.A., López-Díaz, M., López, M.T.: The lambda-mean squared dispersion associated with a fuzzy random variable. Fuzzy Sets and Systems 111(3), 307–317 (2000)
13. Körner, R.: An asymptotic α-test for the expectation of random fuzzy variables. J. Stat. Plann. Infer. 83, 331–346 (2000)
14. Körner, R., Näther, W.: On the variance of random fuzzy variables. In: Bertoluzza, C., Gil, M.A., Ralescu, D.A. (eds.) Statistical Modeling, Analysis and Management of Fuzzy Data, pp. 22–39. Physica-Verlag, Heidelberg (2002)
15. Montenegro, M., Casals, M.R., Lubiano, M.A., Gil, M.A.: Two-sample hypothesis tests of means of a fuzzy random variable. Information Sciences 133(1-2), 89–100 (2001)
16. Montenegro, M., Colubi, A., Casals, M.R., Gil, M.A.: Asymptotic and Bootstrap techniques for testing the expected value of a fuzzy random variable. Metrika 59, 31–49 (2004)
17. Nguyen, H.T.: A note on the extension principle for fuzzy sets. J. Math. Anal. Appl. 64, 369–380 (1978)
18. Puri, M.L., Ralescu, D.A.: The concept of normality for fuzzy random variables. Ann. Probab. 13, 1373–1379 (1985)
19. Puri, M.L., Ralescu, D.A.: Fuzzy random variables. J. Math. Anal. Appl. 114, 409–422 (1986)
20. Ramos-Guajardo, A.B., Colubi, A., González-Rodríguez, G., Gil, M.A.: One sample tests for a generalized Fréchet variance of a fuzzy random variable. Metrika 71(2), 185–202 (2010)
21. Trutschnig, W., González-Rodríguez, G., Colubi, A., Gil, M.A.: A new family of metrics for compact, convex (fuzzy) sets based on a generalized concept of mid and spread. Inform. Sci. 179, 3964–3972 (2009)
22. Wu, C.-H.: An empirical study on the transformations of Likert-scale data to numerical scores. Appl. Math. Sci. 58(1), 2851–2862 (2007)
23. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning. Part 1. Inform. Sci. 8, 199–249 (1975); Part 2. Inform. Sci. 8, 301–353; Part 3. Inform. Sci. 9, 43–80
