
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/281727829

Test measurements of humor

Article · January 2014


2 authors, including:

Willibald Ruch
University of Zurich

All content following this page was uploaded by Willibald Ruch on 23 March 2016.



This manuscript was published as:

Ruch, W. & Heintz, S. (2014). Test measurements of humor. In S.
Attardo (Ed.), Encyclopedia of humor studies (pp. 759-761). Thousand
Oaks, CA: Sage.
 
TEST MEASUREMENTS OF HUMOR

Humor researchers use instruments to assess individual differences in humor, to display a

person’s humor profile, to relate these differences to other phenomena (like personality or

health), and to document changes in humor due to interventions. The question of what instrument

to use arises. Is it one for all research questions? Is it the one with the highest number of

subscales, the most recent one, or the one with the most sophisticated name? Several formal- and

content-related factors determine the choice of which instrument to use for what purposes. This

entry first discusses the criteria used to determine if a test is psychometrically sound. Second,

currently used humor instruments (joke / cartoon tests and questionnaires) are presented.

Formal Criteria

Formal criteria refer to the construction and documentation of a test and to the psychometric

properties. A well-documented test will contain information on the nature of the concepts to be

measured (e.g., how sparse vs. elaborated the variable definition is and if it is based on a theory),

the type of construction procedure employed (e.g., factor analytic, empirical, rational), how

elaborate the construction stage was (e.g., how were the items generated, how many samples

were used, was there an item analysis), and the psychometric properties.

Psychometric properties - mainly objectivity, reliability, and validity - allow an

evaluation of the quality of the test measurement. According to Gustav A. Lienert and Ulrich

Raatz, objectivity is standardization in procedure, scoring, and interpretation. Reliability is the

degree to which the measurement is free from error variance, i.e., how accurately the test
measures. Validity of a test is the extent to which it measures or predicts some criterion of

interest. Sufficient objectivity is the necessary precondition for high reliability; and high

reliability, in turn, is required to achieve satisfying validity.

Reliability varies between zero (unreliability) and one (perfect reliability), and should

exceed .60 for group assessment and .80 for individual assessment. Reliability can be estimated

through internal consistency (i.e., the intercorrelations of all scale items; a frequently reported

measure for continuous measures is Cronbach’s Alpha), the parallel-test method (correlation of

two parallel tests), and the retest method (stability; correlation of the same test across two points

in time).
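
As a rough illustration of the internal-consistency estimate mentioned above, the following minimal sketch computes Cronbach's Alpha from a small response matrix and compares it with the .60 and .80 thresholds given in the text. The function names (`cronbach_alpha`, `adequate_for`) and the ratings are invented for illustration and do not come from any humor instrument.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for a list of respondent rows (one score per item)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of the total (sum) score across respondents.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def adequate_for(alpha):
    """Apply the rule-of-thumb thresholds quoted in the text (.60 and .80)."""
    if alpha > 0.80:
        return "individual assessment"
    if alpha > 0.60:
        return "group assessment"
    return "neither"


# Hypothetical data: five respondents rating four items on a seven-point scale.
ratings = [
    [6, 5, 6, 7],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 1, 2, 2],
    [7, 6, 6, 7],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2), adequate_for(alpha))
```

The retest and parallel-test estimates would instead correlate two sets of total scores; only the internal-consistency case is sketched here.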

Validity entails several aspects, of which construct and content validity will be discussed

in more detail. Frederic Lord and Melvin R. Novick distinguish between empirical and

theoretical validity, i.e., relations of a measurement with observable variables versus latent

variables (e.g., hypothetical, non-observable constructs). Construct validity belongs to theoretical

validity and indicates the efficiency of a test in terms of measuring what it was designed to

measure. It comprises convergent (high correlations with scales that measure the same or a

similar construct) and discriminant validity (low correlations with scales that measure a

dissimilar construct). Construct validity can be tested in a multitrait-multimethod (MTMM)

analysis, which traditionally involves comparing correlation matrices of at least two measures

and traits. Modern statistical approaches to conduct MTMM analyses include structural equation

modeling (SEM) and multilevel modeling.
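
The traditional correlation-based reading of an MTMM matrix can be sketched as follows. This is a simplified illustration, not the SEM or multilevel approach: it only contrasts the average same-trait, different-method correlation (convergent) with the average absolute correlation between different traits (discriminant). The trait and method labels and all scores are made-up numbers, and `mtmm_summary` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def mtmm_summary(measures):
    """measures maps (trait, method) -> scores of the same respondents.

    Returns (mean convergent r, mean absolute discriminant r).
    """
    convergent, discriminant = [], []
    for (key_a, scores_a), (key_b, scores_b) in combinations(measures.items(), 2):
        r = pearson(scores_a, scores_b)
        same_trait = key_a[0] == key_b[0]
        if same_trait:
            convergent.append(r)  # same trait, different method
        else:
            discriminant.append(abs(r))  # different traits, any method
    return (sum(convergent) / len(convergent),
            sum(discriminant) / len(discriminant))


# Hypothetical scores of six respondents; two traits, two methods each.
measures = {
    ("cheerfulness", "self"): [1, 2, 3, 4, 5, 6],
    ("cheerfulness", "peer"): [2, 1, 3, 4, 6, 5],
    ("seriousness", "self"): [3, 5, 1, 6, 2, 4],
    ("seriousness", "peer"): [4, 5, 1, 6, 2, 3],
}
mean_convergent, mean_discriminant = mtmm_summary(measures)
print(round(mean_convergent, 2), round(mean_discriminant, 2))
```

Support for construct validity would show up as a high first value and a low second value; a full MTMM analysis would also inspect the individual cells of the matrix.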

Content validity has been described as the extent to which a test represents the criterion

or construct to be measured. Content validity can be ensured by (a) defining the scope of the

criterion or construct of interest and (b) obtaining expert ratings of the representativeness of the
test items according to the definition. In humor research, tests employing jokes and cartoons to

assess humor appreciation or production are inherently content valid, as they obtain direct ratings

of the criterion at hand (like the funniness of a joke or writing a humorous cartoon caption).

The Measurements

The different kinds of humor assessment tools can be grouped into eight categories: 1. Informal

surveys, joke telling techniques, or diary methods; 2. Joke and cartoon tests; 3. Questionnaires,

self-report scales; 4. Peer-reports; 5. State measures; 6. Children's humor tests; 7. Humor scales in

general instruments; and 8. Miscellaneous and unclassified. More than 60 instruments have been

developed that fall into one of these categories. Since then, more than two dozen new measures

have been constructed. A survey of the various instruments allowed some conclusions about them,

most of which are still valid today. One was that over the entire span covered, the instruments

often purported to measure “sense of humor” even when the methods used or the contents

diverged largely (questionnaires, jokes/cartoons tests) and zero correlations can be expected

between the instruments.

Until the 1980s, joke and cartoon tests were the most frequent; more recently,

questionnaires have predominated. Little effort has been invested in peer-evaluation

techniques or experimental assessments. Also, most instruments are for adults and few are

applicable to children. Many instruments are trait-oriented and thus not well suited for measuring

change (e.g., as needed in intervention studies). Another observation was that the same labels do

not necessarily imply the same concepts (as in nonsense, which may stand for harmless jokes or

ones that do not resolve incongruity), and scales with different labels might still measure the

same construct. Also, there has been little interest in multiple operationalizations of the same
construct to determine convergent validity. This would allow determining how much method

variance and how much content variance are in the measures. Another observation is that very

often an instrument was designed for one study only, and only a couple of tests were published

with a company (e.g., the IPAT Humor Test of Personality). While work was devoted to

constructing scales, comparatively little effort was spent on working on the concepts.

Humor Assessment Tests Currently in Use

The selection of current instruments for the assessment of humor traits and states in children and

adults that follows is not comprehensive, but contains measurements of humor with adequate

psychometric properties.

Joke and cartoon tests

♦ The 3 WD Test of Humor Appreciation obtains ratings of funniness and aversiveness

(each on unipolar seven-point scales) of incongruity-resolution, nonsense, and sexual humor,

respectively. In addition, scores for the total funniness and total aversiveness of humor,

structure preference (incongruity-resolution vs. nonsense), and appreciation of sexual content

(with removed variance of structure) can be assessed. There are different versions of the 3

WD, including a short version (30 items, plus five “warming up” items). Many studies across

several nations showed its construct validity.

♦ The Escala de Apreciación del Humor (EAHU; Humor Appreciation Scale) measures six

dimensions of humor appreciation, of which two are structure-related (incongruity-resolution

and nonsense), and four are content-related (sexual, black, woman disparagement, and man

disparagement). The test comprises 32 items, which are rated on funniness and aversiveness
on unipolar five-point scales.

♦ The Cartoon Punch Line Production Test (CPPT) is a measure of quantity or fluency

(i.e., number of produced punch lines) and quality or origence (i.e., peer-rated funniness and

originality) of humor production. As many funny captions as possible are written in response

to 15 caption-removed cartoons of incongruity-resolution, nonsense, and sexual humor in a

30-minute time limit (the short form, or CPPT-k, features six cartoons and a 15-minute time

limit). Besides quality and quantity of humor production, wit and imagination of the

participant are assessed as well. The instrument was tested for construct validity.

Questionnaires

♦ The Humorous Behavior Q-Sort Deck (HBQD) assesses 10 humor styles located on five

bipolar dimensions, namely socially warm vs. cold, reflective vs. boorish, competent vs. inept,

earthy vs. repressed, and benign vs. mean-spirited. It consists of 100 humorous behaviors,

which are ranked on bipolar seven-point scales according to their typicality using a Q-sort

technique. In contrast to Likert-type scales, this technique forces a normal distribution of

answers and thus results in ipsative scoring (i.e., an intraindividual ranking of the items).

♦ The Humor Styles Questionnaire (HSQ) measures four everyday functions of humor

(affiliative, self-enhancing, aggressive, and self-defeating). The HSQ has 32 items with a

unipolar seven-point answer format. There is initial evidence for its construct validity and a

large body of studies showing predictive validity. Recently, a children’s version of the HSQ

has been developed.

♦ The Sense of Humor Scale (SHS) measures playful vs. serious attitude, positive vs.

negative mood, and six facets of sense of humor (enjoyment of humor, laughter, verbal

humor, finding humor in everyday life, laughing at yourself, and humor under stress), which
can be combined into one “humor quotient”. The instrument has 40 items with a bipolar

seven-point answer format.

♦ The State-Trait-Cheerfulness-Inventory (STCI) measures cheerfulness, seriousness, and

bad mood as traits and states using a four-point answer format. The trait version (STCI-T)

assesses the temperamental basis of humor and comes in different versions (a short, standard,

and long form with 30, 60, and 106 items, respectively). The STCI-T has well-studied

content and construct validity. The state version (STCI-S) has 30 items and instructions for

different time spans (now, last week, last month, in general) and is suited for pre/post

comparisons. In addition, a peer-report version and a version for children and adolescents of

the STCI-T exist.

♦ The Values in Action Inventory of Strengths (VIA-IS) measures 24 character strengths

with a total of 240 items in a bipolar five-point answer format. Humor (playfulness) forms

one of these strengths (assessed with 10 items), and is related to the virtues of temperance,

humanity, and wisdom.
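
The forced-distribution Q-sort described for the HBQD above can be illustrated with a short sketch. The pile sizes below are an assumption chosen only so that 100 cards form a roughly normal shape over seven categories; they are not taken from the HBQD manual, and `q_sort_scores` is a hypothetical helper. The sketch shows why such scoring is ipsative: every respondent ends up with the same score distribution, so only the within-person ordering of items carries information.

```python
import random
from collections import Counter
from statistics import mean, pvariance

# Assumed quasi-normal pile sizes for 100 cards on a seven-point scale.
PILE_SIZES = [5, 10, 20, 30, 20, 10, 5]


def q_sort_scores(ranked_items, pile_sizes=PILE_SIZES):
    """Map items ranked from least to most characteristic onto pile scores 1..k."""
    if len(ranked_items) != sum(pile_sizes):
        raise ValueError("number of items must equal the sum of the pile sizes")
    scores = {}
    position = 0
    for score, size in enumerate(pile_sizes, start=1):
        for item in ranked_items[position:position + size]:
            scores[item] = score
        position += size
    return scores


# Two hypothetical respondents who rank the same 100 cards differently.
items = [f"behavior_{i:03d}" for i in range(1, 101)]
rng = random.Random(0)
person_a = rng.sample(items, k=len(items))
person_b = rng.sample(items, k=len(items))

for ranking in (person_a, person_b):
    values = list(q_sort_scores(ranking).values())
    # Pile counts, mean, and variance are identical for every respondent.
    print(sorted(Counter(values).items()), mean(values), pvariance(values))
```

Because means and variances are fixed by design, scores from such a procedure describe the relative salience of behaviors within a person rather than absolute levels between persons.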

Willibald Ruch

Sonja Heintz

See also 3 WD Humor Test; Appreciation of Humor; Cheerfulness, Seriousness and Humor;

Children's Humor Research; Factor Analysis of Humor Scales;

Further Readings

Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant validation by the

multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.


Carretero-Dios, H., Pérez, C., & Buela-Casal, G. (2010). Assessing the appreciation of

the content and structure of humor: Construction of a new scale. Humor: International Journal

of Humor Research, 23, 307–325. doi:10.1515/humr.2010.014

Casu, G., & Gremigni, P. (2012). Humor measurement. In P. Gremigni (Ed.), Humor and

health promotion (pp. 253-274). Hauppauge, NY: Nova Science Publishers.

Cattell, R. B., & Tollefson, D. L. (1966). The IPAT Humor Test of Personality.

Champaign, IL: Institute for Personality and Ability Testing.

Craik, K. H., Lampert, M. D., & Nelson, A. J. (1996). Sense of humor and styles of

everyday humorous conduct. Humor: International Journal of Humor Research, 9, 273–302.

doi:10.1515/humr.1996.9.3-4.273

Köhler, G., & Ruch, W. (1995). On the assessment of 'wit': The Cartoon Punch line

Production Test. European Journal of Psychological Assessment, 11, Supplement 1: 7-8.

Köhler, G., & Ruch, W. (1996). Sources of variance in current sense of humor

inventories: How much substance, how much method variance? Humor: International Journal of

Humor Research, 9, 363-397. doi:10.1515/humr.1996.9.3-4.363

Lienert, G. A., & Raatz, U. (1998). Testaufbau und Testanalyse [Test construction and

test analysis]. Weinheim, Germany: Psychologie VerlagsUnion.

Lord, F. M., & Novick, M. R. (2008). Statistical theories of mental test scores. Reading,

MA: Addison-Wesley.

Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. (2003). Individual

differences in uses of humor and their relation to psychological well-being: Development of the

humor styles questionnaire. Journal of Research in Personality, 37, 48-75.


McGhee, P. E. (1999). Health, healing, and the amuse system: Humor as survival

training. Dubuque, IA: Kendall/Hunt.

Peterson, C., & Seligman, M. E. P. (2004). Character strengths and virtues: A handbook

and classification. Washington, DC: American Psychological Association.

Ruch, W. (1992). Assessment of appreciation of humor: Studies with the 3 WD humor

test. In C. D. Spielberger, & J. N. Butcher (Eds.), Advances in personality assessment (Vol. 9,

pp. 27-75). Hillsdale, NJ: Erlbaum.

Ruch, W. (1998). The sense of humor: Explorations of a personality characteristic.

Berlin, Germany: Mouton de Gruyter.

Ruch, W., & Köhler, G. (1999). The measurement of state and trait cheerfulness. In I.

Mervielde, I. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality Psychology in Europe (Vol.

7, pp. 67-83). Tilburg, The Netherlands: Tilburg University Press.

Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational

Assessment, 5(4), 299-321. doi:10.1207/s15326977ea0504_2
