Gamifying Psychological Assessment
Insights from Gamifying the Thematic Apperception Test

Borna Fatehi, Northeastern University (fatehi.b@husky.neu.edu)
Christoffer Holmgård, Northeastern University (christoffer@holmgard.org)
Sam Snodgrass, Northeastern University (sam.psnodgrass@gmail.com)
Casper Harteveld, Northeastern University (c.harteveld@northeastern.edu)

ABSTRACT
Gamification and serious games have the capacity to increase engagement in often non-engaging contexts such as a test. In this study, we gamified a psychological test, the Thematic Apperception Test (TAT), with the player motivation concepts of achievement, exploration, and social interaction. We used the platform StudyCrafter to implement our games and ran a study to test the effectiveness of this gamified psychological assessment. All participants completed both the standard version of the TAT and one of our gamified versions and rated their experience in each setting through self-reports. Our results show that the gamified versions of the TAT provided a more enjoyable and motivating experience than the standard version. We conclude that gamifying psychological tests has potential for increasing motivation in psychological assessments, while questions of validity remain to be addressed.

CCS CONCEPTS
• Human-centered computing → User studies; Empirical studies in HCI; Empirical studies in interaction design.

KEYWORDS
Gamification, serious games, psychological assessment, engagement, Thematic Apperception Test

ACM Reference Format:
Borna Fatehi, Christoffer Holmgård, Sam Snodgrass, and Casper Harteveld. 2019. Gamifying Psychological Assessment: Insights from Gamifying the Thematic Apperception Test. In The Fourteenth International Conference on the Foundations of Digital Games (FDG '19), August 26–30, 2019, San Luis Obispo, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3337722.3337737

1 INTRODUCTION
Game-like environments are becoming increasingly popular as tools for assessment [29, 33, 38, 44, 59, 60, 64]. By gamifying assessments we can supply an engaging context, immerse participants in various situations, and collect rich behavioral data in an unobtrusive, controlled manner [30]. Despite the clear promise and existing evidence for the benefits of games for assessment, their application in organizational psychology and clinical practice is still very sparse. More research is warranted to find out how games and gamification can be applied to the assessment and treatment of particular phenomena and contexts, and to develop robust methodologies.

In psychological research, using games as an assessment tool first appeared at the end of the 20th century, when the potential of games for performance assessment was introduced [37]. Since then, games have grabbed the attention of cognitive psychologists trying to understand cognitive processes, such as perception and skill development [4, 6, 20], or to perform cognitive assessments on elderly people [11, 63, 64]. Additionally, researchers have started exploring games for studying social aspects, such as teamwork and negotiation [5, 17, 40], job recruitment [41], assessing intelligence [33], and personality profiling [9, 67, 68]. In many of these instances games are leveraged as a tool or an environment to gain insight into psychological phenomena. Less common is the gamification [14, 24] (i.e., the addition of game elements to non-game tasks) of psychological experiments and assessments. In this study we put the exact same test in a game environment. One can perceive this as a form of hard gamification [1], where the task to be gamified is reconstructed (i.e., gamified) as a fully functional game, or as a serious game [26] with the purpose of increasing engagement with a test.

Reasons to gamify existing psychological assessments are manifold. Similar to research instruments in general, psychological assessments suffer from many participant response issues, such as speeding, random responding, lack of attention, and response biases (e.g., social desirability bias) [23, 25]. At their core, all these issues point to the need for motivated participants [21]. Other challenges are reliability (i.e., does a participant attain the same result when repeating the assessment?) and validity (i.e., does the instrument measure what it is meant to?). We hypothesize that a gamified psychological test can increase participants' engagement and thereby their motivation, which could ultimately translate into more reliable and valid test results. In this study we focus especially on the first part of this hypothesis by investigating how participants' motivation with a gamified psychological assessment differs from their motivation with the standard version.

Furthermore, in this exploratory study of gamifying psychological assessment, we chose to focus on a category of psychological tests where motivation plays an important role: tests that require detailed open responses [8].¹

¹ In this paper we use the terms "test" and "assessment" interchangeably.
As a case study, we gamify one of the most widely used and validated personality tests: the Thematic Apperception Test (TAT) [48]. The TAT is a projective test where participants are prompted by visual stimuli to generate open-ended responses, which are interpreted as reflections of the participant's personality by a trained psychologist. As participants need to write detailed responses for this test, motivation to write is a key metric. The challenge of the TAT is that participants may not write in enough depth to provide sufficient data for interpretation. Using StudyCrafter [27, 50], a new easy-to-use platform for developing gamified research instruments, we built two different game versions and a control version (i.e., the standard test). We recruited 18 student volunteers for a within-subjects experiment where each participant was exposed to one of the game versions and the control version. Our aim with this study is to test whether the game versions increase engagement compared to the control version and to explore the differences in outcomes between all three versions.

The contributions of this paper are two-fold. First, our work provides a systematic exploration of gamifying a traditional psychological test, and specifically a test that requires detailed open responses, in order to increase participant motivation. While careful consideration and in-depth expertise in psychological assessment are needed, we highlight how traditional psychological assessments can be gamified with relatively limited effort by harnessing easy-to-use platforms such as StudyCrafter. Our hope is that these design insights and tools can help research psychologists and psychometricians gamify psychological assessment in the future. Second, we evaluate the effectiveness of our gamified versions of the TAT in terms of their effect on participant motivation, and show that our gamified versions increase player motivation.

The remainder of this paper is organized as follows. In the background section we elaborate on the rationale for this work in general and the presented study in particular, provide a brief description of the TAT, explain how we advance the design of gamified assessments, and introduce the StudyCrafter platform. In the methods section, we detail our within-subjects experimental study, which includes discussing how we gamified the TAT. The paper ends with presenting and discussing the results before drawing our conclusions.

[Figure 1: Theoretical model of improving psychological assessments. The light blue concepts and lines are not addressed in the present work; the dark blue ones are.]

2 BACKGROUND

2.1 Rationale for Gamifying Assessments
Motivating participants, reliability, and validity are typical challenges for psychological tests [21]. Whether or not a test is taken within a structured setting, such as for selection or clinical purposes, the ongoing motivation to participate in a test is critical for the quality of the results. In structured settings participants may be more intrinsically motivated to provide input, but issues such as question fatigue, loss of attention, speeding, and random responding still occur. Gamifying tests can help keep an experience engaging, sustain attention, and pace the experience [24, 26]. For open response tests, the type of test covered here, ensuring motivation is important, as it likely impacts the quality of responses and hence the information that can be derived from them.

Engagement and motivation are two concepts that are sometimes used interchangeably in the literature. Motivation is broadly defined as "the reasons underlying behavior" [22] or as "the attribute that moves us to do or not do something" [7]. Motivation can be stimulated extrinsically (e.g., rewards, punishment, money) or intrinsically, that is, based in the inherent satisfactions derived from action [53]. While games include extrinsic motivational elements, such elements help to draw players in ("get attention") and, ultimately, when designed well, games can (and should) be intrinsically motivating ("hold attention") [13, 54]. Therefore, we see game engagement as a means to stimulate and hold (intrinsic) motivation.

We assume that increased motivation results in more ("response volume") and better ("response quality") responses. If participants are more motivated to complete a task, based on the general literature it can be expected that their effort and performance will increase [e.g., 55]. This can then arguably increase the reliability and validity of the test outcomes. With more and better responses, the data available for interpretation is richer and more comprehensive, which in the case of psychometric instruments should lead to a more accurate assessment.

To summarize, the rationale for gamifying assessments is that game engagement can directly help to increase motivation. With more motivated participants, better response volume/quality can indirectly result. Finally, both the increased motivation and response volume/quality can better guarantee the reliability and validity of a test. All these conjectures form the basis of our theoretical model presented in Figure 1. In the work presented here, we only address some of these conjectures, specifically how engagement can impact motivation and response volume/quality.

2.2 Thematic Apperception Test
For our study, we chose to gamify the Thematic Apperception Test (TAT) [48]. The TAT is a projective test that consists of a set of cards depicting human characters in ambiguous situations (e.g., a child sitting behind a table while leaning his head on his hand and looking at a violin on the table). The participant is asked to write a story about (1) what led up to the events depicted on the card, (2) what events are occurring on the card, (3) what events will occur in the future, and (4) the thoughts and feelings of the characters on the card [48]. By writing responses to these pictures, participants reveal aspects of their personality. Thus, more detailed stories can lead to more accurate analyses. One of the main challenges of interpreting the stories is the lack of quality input.
Inadequate amount and quality of text might corrupt the interpretation of one's persona or problem and, therefore, make the therapy less effective. Since participants are notoriously less willing to respond to open response questions [8], we chose to gamify the TAT to address this specific challenge that is unique to tests requiring detailed open responses. We hypothesize that the motivation resulting from participants' engagement with a game can address this issue.

Of all tests that require detailed open responses, we chose the TAT because it is one of the most widely used, taught, and researched projective techniques in clinical practice [16, 42, 51, 52]. To illustrate, in a survey of 412 assessment-active clinical psychologists, 90% agreed that clinical psychology students should be competent with this test [70]. This survey also revealed that it is one of the top five most widely used tests, with 82% of the clinical psychologists using it at least occasionally. However, the TAT, like many other projective techniques, has been harshly criticized [e.g., 42]. The critique focuses on its poor psychometric properties: a lack of standardization in evaluating results and, related, its low reliability and validity. Proponents have counter-argued that these traditional standards should not be applied [e.g., 34]. The TAT does not aim to measure narrow constructs or generalize findings to the population at large. Instead, it seeks to reveal a broad, complex picture of a particular client's personality across their responses. Additionally, a meta-analysis suggests that there is no clear evidence for the idea that the TAT has a notable psychometric deficiency compared to other instruments. It also argues that many of the limitations attributed to projective tests (e.g., reliability and validity) apply to all psychological and medical tests [46], and some would argue that these limitations apply even more to personality inventories or rating scales [15].

2.3 Designing Gamified Assessments
The idea of gamification, or gamifying non-game contexts, started around 2008 [14]. Since then it has taken off in various domains, from health to business. Review articles of gamification in general [24, 39] and in specific domains [36, 43, 45, 56, 69] all suggest positive effects, though some mixed results are noted. These reviews note that most studies lack theoretical foundations and rigorous methodological frameworks. In their reflection on how gamification has matured, Deterding and Nacke [49] emphasize this lack too, while also suggesting a broadening from points and badges to other features of game design. In our work we focus on gamifying psychological assessment, a relatively unexplored area. Closest to this work are efforts to gamify surveys [e.g., 10, 28].

As for assessment, in education this has been broadly recognized as an important area [29, 59, 60]. In fact, Shaffer and Gee [57] have argued that "We've been designing games for learning when we should have been designing games for testing." The central idea behind this is that games are essentially (educational) tests: in every stage, games assess the content or skills needed and do not allow players to progress unless they master them. Therefore, the completion of a game demonstrates learning of content and skills within that game. For psychological assessment, however, it works differently. There is no need to learn while playing; in fact, such a "learning effect" should be avoided. Instead, what is needed is to find a better way to measure the test outcomes, which can be achieved by leveraging the affordances of games [30].

Because of these affordances, and in accordance with the notion of going beyond points/badges/leaderboards, we opted to employ a "hard gamification" approach, which is the complete redesign of a non-game activity into a fully functional game, as opposed to "soft gamification," where game elements are only added to the activity [1, 18]. Evidence shows that a player's personality can describe the reason behind a chosen action in a game [9, 67, 68], which suggests that games are indeed very suitable environments for personality assessments such as the TAT. By naturally integrating the TAT with actual game actions, we conjecture that we can best harness the affordances of games while keeping the TAT intact. Because we ended up making actual games, our work can also be perceived as a form of serious gaming [26]. Regardless, our work should be perceived as gamifying an existing task: the same task is embedded in the form of a game or experienced in a game-like manner.

Additionally, as we seek to increase motivation, our design approach was inspired by theories of player motivation. The central idea behind these theories is that one-size-fits-all experiences should not be used because of inherent differences between players: they have different genre preferences, play styles, and especially motivations for playing. There have been some recent advances in understanding individual player differences [62, 65, 66]; however, as Deterding and Nacke [49] discuss, we still know very little about the effect of player types and the effectiveness of designing with player types in mind.

For our work, we went back to the original Bartle player types [3], from which much of this research into individual player differences sprung, with the exception of the "killer" type. We chose to focus on the remaining three player types of achievers, explorers, and socializers because they have been largely confirmed in subsequent empirical research [71] and map well onto the Hexad scale user types [62] as well as Self-Determination Theory (SDT) [53], a widely used and validated theory on intrinsic motivation, which has also been investigated in the context of games [13, 54]. In a nutshell, achievers (SDT: competency; Hexad: players and achievers combined) seek to collect, win, and master skills; explorers (SDT: autonomy; Hexad: free spirits) like to roam around the virtual worlds they are immersed in and try out the activities offered to them; and socializers (SDT: relatedness; Hexad: socializers) enjoy interacting with other players or NPCs.

Finally, for our design approach we sought to make use of a platform that would make it easier to design psychological assessments. Aside from the need for more rigorous empirical evidence, one possible reason this area is unexplored in practice is that it takes many resources and much expertise to build game-like activities. A platform such as StudyCrafter can address this.

2.4 StudyCrafter
To develop the gamified TAT, we opted to make use of StudyCrafter [27, 50], an online platform where users can create and participate in research projects. The platform consists of two main components: an editor and a website. With the editor, users can create projects, and these projects will then be hosted online on the website for participants to play.
A StudyCrafter project consists of one or more scenes, which in turn consist of a layout and a script. The layout provides the visual setting where the scenario plays out; the script describes what happens during the scenario. By connecting scenes to each other, players can move between different settings (e.g., walking from the bedroom to the living room).

In the editor, users have the option to go to the "layout" tab for working on the visual setting. This screen acts much like a theater stage. Here users can choose backgrounds, objects, characters, and interface elements from a list and place them on a canvas as they see fit. Users have the option of adding custom assets, which we used for integrating the TAT cards. For working out the script, users have to navigate to the "script" tab. This view allows users to make a visual decision tree with nodes and links for how a scenario unfolds and to specify the dialog, player choice options, and actions (e.g., move a character, play a sound) as well as how performance is evaluated (through variables). Important for this work is that the player choice options can be set to "open response," meaning that players can freely type in their response. Once a project is finished, it is embedded in the StudyCrafter website. Project creators can then recruit people to participate in the project. All data is automatically logged and made available to users through the website.

While limited by the tool's affordances, with this editor users can make various scenarios in a 2D digital environment with limited technical expertise and artistic skills. In fact, the work presented here was developed by a single person with some technical support, as the platform was still in alpha at the time of development. Thus far, StudyCrafter's potential has been mainly evaluated through replicating experiments based on well-known decision making phenomena, such as the framing and decoy effects [31, 61], and recently for gamifying traditional survey instruments [28].
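To make this structure more concrete, the sketch below models the concepts described above (scenes that couple a layout with a script, script nodes linked into a decision tree, and a player choice set to "open response") as plain Python data structures. All names and fields are our own illustration; they are not StudyCrafter's actual project format or API.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    """One step in a scene's script: a dialog line, a choice, or an action."""
    kind: str                       # "dialog", "choice", "open_response", or "action"
    text: str = ""                  # dialog line or prompt shown to the player
    options: List[str] = field(default_factory=list)     # fixed choice options, if any
    next_nodes: List[str] = field(default_factory=list)  # ids of follow-up nodes

@dataclass
class Scene:
    """A scene couples a visual layout with a script (a decision tree of nodes)."""
    layout: List[str]               # backgrounds, objects, characters, UI elements
    script: Dict[str, Node]         # node id -> node
    exits: List[str] = field(default_factory=list)       # connected scenes

# A minimal scene in the spirit of Game-1: a TAT card is shown and the player
# writes a story in a single open response box (asset names are hypothetical).
office = Scene(
    layout=["office_background", "desk", "writer_character", "tat_card_3GF"],
    script={
        "prompt": Node(kind="dialog",
                       text="Write a story about this picture.",
                       next_nodes=["story"]),
        "story": Node(kind="open_response",
                      text="Type your story here...",
                      next_nodes=["reward"]),
        "reward": Node(kind="action",
                       text="increase mood meter"),
    },
    exits=["front_desk"],
)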
3 METHODS

3.1 Design
We set up a within-subjects experiment in a laboratory setting, where each participant was exposed to a test and a control condition. The test condition concerns one of the two gamified TAT versions; the control condition, a digital replication of the paper-and-pencil version of the test. The dependent variables are participants' self-reports of their motivation and their response volume/quality. For response volume, we looked at the number of written words; for response quality, we used participants' self-reports (defined as writing honestly and in detail) and the number of relevant themes we could extract (see TAT Data Analysis). We considered two subject variables: (1) participants' player profile (e.g., frequency of play and genre preferences) and (2) the general experience and usability of the gamified versions. The main hypotheses of the study are the following:

Compared with the standard TAT, participants are more motivated to provide responses with the gamified versions of the TAT (H1) and provide more voluminous (H2) and higher quality (H3) responses.

The choice for a within-subjects design is to rule out any bias from individual differences and to get a direct comparison from participants between the two conditions. In particular, by comparing the writing of the same person across the gamified and control conditions, we can better contrast the results from both conditions. The choice for a laboratory setting was considered more optimal at this stage of our work: it allowed us to directly observe and informally interview participants. More importantly, in order to minimize the learning effect across conditions, we included an interval of one week between the conditions, and by working with participants on site we would minimize attrition. Even so, six participants (25%) did not return to complete the experiment. The inclusion of two gamified versions was to explore possible differences as a result of design; however, at this stage of our work we did not intend to test this explicitly. Our study is experimental in its setup; however, we want to emphasize here the exploratory nature of this first study. Our aim is to retrieve important insights that would help to further advance this unexplored area.

3.2 Participants
We recruited 32 student volunteers who participated without any form of remuneration. Eight participants did not show up and six participants did not complete the second phase of the experiment; they were all removed from the analysis. The included 18 participants are of mixed gender (7 female, 11 male) and race (6 Asian, 12 White), and aged between 20 and 33 years old (M = 26.6, SD = 3.6). We obtained IRB approval for the purposes of the study and informed consent from all participants.

3.3 Materials
3.3.1 TAT Cards. To make our experiment manageable, we picked two sets of three cards from the 20 available TAT cards [58]. One set (cards 3GF, 7BM, and 18GF) was used for both game versions; the other (cards 3BM, 12F, and 12M) for the control version. The cards in both sets share the same themes to keep the conditions as similar as possible; however, because participants experience different cards, a fatigue effect is avoided. Two professors in psychology who are familiar with the TAT reviewed our choices and confirmed the similarity of their themes.

3.3.2 Game Versions. The two different gamified versions of the TAT and the digital replication of the TAT were implemented using StudyCrafter v2.0.10.² The TAT itself was not manipulated; it was fully embedded in two games where the story progresses through clicking, choosing a response option from a list, and—of course—filling out the open responses. We used two different gamified versions to explore a possible effect of the design.

² The TAT versions and StudyCrafter editor are available here: http://hdl.handle.net/2047/D20318713

The main stories of the Standard, Game–1, and Game–2 versions were implemented in 2, 16, and 20 scenes, respectively, with the player-character and several non-playable characters (NPCs). Game–2 has four additional scenes that players can visit if they choose a certain path with their selection of options. In all versions, the first scene is a tutorial that teaches the players how to play (e.g., how to click to advance, how to choose a dialogue option). The persona of the main character is conveyed to the player through speech bubbles, where the character expresses their thoughts and feelings, as well as through conversations with NPCs.

Game–1. In this version, the game character is a talented but desperate freelance writer who is only happy when he writes (Fig. 2). He gets a phone call from one of the companies that he works with and is informed that his writing has received good feedback from readers. He is told that they want to allocate a story in the next edition of their magazine and nominate it for national magazine awards. The story would be about three TAT pictures, addressing the four questions from the TAT procedure. Players are informed that the judging panel is looking for detailed stories, which we added to encourage players to write detailed stories. After this narrative context is provided, we see the player-character sitting behind his desk. One by one, the three TAT pictures pop up. The player input is a single response box for each picture, where they should address all four questions in the form of a story. When players finish the writing for each card, the mood meter increases and positive feedback, in the form of a thought bubble, is conveyed to players. After players finish the stories, they give them to the front desk and get thanked for providing them. The game ends with a phone call three months later from the company informing players that they have been invited to a national magazine award ceremony.

[Figure 2: Game–1 screenshots: On the left, an introduction scene of the player-character where he conveys his thoughts through speech bubbles; on the right, the office with a TAT picture, a sticky note with a reminder of what to write about, and the open response option.]

Game–2. In this version, the game character is a famous critic who is invited to his friend's gallery to provide a report about her work (Fig. 3). The works in the gallery are the three TAT cards. In this version, the player-character walks through the gallery with the friend and pauses at each card to answer the questions. The friend asks the four main questions in the form of a conversation while players are seeing the pictures. Thus, here each question is addressed in a separate response box. Again, after each response the mood meter increases, and after each card positive feedback is given to players, this time through the NPC. After doing the writing tasks, players are given the option of seeing more TAT images, doing a fun activity (going out), or going back home. Each of these options leads players to an additional, optional set of scenarios. Nevertheless, the game ends with a phone call from the NPC friend after a week, informing the player that the number of visitors to her gallery increased substantially since she published the critic's writing, and she thanks him for his reporting.

[Figure 3: Game–2 screenshots: On the left is where the player enters the gallery and has a number of fixed dialogue options to choose from; on the right, where he walks through the gallery with his friend reporting on the TAT pictures. Notice the mood meter at the bottom.]

Standard. For this version, which acts as the control condition, there are no gamified elements (Fig. 4). There is only text against a neutral background that provides participants with instructions to write stories about the cards as the cards appear. Similar to Game–1, participants provide one response per card. We implemented this standard version in the StudyCrafter environment to keep the input method consistent with the game versions.

In terms of the design of the gamified versions, we ensured that they are fairly similar, yet substantially distinct in important ways. In terms of their similarity, both are narrative-based and their main characters share similar characteristics. In the first scenes, we tried to characterize the writer and the critic, respectively, by showing who they are in daily life, and emphasized that they care about writing. We did this to make participants familiar with their player-character, immerse them in the scenario, and provide a narrative context that would motivate them for the task of writing stories about the cards.

Additionally, both gamified versions include a mood meter, which indicates the happiness of the main character. The mood meter goes up when players finish writing their response or receive a compliment about their writing (e.g., "You are a great critic and that's why I am asking you to do this"). It goes down when they choose an inappropriate dialogue option in a conversation (e.g., refusing to write for the magazine) or take an inappropriate action (e.g., procrastinating instead of getting to their meeting on time). The change in the mood meter is conveyed to players visually as well as with a sound effect: a buzz or a beep sound plays when it goes down or up, respectively. We included this mood meter to provide incentives to write, give a (false) sense of immediate feedback on their open responses, and appeal to achievers, who may be more inclined to keep the main character in a good mood.

In terms of the distinctions, apart from the different narrative contexts of the writer vs. the critic, there are four important distinctions between the two gamified versions. First, in Game–1 participants sit behind a computer looking at the pictures, while in Game–2 they walk around an art gallery. The latter may be more appealing to explorers. Second, in Game–1 participants go through the different questions as they are prompted by the computer, while in Game–2 they talk with an NPC character. The latter may be more appealing to socializers. However, as a consequence, Game–2 is lengthier. Third, both games have a mood meter, but it increases by 10 percentage points after completing a response in Game–1, whereas it increases by 5 percentage points per response in Game–2 (so 20 points per card, given its four response boxes). This increased reward may be more appealing to achievers. Fourth, and most importantly, in Game–1 all the questions are answered in one response box, while in Game–2 the questions are answered in separate response boxes. With this distinction we are exploring whether separating out the requested questions and integrating them into the game may increase the motivation to write more. However, a possible consequence of this input type is that there may be less structured stories in Game–2 compared to Game–1.
[Figure 4: Control screenshots: On the left, the introduction scene with instructions; on the right, the TAT picture against a neutral background, a sticky note with a reminder of what to write about, and the open response option.]

3.3.3 Questionnaires. There are four questionnaires in total: the post-standard questionnaire, the post-game questionnaire, the comparison questionnaire, and the demographic questionnaire. Both the post-standard and post-game questionnaires use 5-point Likert items; the others have question-specific response options.

Post-standard questionnaire. Participants receive this after completing the standard version. It consists of five closed and two open questions. For the closed questions, they are asked to rate their agreement on (1) their enjoyment, (2) the length of the experience, (3) whether they felt inclined to write in detail, (4) their comfort in writing honestly, and (5) whether they feel they would have written the stories more easily on paper. The first two items (enjoyment, length) measure motivation: when something is intrinsically motivating, one enjoys the experience and forgets about time [12, 53]. The following two items (detail, honesty) measure response quality. We considered it important to see if the participants themselves thought they were providing detailed and accurate (and not random) responses, respectively. The fifth and last item was added to consider whether the implementation of the traditional TAT with StudyCrafter could have any effect on their responses. The open questions ask if participants experienced any problems and have any additional comments.

Post-game questionnaire. This questionnaire contains the questions from the post-standard questionnaire, except for the validation question (5), as well as five closed questions on the game experience. Specifically, participants are asked to rate whether they found the game to be (1) motivating, (2) aesthetically pleasing, and (3) believable, (4) whether they could relate to the player-character, and (5) to what extent they felt immersed.

Additionally, it has six closed questions on usability aspects of the game. For this, participants have to rate (1) the game's logical flow, (2) the funniness of the game story, (3) the ease of using the game interface, (4) the ease of learning how to play the game, and (5) the ease of following the story within the game. Because StudyCrafter was still under development, we also asked the players whether they found some of the (corrupted) (6) game graphics distracting.

Finally, as it is unclear how players may behave in the context of the gamified versions, we asked participants to rate their attitude while playing, as measured by the extent to which they (1) tried to keep the mood meter as high as possible, (2) tried to describe the pictures in detail, (3) were careful to address everything that was asked about the pictures, (4) tried to make no errors, (5) wanted to enjoy the game, (6) wanted to try the options they had, and (7) took responsibility for accomplishing the tasks well.

Comparison questionnaire. With this questionnaire, participants are first asked to what extent they think the game version captures the essence of the test in the standard version. They are then asked to compare in which version they felt most comfortable writing (1) in more detail and (2) honestly, and which version they (3) enjoyed the most and (4) found too lengthy. The available options for these questions are: standard version, game version, and equal for both. The questionnaire ends by asking whether they agree with the statement "I preferred the game version over the standard version," with the option to provide any comments.

Demographic questionnaire. This questionnaire asks for (optional) general demographic information such as age, gender, education, and race/ethnicity. It also requests the amount of time participants play video games per week.

3.4 Procedure
The study is split into two phases with a one-week gap in between, designed to reduce the memorability effect, and uses counterbalancing to avoid a possible order bias. In the first phase, participants played either the standard version or one of the game versions. After completing the test, they answered the related questionnaire based on what they had done. The 18 participants were almost equally spread between the three conditions in the first phase (Standard: n = 5; Game–1: n = 5; Game–2: n = 8). In the second phase, they completed the other condition: if participants played one of the game versions first, they played the standard version, and vice versa (Standard: n = 13; Game–1: n = 3; Game–2: n = 2). After completion, they answered the related questionnaire, the comparison questionnaire, and the demographic questionnaire (in that order). At the start of the second phase, participants were informed not to worry about providing similar or different responses. Participants were able to complete each phase at their own pace. They were informed, however, that it would take approximately 30 minutes.
3.5 TAT Data Analysis
The TAT stories are meant to reveal some aspect of participants' personalities. In other words, participants present some aspect of themselves in their stories. As people write more detailed, enriched stories describing the feelings they get from the pictures, they give more clues about their personality. Although the questionnaire results are reported for all 18 included participants, due to a technical problem while recording the data we lost the story data for nine participants; therefore, the TAT data analysis is limited to the remaining group. As the TAT's use is more to benefit a specific clinician in understanding a specific client, we opted to have one researcher familiar with the TAT conduct the analysis while the other researchers performed a face validity check of the outcomes.

3.5.1 Analyzing the themes. The first step in the procedure was to look for the themes associated with the stories. For our coding, we used the Personality Research Form (PRF) scales developed by Jackson [35]. The PRF has 22 main scales associated with personality traits. The scales are relevant to the functioning of individuals in a wide variety of situations and are used to understand areas of normal functioning rather than psychopathology. We used the PRF because (1) it is based on the Variables of Personality set defined by Murray [47], the same person who created the TAT; and (2) unlike the Variables of Personality, it has been continuously refined based on research evidence [35]. Before the coding was initiated, participant information was removed from the stories to avoid possible bias. We then read all the stories from the two phases and coded them with the PRF themes. Afterwards, we compared the themes that were found in the settings for each participant.

3.5.2 Analyzing the details. The amount of detail provided might reveal more information about a person. A number between 0 and 3 is assigned to each story independently. Three is assigned to writings that address all four TAT criteria and include a rich story describing the feelings and thoughts of the characters depicted and the situation they are in. In stories with a score of three, the sentences are connected and form a coherent story. Two is assigned to writings that are less descriptive but have satisfied all or most of the four TAT criteria. They also have some form of a story, but not all the sentences are necessarily connected. One is assigned to writings that have only addressed all or most of the four criteria, revealing some information about how the writer has interpreted the picture. Finally, zero is assigned to writings that do not provide a rich story and clearly have not addressed all four criteria. Examples of each score for the same picture are provided below. An example of writing with a score of 3:

    He had asked her to come to the room and thats [sic] when he told her, her mother has passed away. Devastated by the news and crying she opened the door to let some light in, so maybe she can think clear as if the sorrow and dark have blocked her mind. She opens the door, looks at his face through all the tears and asks for more details.

This writing, although short, has a coherent story, and a reader can understand the writer's intention. It describes the feelings of the character very well and addresses the other three criteria: what led up to the event, what is happening, and what will happen after. An example of writing with a score of 2:

    A woman is holding her head in her hand. She is experiencing some negative emotion. Her other hand is holding open a door. The work is in black and white. She is feeling sad and disappointed. Her boyfriend was cheating on her. She gets over her trauma and moves on.

This writing addresses all four criteria. However, while somewhat story-like, the writing feels disconnected. An example of writing with a score of 1:

    Somebody knocked the door, she was sleeping, just woke up, trying to open the door for the person. Sleepy, impatient. Somebody knocked the door. She is going to open the door for the person behind the door.

In this case, there is only an explanation of the visible event on the card and the actions taken by the character. Although it is not a story, the writer has more or less addressed the four areas. An example of a story with a score of 0:

    She looks upset. Frustrated. Lack of sleep. She's going to have a nap.

This writing does not have the form of a story at all. It addresses the criteria in a very concise manner, but does not answer the first question: what is happening.
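The rubric can be condensed into a small decision function. The boolean features below are our own encoding of the rubric's wording; in the study, the scoring was done by a human rater, not by code.

def detail_score(addresses_criteria: bool,
                 story_form: bool,
                 coherent_and_rich: bool) -> int:
    """Map the rubric's features onto the 0-3 detail score.

    addresses_criteria: all or most of the four TAT questions are answered
    story_form:         the writing reads as a story, not a bare description
    coherent_and_rich:  sentences connect into one coherent narrative that
                        describes the characters' feelings and situation
    """
    if addresses_criteria and story_form and coherent_and_rich:
        return 3    # rich, coherent story covering the criteria
    if addresses_criteria and story_form:
        return 2    # story-like but disconnected
    if addresses_criteria:
        return 1    # criteria only, no real story
    return 0        # neither a rich story nor full coverage

# The score-3 example above: detail_score(True, True, True)    -> 3
# The score-0 example above: detail_score(False, False, False) -> 0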
Table 1: Repeated items in questionnaires, in Mdn [IQR].

Question      | Game–1        | Game–2        | Both Games    | Standard
Lengthiness   | 1.5 [1.0–2.0] | 2.0 [2.0–2.3] | 2.0 [1.8–2.0] | 2.0 [1.0–2.0]
Detail        | 2.0 [1.0–3.8] | 2.0 [1.0–4.0] | 2.0 [1.0–4.0] | 2.0 [2.0–3.0]
Enjoyment     | 2.5 [2.0–3.0] | 2.5 [1.8–3.0] | 2.5 [2.0–3.0] | 3.0 [3.0–3.0]
Honesty       | 1.5 [1.0–2.0] | 1.0 [1.0–2.0] | 1.0 [1.0–2.0] | 1.5 [1.0–2.0]

Note: Scales run from 1 to 5, with 1 being the highest rating. A lengthiness of 1 means participants found the experiment's length extremely reasonable; a detail of 1 means the experiment encouraged them a great deal to describe the images in detail, and so on.

4 RESULTS

4.1 Descriptive Statistics
4.1.1 Post-Standard Questionnaire. Table 1 shows that although most participants found the length of the experiment reasonable or extremely reasonable (Mdn = 2.0, IQR = 1.0–2.0) and were encouraged to describe the pictures in detail (Mdn = 2.0, IQR = 2.0–3.0), they enjoyed the experiment only a moderate amount (Mdn = 3.0, IQR = 3.0–3.0). Nevertheless, almost everyone was extremely or somewhat comfortable writing honestly (Mdn = 1.5, IQR = 1.0–2.0). Seven participants reported they could have had a better result, above average or far above average, if they could have written their stories on paper or using a Word document.

4.1.2 Post-Game Questionnaire. From Table 1 we see that regardless of which game setting participants played, they enjoyed the game experience more than a moderate amount (Mdn = 2.5, IQR = 2.0–3.0), found the length of the experiment reasonable (Mdn = 2.0, IQR = 1.8–2.0), and were comfortable writing honestly (Mdn = 1.0, IQR = 1.0–2.0). They also agreed that the game encouraged them to explain the stories in detail (Mdn = 2.0, IQR = 1.0–4.0).

Table 2 shows the game experience and usability results. Here we find that participants found both games only moderately motivating, aesthetically pleasing, and immersive, and experienced the graphics as somewhat distracting and the characters as only slightly relatable (all Mdn = 3). However, participants found the story believable and fun, and believed the game had a logical flow and was easy to learn (all Mdn = 2). Likewise, they agreed they tried their best to address everything that was asked, make no errors, keep the mood meter high, and describe the images in detail (all Mdn = 2). Finally, they strongly agreed that the game interface was easy to use, that the story was easy to follow, and that they tried the options they had (all Mdn = 1).

Table 2: Game experience and usability items, in Mdn [IQR].

Comparison in terms of:    | Game–1        | Game–2        | Both Games
Being motivating           | 2.0 [2.0–3.8] | 3.0 [2.0–4.3] | 3.0 [2.0–4.0]
Pleasing aesthetics        | 2.5 [2.0–3.8] | 3.0 [2.0–4.3] | 3.0 [2.0–4.0]
Believability              | 2.0 [2.0–2.8] | 2.0 [2.0–4.0] | 2.0 [2.0–3.3]
Relatable game characters  | 3.5 [2.0–4.0] | 3.0 [2.0–5.0] | 3.0 [2.0–4.3]
Immersion                  | 3.5 [2.0–4.0] | 2.5 [2.0–4.0] | 3.0 [2.0–4.0]
Having logical flow        | 2.0 [1.0–2.0] | 2.0 [1.0–2.3] | 2.0 [1.0–2.0]
Ease of game interface     | 1.5 [1.0–2.0] | 1.0 [1.0–2.0] | 1.0 [1.0–2.0]
Ease of learning to play   | 1.0 [1.0–2.0] | 2.0 [1.0–2.3] | 2.0 [1.0–2.0]
Ease of following story    | 1.0 [1.0–2.0] | 2.0 [1.0–2.0] | 1.5 [1.0–2.0]
Funniness of story         | 2.0 [2.0–3.0] | 2.5 [2.0–4.0] | 2.0 [2.0–3.3]
Distracting graphics       | 3.0 [2.0–4.0] | 3.0 [1.8–4.3] | 3.0 [2.0–4.0]
I try my best to:
Make mood meter high       | 3.0 [2.0–4.0] | 2.0 [1.0–3.0] | 2.0 [1.8–3.0]
Describe images in detail  | 1.5 [1.0–3.8] | 2.0 [1.0–2.0] | 2.0 [1.0–2.3]
Address everything         | 2.0 [1.3–2.0] | 2.0 [1.0–2.0] | 2.0 [1.0–2.0]
Make no errors             | 2.0 [2.0–2.0] | 1.0 [1.0–2.0] | 2.0 [1.0–2.0]
Try options I had          | 1.0 [1.0–2.8] | 1.5 [1.0–3.3] | 1.0 [1.0–3.0]

Note: Scales run from 1 to 5, with 1 being the highest rating.

Both Table 1 and Table 2 show that the two game versions are comparable in various ways; statistically, we did not find any differences between them. This gave us confidence that we could combine the two games for some of our analyses. However, Table 2 shows noticeable differences that can be explained by their design. For example, we see that Game–2 has slightly more relatable characters, is more immersive, and incentivizes players to keep the mood meter high and to make fewer errors. Because this version gave more positive feedback and involved interaction with an NPC and exploration, these outcomes are not unexpected. Similarly, due to its more complex design, we see that participants found Game–2 less easy to learn and its story less easy to follow. The most interesting difference is that participants found Game–1 (Mdn = 2) more motivating than Game–2 (Mdn = 3).

4.1.3 Comparison Questionnaire. The majority (83%) believed that the game could capture the essence of the test at least a moderate amount; 61% strongly or somewhat preferred the game version to the standard version, while 22% had no preference; and 67% reported they had more enjoyment in the game, while 17% enjoyed both experiments equally. The number of participants who preferred the game over the standard version for writing more honestly (39%) was higher than the number with the opposite preference (28%); a third (33%) had no preference. In terms of writing in more detail, two participants (11%) had no preference between the settings, and the rest were equally spread between the standard and game versions (44% each). Regarding the length, 39% rated the length to be equal, while the same percentage considered the game versions to be lengthier. The preferences are shown in Table 3.

Table 3: Preference comparison of versions, in % (n).

Question    | Game     | Control | Equal
Lengthiness | 39% (7)  | 22% (4) | 39% (7)
Detail      | 44% (8)  | 44% (8) | 11% (2)
Enjoyment   | 67% (12) | 17% (3) | 17% (3)
Honesty     | 39% (7)  | 28% (5) | 33% (6)

4.2 TAT
The summary of the results from the TAT analysis is shown in Table 4. About 50% or more of the themes from the standard version stories were also found in the game versions for six out of nine participants; half of them had a consistency of 75% or higher. If we look at the overall similarity between the standard and game versions (calculated by dividing the number of similar codes by all the possible codes found between the versions), it varied between 33–50% across all cases. The number of unique themes was not numerically different between the settings in general. However, the total number of themes was higher for the game versions in five cases (56%), the same in one case (11%), and lower for the rest (33%). The same pattern was found for the total detail score, but this time the standard version outperforms the game versions: the stories in the standard version were more detailed in five cases (56%), remained the same for one (11%), and were less detailed for three (33%).

Table 4 further shows that there are indeed likely individual differences in how much people write and the detail they provide, with total word counts ranging from 136–396 and detail scores from 2–8 for the standard version, and 41–650 and 0–8, respectively, for the game versions. These ranges also suggest that the games have a differing effect on individuals. Interesting cases in point are P13 and P22, who both played Game–1. P13 more than doubled the number of words in the gamified version, whereas P22 more than halved theirs (but with a slight increase in detail). P29 and P32 are also interesting cases, as they consistently scored lowest on word count and detail across both versions, but performed better on the standard version. This may suggest they were either not motivated in this task or just generally had difficulty with it. Inspection of their survey scores shows that especially P32 may not have been very motivated. Regardless, it seems the games did not have a positive effect on them.

4.3 Inferential Statistics
We first checked the counterbalancing and performed between-subjects Mann-Whitney U tests on all the scores with respect to the order. We did not find any statistically significant difference for order, which suggests that it did not matter whether players first played a game version and then the standard version, or vice versa. Following this, we found that the enjoyment across all participants in the game versions (Mdn = 2.5, IQR = 2.0–3.0) was higher than in the standard version (Mdn = 3.0, IQR = 3.0–3.0). The result of the Wilcoxon paired rank test was significant: Z = -1.8, p = .047, r = .30. By considering the player profile, specifically frequency of play, we further observed that participants who reported playing games more than zero hours weekly (n = 13) had a higher level of enjoyment with the game versions (Mdn = 2.0, IQR = 1.5–3.0) than with the standard version (Mdn = 3.0, IQR = 3.0–3.0). The respective Wilcoxon paired samples rank test was reported at Z = -2.4, p = .015, r = .40. We found no significant results for any of the other paired variables: length, writing in detail, writing honestly, number of unique themes, number of total themes, detail score, and word count.
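For readers who want to run the same kind of comparison, the tests above map onto standard SciPy calls. A sketch with placeholder ratings (not the study's raw data; 1 = highest on the 5-point scale):

from scipy.stats import wilcoxon, mannwhitneyu

# Paired 5-point enjoyment ratings, one pair per participant (placeholders).
enjoyment_game     = [2, 3, 2, 1, 3, 2, 2, 3, 2, 1, 3, 2, 2, 3, 2, 3, 2, 2]
enjoyment_standard = [3, 3, 3, 2, 3, 3, 2, 4, 3, 2, 3, 3, 3, 4, 3, 3, 3, 3]

# Within-subjects test: did enjoyment differ between the two conditions?
stat, p = wilcoxon(enjoyment_game, enjoyment_standard)
print(f"Wilcoxon signed-rank: W = {stat}, p = {p:.3f}")

# Counterbalancing check: compare a score between the two order groups.
scores_game_first     = [2, 3, 2, 1, 3, 2, 2, 3, 2, 1, 3, 2, 2]   # n = 13
scores_standard_first = [3, 2, 2, 3, 2]                           # n = 5
u, p_order = mannwhitneyu(scores_game_first, scores_standard_first)
print(f"Mann-Whitney U: U = {u}, p = {p_order:.3f}")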
Table 4: Summary of the number of (unique) themes, detail score, and word count.

     |        Standard Version                     |   |        Game Versions                        |      |
ID   | U  | T  | D,# | D,M (SD)  | WC,# | WC,M (SD)    | V | U  | T  | D,# | D,M (SD)  | WC,# | WC,M (SD)    | %Std | %Sim
P03  | 4  | 6  | 6   | 2.0 (0.0) | 226  | 75.3 (11.5)  | 2 | 4  | 7  | 4   | 1.3 (0.6) | 270  | 90.0 (31.2)  | 50   | 50
P04  | 10 | 11 | 4   | 1.3 (0.6) | 304  | 101.3 (66.2) | 2 | 6  | 11 | 4   | 1.3 (0.6) | 526  | 175.3 (31.6) | 44   | 44
P07  | 4  | 5  | 8   | 2.7 (0.6) | 288  | 96.0 (15.6)  | 2 | 6  | 8  | 7   | 2.3 (0.6) | 322  | 107.3 (16.5) | 50   | 33
P09  | 4  | 6  | 5   | 1.7 (0.6) | 273  | 91.0 (12.5)  | 2 | 6  | 7  | 4   | 1.3 (0.6) | 139  | 46.3 (9.0)   | 75   | 50
P13  | 5  | 8  | 7   | 2.3 (0.6) | 300  | 100.0 (32.4) | 1 | 10 | 13 | 8   | 2.7 (0.6) | 650  | 216.7 (79.0) | 80   | 40
P20  | 3  | 5  | 2   | 0.7 (0.6) | 194  | 64.7 (19.0)  | 2 | 6  | 8  | 4   | 1.3 (0.6) | 138  | 46.0 (11.5)  | 100  | 50
P22  | 5  | 8  | 7   | 2.3 (0.6) | 396  | 132.0 (33.8) | 1 | 3  | 6  | 8   | 2.7 (0.6) | 181  | 60.3 (15.0)  | 40   | 40
P29  | 7  | 7  | 2   | 0.7 (0.6) | 231  | 77.0 (34.8)  | 2 | 3  | 3  | 0   | 0.0 (0.0) | 92   | 30.7 (10.3)  | 43   | 43
P32  | 6  | 6  | 3   | 1.0 (0.0) | 136  | 45.3 (16.8)  | 2 | 3  | 4  | 0   | 0.0 (0.0) | 41   | 13.7 (0.6)   | 50   | 50

Note 1: Header abbreviations: U: number of unique themes; T: number of total themes; D,#: total detail score; D,M (SD): mean and standard deviation of the detail score; WC,#: sum of word counts; WC,M (SD): mean and standard deviation of the word count; V: game version played; %Std: % Standard; %Sim: % Similarity.
Note 2: The number of unique themes is the count of distinct themes appearing across the three stories for each setting; the number of total themes includes repeated themes.
Note 3: % Standard is calculated by dividing the number of similar codes between the standard and game versions by the total number of codes in the standard version; % Similarity, instead, is calculated as the total number of similar codes divided by all the possible codes retrieved from both the standard and the game versions.
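Written out as set arithmetic, the two percentages from Note 3 look as follows. The theme sets in the example are hypothetical; only the two formulas come from the note above.

def pct_standard(standard_codes: set, game_codes: set) -> float:
    """Share of the standard version's codes that also appear in the games."""
    return 100 * len(standard_codes & game_codes) / len(standard_codes)

def pct_similarity(standard_codes: set, game_codes: set) -> float:
    """Shared codes relative to all codes retrieved from both versions."""
    return 100 * len(standard_codes & game_codes) / len(standard_codes | game_codes)

# Hypothetical PRF theme codes for one participant:
standard = {"achievement", "affiliation", "aggression", "nurturance"}
games    = {"achievement", "affiliation", "autonomy", "dominance", "order"}
print(pct_standard(standard, games))    # 2 shared / 4 standard codes  = 50.0
print(pct_similarity(standard, games))  # 2 shared / 7 codes in total ~= 28.6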

the participants were at least to some extent intrinsically motivated to participate. Looking at the participants' self-reports on both the standard and game versions, and at their actual responses, shows that the participants took the experiment seriously and put in the effort that was asked of them, regardless of which version they completed. Therefore, any sign of increased motivation (H1), volume (H2), and quality (H3) in the game conditions can be attributed to gamifying the psychological assessment.

5.1 Effectiveness of Gamifying
Our results indicate that participants at large enjoyed the game versions significantly more, and this held especially for participants who play games (with medium effect sizes). Although the game versions were much lengthier than the standard version, more than half of the participants either found them equally lengthy or found the standard version even lengthier. This suggests that the gamified versions were successful in engaging participants.

Additionally, while no statement about enjoying the experience was reported in the open text responses of the standard version, participants expressed their enjoyment with the game versions with statements such as “I enjoyed the game and depicting the story what must have been going on and what could be possibility [sic] of happening in future,” or “His misery and not loving his life was funny!! :)).” One participant summarized his experience as:

“The game version was more fun to play and made it more clear what I am asked to do. In the [standard] version I was not sure, should I write a story or explain the picture. Game version was more successful since I had to complete a scenario...Having different options for dialogue was really cool.”

Given this, we found evidence to support the hypothesis that gamifying psychological assessments increases participants' motivation to participate (H1). Whether this increased motivation, or other possible factors that result from gamifying a psychological assessment, helps to increase the response volume (H2) and quality (H3) is something that this work cannot support. There were no significant differences in the volume or quality of responses across conditions, as measured by participants' self-reports of writing in detail and honestly, as well as by the extracted detail score, word count, and number of themes. At the same time, this also suggests that gamifying a test does not necessarily have to lead to a decrease in volume and quality. In fact, the relatively high similarity (40%) between what is found through a standard test and through the gamified versions—given that we used different cards and only a few—indicates that the same result can be obtained in a gamified manner, with the additional benefit of it being more enjoyable.

It is difficult, if not impossible, to tell whether the traditional or the gamified approach offers a better “true” reflection of a person. Our take on this ground truth problem was to acknowledge that we could not make a final determination, but that we critically examined the differences in outcomes and showed that gamifying can increase motivation. Another possible approach would be to include the results of a comparable psychological test in the study as a basis for the ground truth, and then check which version captures a person's personality more accurately. Good examples of such tests for this aim are the Minnesota Multiphasic Personality Inventory (MMPI) [32] and the MMPI-2 [19]. However, adding such tests to an already prolonged study can be challenging. Future work can investigate how to address this.

5.2 Role of Design
While improvements to the game versions may be necessary, given that participants did not overwhelmingly express that they found the games to be motivating or aesthetically pleasing, they were generally positive on many other aspects and indicated that they took the task seriously. From the open responses it became clear that there were some usability issues (“After being asked to write something the prompt was removed, so I couldn't readdress the prompt and ensure I was writing the right thing”) and design issues (“The options in dialogues were meaningless” or “the inconsistent characters threw me out of it”), but generally participants found that the game versions worked well: “Game seemed to me pretty reasonable” or “Nah everything was fine. Even the awkward walk of the main character lol.” However, consistent with the results obtained, not everyone was convinced it would work better in terms of response volume/quality: “The game version was really funny to me, but the [standard] version is probably the better version.”

Thus, in terms of design, while our work does not provide any conclusive evidence, it does indicate that design matters. There are subtle differences between the two versions, such as with the mood meter, that can be conjectured to be a result of the differences in design. The most striking result, however, is with the input method. The detail score for Game–1 ranged consistently from 2 to 3, while for Game–2 the detail score took all possible values from 0 to 3. In fact, most participants in Game–2 gave brief answers and almost none came up with a fully structured story. Future work should look at these design features more systematically and include them as part of an experimental design.

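To make the detail score concrete, the sketch below (in Python) shows one hypothetical way a 0–3 score of this kind could be operationalized: as a count of story components that a human coder marks as present. The component names and the scoring function are illustrative assumptions, not the rubric actually used in this study.

# Illustrative only: a hypothetical operationalization of a 0-3 detail
# score as a count of story components judged present by a human coder.
# The three components are assumptions for the sake of the example.
from dataclasses import dataclass

@dataclass
class StoryCoding:
    has_setting: bool     # the story establishes a scene or situation
    has_inner_life: bool  # characters' thoughts or feelings are described
    has_outcome: bool     # the story resolves or projects what happens next

def detail_score(coding: StoryCoding) -> int:
    """Count how many of the three components a coder marked present (0-3)."""
    return int(coding.has_setting) + int(coding.has_inner_life) + int(coding.has_outcome)

# A brief answer, like many seen in Game-2, scores low:
print(detail_score(StoryCoding(True, False, False)))   # 1
# A fully structured story scores the maximum:
print(detail_score(StoryCoding(True, True, True)))     # 3

Under a scheme like this, the narrow 2–3 range observed for Game–1 and the full 0–3 range observed for Game–2 would directly reflect how many story components participants' responses contained.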
Another design insight that our work provides, and one that may explain why there may not be an effect on response volume/quality, is that the effect of gamifying differs per person. The tool we used (StudyCrafter) is particularly suited for narrative-driven games; therefore, both of our designs ended up being narrative-based. Our results highlight, for example, cases where one person doubles their amount of writing in the gamified version, and vice versa. With this possibility, it is important to think about how to personalize the experience [30] and to consider how the design features may be more or less appealing to certain participants. There are, unfortunately, very few guidelines on this [2]. Our work, where we attempted to gamify according to player motivations (achiever, explorer, socializer), suggests that this may come with certain trade-offs—for example, where an increase in immersion comes at the cost of making the story harder to follow.

5.3 Limitations of the Study
This is an exploratory study with a small sample size. While the choice to focus on a within-subjects experiment allowed us to run the study with a small number of participants, it limited us in determining any possible between-subjects effects or in testing the differences between the two game versions. However, this was not our explicit goal. We mainly aimed to study the difference between a gamified and a traditional test. All other data was exploratory and should be seen more as grounds for future work than as strong evidence supporting gamifying psychological assessments. Additionally, as a result of a technical error, we retrieved only half of the TAT data, which limited us in drawing many inferences from this specific data. For example, while the difference in the detail score between Game–1 and Game–2 is striking, we only have data for two players for Game–1 and seven for Game–2, which inherently limits us in making bold assertions about the input method. Future research should provide more clarity on this and other matters.

The study may have also been biased in several ways. The design ended up being imbalanced because six participants did not show up for phase two, which in particular meant that more participants played the game versions first and the standard version after. As within-subjects designs suffer from novelty and practice effects, we were not able to fully rule out these effects. A number of participants were also not fluent in English and expressed difficulty with this task:

“The only problem was about the explaining the scenes in English because it is my second language in some cases I should search for the proper word and because of that sometimes I just give up to write about some details.”

We also acknowledge that asking participants at the end—with the comparison questionnaire—if they preferred the gamified version over the standard version may have biased them. However, we want to point out that the preferences correspond with the individual post-questionnaire results and with other observations, all of which support that the enjoyment in the game versions was higher than with the standard version. This increase in enjoyment is observed in much of the gamification literature [24, 39] and is therefore consistent with our findings.

As for our questionnaires specifically, they were designed to address specific questions we had in mind, while keeping the total number of questions to a minimum. For this reason, we decided not to make use of validated survey instruments at this stage. Aside from incorporating such instruments in the future, for example the often used Intrinsic Motivation Inventory (IMI), scholars may want to consider what other measures are imaginable for response volume and quality. Our work suggests that there are many more factors involved than engagement/motivation, but there may also be better ways to measure these constructs.

Finally, it is an open question how the gamification of psychological assessments scales. In the actual test in a clinical setting, psychologists use at least 10–12 cards, whereas we only used three cards per condition.

6 CONCLUSION
In this paper, we evaluated the effectiveness of gamifying psychological assessments by exposing participants to gamified versions of the Thematic Apperception Test (TAT), a robust and frequently used projective technique in psychology, and to a traditional version. The results indicate that gamifying helps to increase participants' motivation to participate, but it does not necessarily lead to a higher response volume and quality.

Our study is an initial step toward gamifying psychological assessment. Future work can focus on exploring design variations, personalizing experiences, scaling up the work with a larger and wider range of participants, providing design assistance to researchers interested in gamifying their assessments, finding a solution for the ground truth problem, and developing alternative measures of motivation and quality.

ACKNOWLEDGMENTS
We would like to thank the StudyCrafter team for their technical support and the student volunteers who participated in this study. We would also like to thank Professor Randy Colvin and William Sharp from the Department of Psychology at Northeastern University, who helped with the psychological materials.
REFERENCES
[1] Pippa Bailey, Gareth Pritchard, and Hollie Kernohan. 2015. Gamification in market research: Increasing enjoyment, participant engagement and richness of data, but what of data validity? International Journal of Market Research 57, 1 (2015), 17–28.
[2] Sander Bakkes, Chek Tien Tan, and Yusuf Pisan. 2012. Personalised gaming: a motivation and overview of literature. In Proceedings of The 8th Australasian Conference on Interactive Entertainment: Playing the System. ACM, 4.
[3] Richard Bartle. 1996. Hearts, clubs, diamonds, spades: Players who suit MUDs. Journal of MUD Research 1, 1 (1996), 19.
[4] Benoit Bediou, Deanne M Adams, Richard E Mayer, Elizabeth Tipton, C Shawn Green, and Daphne Bavelier. 2018. Meta-analysis of action video game impact on perceptual, attentional, and cognitive skills. Psychological Bulletin 144, 1 (2018), 77.
[5] Jim Blascovich and Jeremy Bailenson. 2011. Infinite reality: Avatars, eternal life, new worlds, and the dawn of the virtual revolution. William Morrow & Co.
[6] Walter R Boot. 2015. Video games as tools to achieve insight into cognitive processes. Frontiers in Psychology 6 (2015), 3.
[7] Sheri Coates Broussard and ME Betsy Garrison. 2004. The relationship between classroom motivation and academic achievement in elementary-school-aged children. Family and Consumer Sciences Research Journal 33, 2 (2004), 106–120.
[8] James Dean Brown. 2009. Open-response items in questionnaires. In Qualitative Research in Applied Linguistics. Springer, 200–219.
[9] Alessandro Canossa, Jeremy B Badler, Magy Seif El-Nasr, Stefanie Tignor, and Randy C Colvin. 2015. In Your Face(t): Impact of Personality and Context on Gameplay Behavior. In Foundations of Digital Games 2015 (FDG15).
[10] Jared Cechanowicz, Carl Gutwin, Briana Brownell, and Larry Goodfellow. 2013. Effects of gamification on participation and data quality in a real-world market research domain. In Proceedings of the First International Conference on Gameful Design, Research, and Applications. ACM, 58–65.
[11] E Paul Cherniack. 2011. Not just fun and games: applications of virtual reality in the identification and rehabilitation of cognitive disorders of the elderly. Disability and Rehabilitation: Assistive Technology 6, 4 (2011), 283–289.
[12] Mihaly Csikszentmihalyi. 1997. Flow and the psychology of discovery and invention. Harper Perennial.
[13] Sebastian Deterding. 2015. The lens of intrinsic skill atoms: A method for gameful design. Human–Computer Interaction 30, 3-4 (2015), 294–335.
[14] Sebastian Deterding, Dan Dixon, Rilla Khaled, and Lennart Nacke. 2011. From game design elements to gamefulness: defining gamification. In Proceedings of the 15th International Academic MindTrek Conference. ACM, 9–15.
[15] NL Dosajh. 1996. Projective techniques with particular reference to inkblot test. SIS Journal of Projective Psychology & Mental Health 3, 1 (1996), 38.
[16] V Mark Durand, Edward B Blanchard, and Jodi A Mindell. 1988. Training in projective testing: Survey of clinical training directors and internship directors. Professional Psychology: Research and Practice 19, 2 (1988), 236.
[17] Magy Seif El-Nasr, Matt Gray, Truong-Huy Dinh Nguyen, Derek Isaacowitz, Elin Carstensdottir, and David DeSteno. 2014. Social Gaming as an Experimental Platform.
[18] Tom Ewing. 2012. Four types of gamification that can be used in market research. http://blackbeardblog.tumblr.com/post/13452542524/state-of-play-four-types-of-research-gamification
[19] J.R. Graham. 1993. MMPI-2: Assessing Personality and Psychopathology. Oxford University Press.
[20] Wayne D Gray. 2017. Game-XP: Action Games as Experimental Paradigms for Cognitive Science. Topics in Cognitive Science 9, 2 (2017), 289–307.
[21] Robert J Gregory. 2004. Psychological testing: History, principles, and applications. Allyn & Bacon.
[22] Frédéric Guay, Julien Chanal, Catherine F Ratelle, Herbert W Marsh, Simon Larose, and Michel Boivin. 2010. Intrinsic, identified, and controlled types of motivation for school subjects in young elementary school children. British Journal of Educational Psychology 80, 4 (2010), 711–735.
[23] Theo Downes-Le Guin, Reg Baker, Joanne Mechling, and Erica Ruyle. 2012. Myths and realities of respondent engagement in online surveys. International Journal of Market Research 54, 5 (2012), 613–633.
[24] Juho Hamari, Jonna Koivisto, and Harri Sarsa. 2014. Does gamification work? A literature review of empirical studies on gamification. In 2014 47th Hawaii International Conference on System Sciences (HICSS). IEEE, 3025–3034.
[25] Johannes Harms, Christoph Wimmer, Karin Kappel, and Thomas Grechenig. 2014. Gamification of online surveys: conceptual foundations and a design process based on the MDA framework. In Proceedings of the 8th Nordic Conference on Human-Computer Interaction. ACM, 565–568.
[26] Casper Harteveld. 2011. Triadic game design: Balancing reality, meaning and play. Springer Science & Business Media.
[27] Casper Harteveld, Nolan Manning, Farah Abu-Arja, Rick Menasce, Dean Thurston, Gillian Smith, and Steven C Sutherland. 2017. Design of playful authoring tools for social and behavioral science. In Proceedings of the 22nd International Conference on Intelligent User Interfaces Companion. ACM, 157–160.
[28] Casper Harteveld, Sam Snodgrass, Omid Mohaddesi, Jack Hart, Tyler Corwin, and Guillermo Romera Rodriguez. 2018. The Development of a Methodology for Gamifying Surveys. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play Companion Extended Abstracts. ACM, 461–467.
[29] Casper Harteveld and Steven C Sutherland. 2015. The goal of scoring: Exploring the role of game performance in educational games. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2235–2244.
[30] Casper Harteveld and Steven C Sutherland. 2017. Personalized gaming for motivating social and behavioral science participation. In Proceedings of the 2017 ACM Workshop on Theory-Informed User Modeling for Tailoring and Personalizing Interfaces. ACM, 31–38.
[31] Casper Harteveld, Steven C Sutherland, and Gillian Smith. 2015. Design considerations for creating game-based social experiments. In ACM CHI 2015 Workshop, Researching Gamification: Strategies, Opportunities, Challenges, Ethics. Seoul, South Korea.
[32] Starke Rosecrans Hathaway and John Charnley McKinley. 1951. Minnesota Multiphasic Personality Inventory; Manual, revised. (1951).
[33] Christoffer Holmgård, Julian Togelius, and Lars Henriksen. 2016. Computational intelligence and cognitive performance assessment games. In Computational Intelligence and Games (CIG). IEEE, 1–8.
[34] RR Holt. 1999. Empiricism and the Thematic Apperception Test: Validity is the payoff. Evocative Images: The Thematic Apperception Test and the Art of Projection (1999), 99–105.
[35] Douglas Northrop Jackson. 1974. Personality research form manual. Research Psychologists Press.
[36] Daniel Johnson, Ella Horton, Rory Mulcahy, and Marcus Foth. 2017. Gamification and serious games within the domain of domestic energy consumption: A systematic review. Renewable and Sustainable Energy Reviews 73 (2017), 249–264.
[37] Marshall B Jones, Robert S Kennedy, and Alvah C Bittner Jr. 1981. A video game for performance testing. The American Journal of Psychology (1981), 143–152.
[38] Kristian Kiili, Keith Devlin, Arttu Perttula, Pauliina Tuomi, and Antero Lindstedt. 2015. Using video games to combine learning and assessment in mathematics education. International Journal of Serious Games 2, 4 (2015), 37–55.
[39] Jonna Koivisto and Juho Hamari. 2019. The rise of motivational information systems: A review of gamification research. International Journal of Information Management 45 (2019), 191–210.
[40] Yubo Kou and Xinning Gui. 2014. Playing with strangers: understanding temporary teams in League of Legends. In Proceedings of the First ACM SIGCHI Annual Symposium on Computer-Human Interaction in Play. ACM, 161–169.
[41] Sven Laumer, Alexander von Stetten, Andreas Eckhardt, and Tim Weitzel. 2009. Online gaming to apply for jobs: The impact of self- and e-assessment on staff recruitment. In 2009 42nd Hawaii International Conference on System Sciences. IEEE, 1–10.
[42] Scott O Lilienfeld, James M Wood, and Howard N Garb. 2000. The scientific status of projective techniques. Psychological Science in the Public Interest 1, 2 (2000), 27–66.
[43] Jemma Looyestyn, Jocelyn Kernot, Kobie Boshoff, Jillian Ryan, Sarah Edney, and Carol Maher. 2017. Does gamification increase engagement with online programs? A systematic review. PLoS One 12, 3 (2017), e0173403.
[44] Regan L. Mandryk, Max V. Birk, Adam Lobel, Marieke van Rooij, Isabela Granic, and Vero Vanden Abeele. 2017. Games for the Assessment and Treatment of Mental Health. In Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play (CHI PLAY ’17 Extended Abstracts). ACM, New York, NY, USA, 673–678. https://doi.org/10.1145/3130859.3131445
[45] Amir Matallaoui, Jonna Koivisto, Juho Hamari, and Ruediger Zarnekow. 2017. How effective is “exergamification”? A systematic review on the effectiveness of gamification features in exergames. In Proceedings of the 50th Hawaii International Conference on System Sciences 2017. University of Hawai’i at Manoa.
[46] Gregory J Meyer. 2004. The reliability and validity of the Rorschach and Thematic Apperception Test (TAT) compared to other psychological and medical procedures: An analysis of systematically gathered evidence. Comprehensive Handbook of Psychological Assessment 2 (2004), 315–342.
[47] Henry Alexander Murray. 1938. Explorations in personality. (1938).
[48] Henry Alexander Murray. 1943. Thematic Apperception Test. (1943).
[49] Lennart E Nacke and Christoph Sebastian Deterding. 2017. The maturing of gamification research. Computers in Human Behavior (2017), 450–454.
[50] Northeastern Game Studio. 2017. StudyCrafter. https://studycrafter.com
[51] Chris Piotrowski, Ronald W Belter, and John W Keller. 1998. The impact of managed care on the practice of psychological testing: Preliminary findings. Journal of Personality Assessment 70, 3 (1998), 441–447.
[52] Chris Piotrowski and Christine Zalewski. 1993. Training in psychodiagnostic testing in APA-approved PsyD and PhD clinical psychology programs. Journal of Personality Assessment 61, 2 (1993), 394–405.
[53] Richard M Ryan and Edward L Deci. 2000. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology 25, 1 (2000), 54–67.
[54] Richard M Ryan, C Scott Rigby, and Andrew Przybylski. 2006. The motivational
pull of video games: A self-determination theory approach. Motivation and
Emotion 30, 4 (2006), 344–360.
[55] Carol Sansone and Judith M Harackiewicz. 2000. Intrinsic and extrinsic motivation:
The search for optimal motivation and performance. Elsevier.
[56] Lamyae Sardi, Ali Idri, and José Luis Fernández-Alemán. 2017. A systematic
review of gamification in e-Health. Journal of Biomedical Informatics 71 (2017),
31–48.
[57] David Williamson Shaffer and James Paul Gee. 2012. The right kind of GATE:
Computer games and the future of assessment. Technology-Based Assessments for
21st Century Skills: Theoretical and Practical Implications From Modern Research
(2012), 211–228.
[58] Praveen Shrestha. 2017. Detailed procedure of Thematic Apperception Test. https://www.psychestudy.com/general/personality/detailed-procedure-thematic-procedure-test
[59] Valerie J Shute. 2011. Stealth assessment in computer-based games to support
learning. Computer Games and Instruction 55, 2 (2011), 503–524.
[60] Valerie J Shute and Robert Glaser. 1990. A large-scale evaluation of an intelligent
discovery world: Smithtown. Interactive Learning Environments 1, 1 (1990), 51–77.
[61] Steven C Sutherland, Casper Harteveld, Gillian Smith, Joseph Schwartz, and
Cigdem Talgar. 2015. Exploring digital games as a research and educational
platform for replicating experiments. In NEDSI Conference.
[62] Gustavo F Tondello, Rina R Wehbe, Lisa Diamond, Marc Busch, Andrzej Mar-
czewski, and Lennart E Nacke. 2016. The gamification user types hexad scale.
In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in
Play. ACM, 229–243.
[63] Tiffany Tong, Mark Chignell, Phil Lam, Mary C Tierney, and Jacques Lee. 2014.
Designing serious games for cognitive assessment of the elderly. In Proceedings
of the International Symposium on Human Factors and Ergonomics in Health Care,
Vol. 3. SAGE Publications Sage India: New Delhi, India, 28–35.
[64] Tiffany Tong, Mark Chignell, Mary C Tierney, and Jacques Lee. 2016. A serious
game for clinical assessment of cognitive status: validation study. JMIR Serious
Games 4, 1 (2016).
[65] Jukka Vahlo, Johanna K Kaakinen, Suvi K Holm, and Aki Koponen. 2017. Digital
game dynamics preferences and player types. Journal of Computer-Mediated
Communication 22, 2 (2017), 88–103.
[66] Jukka Vahlo, Jouni Smed, and Aki Koponen. 2018. Validating gameplay activity
inventory (GAIN) for modeling player profiles. User Modeling and User-Adapted
Interaction 28, 4-5 (2018), 425–453.
[67] Giel Van Lankveld, Sonny Schreurs, Pieter Spronck, and Jaap Van Den Herik.
2010. Extraversion in games. In International Conference on Computers and Games.
Springer, 263–275.
[68] Giel Van Lankveld, Pieter Spronck, Jaap Van den Herik, and Arnoud Arntz. 2011.
Games as personality profiling tools. In Computational Intelligence and Games
(CIG). IEEE, 197–202.
[69] Harald Warmelink, Jonna Koivisto, Igor Mayer, Mikko Vesa, and Juho Hamari.
2018. Gamification of the work floor: A literature review of gamifying production
and logistics operations. (2018).
[70] C Edward Watkins, Vicki L Campbell, Ron Nieberding, and Rebecca Hallmark.
1995. Contemporary practice of psychological assessment by clinical psycholo-
gists. Professional Psychology: Research and Practice 26, 1 (1995), 54.
[71] Nick Yee. 2006. Motivations for play in online games. CyberPsychology & Behavior
9, 6 (2006), 772–775.
