
Received: 29 December 2019 Revised: 4 February 2020 Accepted: 27 February 2020
DOI: 10.1002/cpp.2442
RESEARCH ARTICLE
When's a story a story? Determining interpretability of Social Cognition and Object Relations Scale-Global ratings on Thematic Apperception Test narratives
Michelle B. Stein¹ | Solara Calderon² | Jared Ruchensky¹ | Christina Massey¹ | Jenelle Slavin-Mulford³ | Wei-Jean Chung¹ | Laura A. Richardson¹ | Mark A. Blais¹
¹ Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
² Mental Health Service, VA San Diego Healthcare System, San Diego, CA, USA
³ Department of Psychology, Augusta University, Augusta, GA, USA

Correspondence
Michelle B. Stein, PhD, Psychological Evaluation and Research Laboratory (PEaRL), Massachusetts General Hospital and Harvard Medical School, One Bowdoin Square, 7th floor, Boston, MA 02114, USA.
Email: mstein3@mgh.harvard.edu

The Social Cognition and Object Relations Scale-Global Rating Method (SCORS-G) measures the quality of object relations in narrative material. The reliability and validity of this measure have been well established. However, a psychometric oddity of this scale is that default ratings are given to select dimensions when the relevant construct is not present. This can result in narrative ‘blandness’ and may impact clinical findings. The aim of these two studies is to understand these phenomena both psychometrically and clinically. In the first study, we identified 276 outpatients who had SCORS-G ratings for TAT Cards 1, 2, 3BM, and 14, set criteria for narrative ‘blandness’ across all eight dimensions, and examined group differences. In Study 2, we used a subset (N = 99) of Study 1 and examined how percentage of formal default ratings for Emotional Investment in Values and Moral Standards (EIM), Experience and Management of Aggressive Impulses (AGG), Self-Esteem (SE), and Identity and Coherence of Self (ICS) impacted robustness of correlations across tests of intelligence, psychopathology, and normal personality functioning. Taken together, we identified clinical characteristics of patients who are more likely to produce ‘bland’ narratives and increased percentages of formal default ratings. Also, an excess of default ratings per protocol impacts robustness of correlations and weakens significant correlations. As cut-off scores increase (>25% and >28.12%), the likelihood of being able to interpret EIM, AGG, SE, and ICS decreases. Psychometric and clinical implications are discussed.
KEYWORDS
default ratings, object relations, performance-based tasks, SCORS-G, social cognition, TAT
1 | INTRODUCTION

The Social Cognition and Object Relations Scale-Global (SCORS-G) rating method (Stein & Slavin-Mulford, 2018) is a clinician-rated measure that assesses object relational processes, which mediate interpersonal functioning: Complexity of Representations of People (COM), Affective Quality of Representations (AFF), Emotional Investment in Relationships (EIR), Emotional Investment in Values and Moral Standards (EIM), Understanding of Social Causality (SC), Experience and Management of Aggressive Impulses (AGG), Self-Esteem (SE), and Identity and Coherence of Self (ICS). The SCORS-G can be applied to several narrative sources, the most popular being Thematic Apperception Test (TAT; Murray, 1943) narratives.

Although reliability, validity, and clinical utility of this measure have been well established (Stein & Slavin-Mulford, 2018), psychometric oddities are also evidenced. First, no specific criteria have been
Clin Psychol Psychother. 2020;27:567–580. wileyonlinelibrary.com/journal/cpp © 2020 John Wiley & Sons, Ltd. 567
568 STEIN ET AL.
set to determine whether there is sufficient information in a narrative to code it with the SCORS-G. All narratives are coded regardless of richness of object relational content or narrative length. Second, some SCORS-G dimensions possess ‘default rating rules’. These rules require the rater to assign a specific score when a narrative lacks scorable content for that particular dimension. For example, when narratives contain no aggressive content, raters assign a ‘default’ rating of four for AGG. When a protocol only contains a single individual, raters assign a two for EIR.

Default rating rules can sometimes contribute to challenges. For some dimensions, a dimension's default rating rule and a rating based on the actual narrative may result in the same score with two meanings. For example, raters assign a score of four for EIM based on content of the moral quality of a narrative (i.e., scored based on content) or because the narrative's moral content was bland (i.e., scored based on the default rule for this dimension).

Although default rating rules create problems, they are necessary as some narratives are ‘bland’, meaning they contain little ratable content. However, the SCORS-G does not provide raters with methods for identifying bland, unusual, or questionable narratives or protocols. Instead, default rating rules are given in these situations. When an individual produces several bland narratives, their score will be driven more by default rules (and less by the actual content of the narratives). In research studies, this may attenuate associations and reduce effect sizes. In clinical practice, a SCORS-G profile based on several bland narratives may not accurately capture the patient's functioning. Currently, it is unclear how frequently bland protocols occur or the extent to which they may impact findings. The purpose of this paper is to begin to understand and address these issues.

Key Practitioner Message
• Patients who produce ‘bland’ SCORS-G ratings on TAT narratives report higher symptoms associated with drug use, somatization, and cognitive features of depression, indicating that these patients may be more emotionally shut down and cognitively or physically preoccupied.
• Percentage of default ratings per protocol for SCORS-G dimensions EIM, AGG, SE, and ICS has an impact on robustness of correlations across measures of intelligence and psychopathology. As cut-off scores increase (>25% and >28.12%), the likelihood of being able to interpret EIM, AGG, SE, and ICS decreases.
• Patients with externalizing psychopathology were more likely to produce TAT protocols with ≤25% and ≤28.12% per four card protocol.
• This study emphasizes the importance of examining the percentage of default ratings per TAT protocol when interpreting test findings.

Numerous TAT rating systems have been developed. One of the reasons for this is that there is not a unified scoring system to rate the TAT, as there is, for instance, with the Rorschach Inkblot Test (Exner, 2003). Jenkins (2008) edited a comprehensive text detailing these TAT scoring systems. In reviewing these diverse scoring systems, all highlighted their reliability and administration procedures (i.e., flexibility of card set, number of cards required, type of administration, and use of prompts). Some scoring systems examined relationships between scoring system constructs, word count, and/or Lambda (Exner, 2003) on the Rorschach (Dana, 2008; Ritzler, 2008; Ronan, Gibbs, Dreer, & Lombardo, 2008) and how this may or may not impact scoring.

Most scoring systems have dichotomous as opposed to dimensional variables. If a construct is not present, it is not scored. There is not a middle category rating like there is with the SCORS-G. In reviewing these manuals, there is one manual (Karon, 2008) that considers the ratio between scorable and unscorable items in generating a final score. However, there is very little discussion in these manuals as to whether or not there is enough scorable material present in a protocol to interpret and later include in analyses. Reviewing this range of diverse scoring systems highlights the variability in studying the TAT and the difficulty in comparing findings across research studies.

Reviewing the literature on narrative-based measures proved fruitful. We found two clinician-rated systems that quantify this process. The first is the Core Conflictual Relationship Theme Method (CCRT; Luborsky & Crits-Christoph, 1998) and the second is the Comprehensive System (Exner, 2003) for the Rorschach Inkblot Test (Rorschach, 1942). The CCRT identifies maladaptive relationship patterns derived from relationship episodes (RE) elicited from psychotherapy narratives and narratives told during the Relationship Anecdotes Paradigm Interview. The completeness of RE varies widely, though criteria to identify RE specify that the interaction must have (a) a beginning, middle, and end; (b) an interpersonal wish (e.g., to be loved), response of other (e.g., rejecting), and response of self (e.g., disappointment); and (c) a clear detailing of the patient's interaction with the other person. After the RE has been identified, at least two judges are asked to rate the RE on its degree of completeness from 1 (least) to 5 (most). The cut-off for inclusion is ≥2.5. Thus, there are specific criteria to identify both a RE and CCRT, in addition to ensuring that the CCRT is rich enough to include in analyses.

Similarly, in Exner's Comprehensive System, there are criteria to determine when there is sufficient information to code the Rorschach Inkblot Test (i.e., no card rejection; minimum of 14 responses). Also, Lambda is used to determine the extent to which an individual produced a meaningful and interpretable profile (>1.00 reflects a guarded profile, and interpretations should be reviewed with caution).

To summarize, in reviewing TAT scoring system manuals, test administration is considered in determining the scoring of narratives. There is at least one scale that accounts for unscorable narratives (Karon, 2008) in determining an overall rating. Most TAT rating systems are dichotomous, so if a select construct is not present, it is not scored; this contrasts with the SCORS-G, which is dimensional, and as such the issue might not come up as a potential problem in other rating systems. Despite this, there is little information surrounding the extent to
which a protocol provides sufficient information (i.e., there is enough of a construct present in the protocol within and across cards) to interpret the rating(s) obtained. Examining the literature on narrative-based measures at large provided some examples on how scale developers produced empirical-based methods for determining interpretability of protocols.

The goals of this project are multi-fold. In Study 1, we want to understand how frequently ‘bland’ narratives (i.e., as specified by select scoring on all dimensions) occur in our dataset. Then, we want to examine clinical differences between patients who tend to produce ‘bland’ narratives and those who do not. Study 2 will use a subset of Study 1 to delve into how default ratings scattered throughout a TAT protocol impact clinical findings.

2 | STUDY 1

The aims of Study 1 are to identify the frequency of ‘bland’ narratives in our dataset and examine differences in patient characteristics for those that produced ‘bland’ narratives versus those that did not using a multi-method approach.

2.1 | Method

2.1.1 | Participants

Participants were 276 outpatients who were referred for psychological assessment in the department of psychiatry within an academic medical centre in the north-eastern United States between 2008 and 2018. The average age was 41 years (SD = 15.1), and the average education level was 14.6 (SD = 2.9) years. There were slightly more males (n = 144; 52%) than females (n = 132; 48%). Patients were predominately single (55%) with 26% married and 12% divorced. Patients were primarily White (86%) with 5.5% self-identifying as African American, 1.8% as Asian, and 4.8% as Hispanic. The most common referral diagnoses were depressive (42%), bipolar (18%), and anxiety disorders (17%).

2.1.2 | Procedure

Our assessment clinic maintains a data repository approved by the institutional review board, which contains assessment and demographic data for all evaluations conducted. Outpatients received a battery of standard clinical instruments, including the TAT. The vast majority of patients completed their psychological assessments in a single session. After the TAT cards were administered, they were transcribed (written verbatim as patient was creating narrative), deidentified, and scored by two independent raters using the SCORS-G. The SCORS-G was used to rate all TAT narratives from each patient. Raters (M. Stein and J. Slavin-Mulford) are expert raters and previously completed manualized training on the SCORS-G (Hilsenroth, Stein, & Pinsker, 2007). Good (>.60) to excellent (>.74) reliability was attained by both raters in previous research (Stein et al., 2016).

2.1.3 | Materials and measures

Thematic Apperception Test (TAT; Murray, 1943)
Outpatients completed a standard TAT protocol as part of their routine clinical assessment (see Stein, Slavin-Mulford, Sinclair, Siefert, & Blais, 2012, Stein et al., 2016, for administration, scoring, and reliability training). For this study, we used an abbreviated four card protocol: (1) (A young boy is contemplating a violin which rests on a table in front of him.); (2) (Country Scene: in the foreground is a young woman with books in her hand, in the background a man is working in the fields and an older woman is looking on); (3BM) (On the floor against a couch is the huddled form of a boy with his head bowed on his right arm. Beside him on the floor is a revolver); and (14) (The silhouette of a man [or woman] against a bright window. The rest of the picture is totally black) (Murray, 1943).

Social Cognition and Object Relations Scale-Global Rating Method (SCORS-G; Stein & Slavin-Mulford, 2018; Westen, 1995)
The SCORS-G contains eight dimensions underlying object relatedness, scored on a 7-point anchored scale. COM evaluates the presence, degree, and differentiation of the self and other. AFF examines the emotional and perceptual lens through which one views their environment. EIR assesses the depicted level of intimacy and emotional sharing. EIM measures how a person views others and acts in relation to morality and compassion for others. SC evaluates the extent to which the person understands human behaviour, as well as the narrative's coherence and reasoning. AGG explores the person's ability to tolerate and manage aggression, and SE and ICS assess self-worth as well as the degree to which a person has an integrated sense of who he or she is. In Study 1, we set ‘bland’ criteria for all eight dimensions.

Personality Assessment Inventory (PAI; Morey, 1991)
The PAI is a 344-item self-report measure that assesses a broad range of psychological constructs. It is a reliable and valid measure of psychopathology (Morey, 1991, 1996). In Study 1, we used Negative Impression Management (NIM), Positive Impression Management (PIM), PAI Clinical Scales (Somatization [SOM], Anxiety [ANX], Anxiety-Related Disorders [ARD], Depression [DEP], Mania [MAN], Paranoia [PAR], Schizophrenia [SCZ], Borderline features [BOR], Antisocial features [ANT], Alcohol Use [ALC], and Drug Use [DRG]) and the Mean Clinical Elevation (MCE) score, which is the average of the 11 clinical scales.

NEO Five-Factor Inventory (NEO-FFI; Costa & McCrae, 1989, 1992b)
The NEO-FFI is a 60-item self-report measure of personality traits within the general population. This scale measures the ‘Big Five’ dimensions (as defined by Costa & McCrae, 1989, 1992b):
Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A), and Conscientiousness (C). Correlations between the NEO-FFI and NEO-PI-R scales range from .77 to .92. Internal consistency values range from .68 to .86 (Costa & McCrae, 1992a, 1992b). In Study 1, we used Neuroticism to denote emotional distress.

Rorschach Inkblot Test (Rorschach, 1921/1942)
The Rorschach Inkblot test is a performance-based measure used to assess psychological functioning. In this study, the Rorschach was administered and scored using the Exner Comprehensive System (2003; see Malone et al., 2013, for our lab scoring process). For this study, we used Lambda and Number of Responses to capture patient's level of engagement/productivity.

Wechsler Abbreviated Scale of Intelligence-2nd edition (WASI-II; Wechsler, 2011)
The WASI-II is a brief measure of intellectual functioning. It consists of four subscales, which produce Verbal Comprehension Index, Perceptual Reasoning Index, and Full Scale IQ. The WASI-II has excellent internal consistency and test–retest reliability. The average stability coefficients range from .90 to .96 for the Composites (Wechsler, 2011, p. 117). In Study 1, we used Full Scale IQ (FSIQ), Verbal Comprehension Index (VCI), and Perceptual Reasoning Index (PRI).

Psychiatric Diagnostic Screening Questionnaire (PDSQ; Zimmerman & Mattia, 2001a, 2001b)
The PDSQ is a self-report measure of psychiatric symptoms that screens for common Axis I disorders (American Psychiatric Association, 1994). It yields 13 subscales (Depression, Post-Traumatic Stress, Bulimia, Obsessive–Compulsive, Panic, Psychosis, Agoraphobia, Social Phobia, Alcohol, Drug, Generalized Anxiety, Somatization, and Hypochondriasis). Subscales have good to excellent test–retest reliability ranging from .61 to .93 and high internal consistency (.66 to .94; Perkey et al., 2017).

2.2 | Results

2.2.1 | SCORS-G interrater reliability

Interrater reliability for this sample is detailed elsewhere (Stein et al., 2016). ICC (2.2) were reported to be in the ‘excellent’ range (>.74; Shrout & Fleiss, 1979).

2.2.2 | Identifying ‘bland’ narratives

First, we identified SCORS-G dimensions that had formal default rating rules (EIM, AGG, SE, and ICS). EIM, AGG, and SE are provided default rating rules of 4, and ICS is given a default rating rule of 5 when dimensional content is not present. AFF and EIR were more challenging to categorize. Regarding AFF, a default rating rule of 4 is provided when affective quality is absent or limited. This rating of 4, although technically a default rating rule, fits between the anchor points of 3 (largely unpleasant) and 5 (mixed, though some positive is necessary) in theoretically consistent ways. For this study, we did not want to include only a rating of 4 as a default rating rule because many of these cards pull for some negative affect, which cannot be accurately reflected in a rating of 4. To account for this, we chose to include the range of 3 to 4. This accounts for the patients who depicted some painful affect, but not so much that the narrative would be less reflective of a ‘bland’ rating. For EIR, a default rating rule of 2 is given when there is only one character depicted. For one-person cards (1, 3BM, and 14), we included this rating of 2. For both one- and multiple-person cards (1, 2, 3BM, and 14), we also included the rating of 3, as in our clinical experience ‘bland’ narratives usually involve patients describing characters in superficial ways. COM and SC do not have formal default rating rules. However, based on clinical observation, we have found that a rating of 3 reflects minimal information surrounding internal states and self/other understanding.

Using our clinical judgement, scores in the following ranges for respective dimensions were considered indicators of possible ‘blandness’: COM (2.5–3.5), AFF (3–4), EIR (2 [one person cards] and 3 [all cards]), EIM (4), SC (2.5–3.5), AGG (4), SE (4), ICS (4.5–5). A narrative response was considered ‘bland’ when it fulfilled the above criteria. After ‘bland’ narratives were identified based on the above criteria, we reviewed these narratives to ensure the ratings reflected ‘bland’ responses and were not based on narrative content (i.e., an AGG rating of 4 that represents content present in the narrative as opposed to reflecting absence of content). When this occurred, these narratives were moved from the ‘bland’ to ‘non-bland’ group.

2.2.3 | Group differences

As can be observed in Table 1, 78% (N = 215) of protocols did not evidence any ‘bland’ narratives; 18% of protocols exhibited one ‘bland’ narrative and 2.9% exhibited two ‘bland’ narratives per protocol. Less than 1% of protocols exhibited greater than two ‘bland’ narratives. When examining group differences, ‘bland’ protocols were identified as having at least one ‘bland’ narrative (22% of sample).

TABLE 1 Number of ‘bland’ narratives per four card protocol in Study 1
Number of ‘bland’ narratives    N      Percentage of current sample
0                               215    78
1                               50     18
2                               8      2.9
3                               2      <1
4                               1      <1

We examined group differences on tests of intellectual ability, psychopathology, and performance-based measures of personality (Table 2). Chi-squares were calculated to assess group differences for
gender. There was no significant difference between the ‘bland’ and ‘non-bland’ group for gender, χ²(1, N = 276) = 2.26, p = .13. Independent t-tests were conducted to examine group differences for the following demographics: age and level of education (Table 2).

There were no significant group differences in age, education level, intellectual resources, impression management, or general distress. Lambda and Number of Responses were trending toward significance. The ‘bland’ protocol group exhibited higher Lambda and lower number of responses, both of which are indicative of a guarded and
TABLE 2 Study 1 group differences
Demographics Group N M SD t p-value d

Age 0 215 40.9 15.5 0.34 n.s. .05
1 61 40.2 13.8
Educational level 0 210 14.7 2.7 1.32 n.s. .20
1 58 14.1 3.5
WASI-II
Full scale 0 211 104.2 15.7 0.84 n.s. .12
1 60 102.3 17.2
Verbal comprehension 0 210 105.2 16.8 0.65 n.s. .10
1 60 103.7 15.7
Perceptual reasoning 0 210 101.8 14.6 0.97 n.s. .14
1 60 99.7 17.2
Rorschach
Lambda 0 107 .73 .7 −1.66 .10a .34
1 32 1.0 1.3
Number of responses 0 107 21.8 6.8 1.67 .10a .34
1 32 19.6 6.1
TAT word count
Card 1 0 209 129.1 65.1 1.56 n.s. .23
1 57 114.0 63.2
Card 2 0 209 145.0 69.3 2.57 .01b .38
1 57 119.7 51.3
Card 3BM 0 209 109.8 55.5 3.33 .001b .50
1 57 83.8 38.0
Card 14 0 209 100.3 55.4 2.97 .003b .44
1 57 76.9 39.9
NEO-FFI
Neuroticism 0 170 63.7 12.5 −0.75 n.s. .13
1 41 65.3 12.4
PAI
MCE 0 201 59.6 8.6 −1.47 n.s. .22
1 57 61.5 8.6
NIM 0 206 61.0 14.9 −1.36 n.s. .20
1 60 61.1 16.8
PIM 0 206 42.0 11.6 1.34 n.s. .20
1 60 39.6 12.5
Note. 0 = non-bland group; 1 = bland group; n.s. = non-significant; WASI-II = Wechsler Abbreviated Scale of Intelligence-2nd edition; Lambda and Number of Responses = Engagement and Richness; TAT word count = Verbal Productivity; PAI = Personality Assessment Inventory; MCE = Mean Clinical Elevation; NIM = Negative Impression Management; PIM = Positive Impression Management. Cohen's d was calculated to determine effect size (d > .20–.50 = small, .50–.80 = medium, and >.80 = large; Cohen, 1988) for these group comparisons.
a Trend, p ≤ .10.
b p ≤ .01.
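The effect sizes in the d column can be approximated from the group means, standard deviations, and sample sizes reported in the table. The following is a minimal sketch, assuming the conventional pooled-standard-deviation form of Cohen's d; the article does not state which variant was computed. The function names and the "negligible" label for values below .20 are illustrative choices, and treating a value of exactly .50 as "medium" is likewise an assumption, since the note's bands share that boundary.

```python
import math


def cohens_d(mean_0, sd_0, n_0, mean_1, sd_1, n_1):
    """Cohen's d for two independent groups, using the pooled SD (assumed variant)."""
    pooled_variance = ((n_0 - 1) * sd_0 ** 2 + (n_1 - 1) * sd_1 ** 2) / (n_0 + n_1 - 2)
    return (mean_0 - mean_1) / math.sqrt(pooled_variance)


def effect_size_label(d):
    """Map |d| onto the bands given in the table note (Cohen, 1988)."""
    magnitude = abs(d)
    if magnitude > 0.80:
        return "large"
    if magnitude >= 0.50:  # boundary handling is an assumption
        return "medium"
    if magnitude >= 0.20:
        return "small"
    return "negligible"  # label not used in the article


# TAT word count, Card 3BM: non-bland (group 0) versus bland (group 1)
d_card_3bm = cohens_d(109.8, 55.5, 209, 83.8, 38.0, 57)
```

Under this assumption, the Card 3BM comparison comes out at roughly .50, in line with the tabled value.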
TABLE 3 Study 1 group differences: select psychopathology
PAI Group N M SD t p-value d

SOM 0 206 60.4 13.2 −2.32 .02* .34
1 60 64.8 12.3
ANX 0 206 66.2 13.6 −1.62 n.s. .24
1 60 69.5 14.5
ARD 0 206 62.5 16.2 −0.92 n.s. .13
1 60 64.6 13.2
DEP 0 206 71.2 15.6 −2.13 .03* .31
1 60 75.9 14.4
MAN 0 206 51.5 12.0 0.81 n.s. .12
1 60 50.1 11.0
PAR 0 206 56.9 12.0 0.07 n.s. .01
1 60 56.8 12.6
SCZ 0 206 62.6 13.7 −0.64 n.s. .09
1 60 63.9 13.0
BOR 0 206 63.9 12.8 −1.30 n.s. .20
1 60 66.4 12.9
ANT 0 206 52.4 11.5 −1.52 n.s. .22
1 60 55.0 12.0
ALC 0 206 54.0 12.3 0.27 n.s. .04
1 60 53.5 12.7
DRG 0 206 54.7 12.9 −3.06 .002** .45
1 60 60.9 16.7
SOM-H 0 202 60.9 13.1 −2.59 .01** .39
1 58 65.9 12.9
SOM-C 0 202 56.7 14.2 −0.36 n.s. .36
1 58 57.5 12.5
SOM-S 0 202 59.1 12.6 −1.49 n.s. .22
1 58 61.9 13.0
DEP-C 0 202 69.7 16.2 −2.24 .03* .33
1 58 75.2 17.7
DEP-A 0 202 71.2 16.5 −1.74 .08t .26
1 58 75.4 15.3
DEP-P 0 202 63.3 12.5 −0.70 n.s. .10
1 58 64.6 12.4
PDSQ
Depression 0 113 8.9 5.0 −1.80 .07* .38
1 29 10.8 4.2
Drug Use 0 113 0.4 1.3 −2.36 .02* .49
1 29 1.2 2.0
Somatization 0 113 1.3 1.4 −2.02 .04* .42
1 29 1.9 1.4
Hypochondriasis 0 113 0.7 1.3 −2.64 .01** .55
1 29 1.5 1.9
Note. 0 = non-bland group; 1 = bland group; PAI = Personality Assessment Inventory; SOM = Somatization; ANX = Anxiety; ARD = Anxiety-Related Disorders; DEP = Depression; MAN = Mania; PAR = Paranoia; SCZ = Schizophrenia; BOR = Borderline features; ANT = Antisocial features; ALC = Alcohol Use; DRG = Drug Use; SOM-H = Somatization-Health Concerns subscale; SOM-C = Somatization-Conversion; SOM-S = Somatization-Somatization; DEP-C = Depression-Cognitive; DEP-A = Depression-Affective; DEP-P = Depression-Physical; PDSQ = Psychiatric Diagnostic Screening Questionnaire. Cohen's d was calculated to determine effect size (d > .20–.50 = small, .50–.80 = medium, and >.80 = large; Cohen, 1988) for these group comparisons.
t = p ≤ .10.
* p < .05.
** p ≤ .01.
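The group coding used in Tables 2 and 3 (0 = non-bland, 1 = bland) rests on the rating ranges listed in Section 2.2.2. The sketch below shows one way those ranges could be applied. It assumes that a narrative must fall within range on all eight dimensions at once, that EIR is matched on the exact values 2 and 3, and that a protocol is ‘bland’ when it contains at least one ‘bland’ narrative. All names are hypothetical, and the manual review step (moving content-based ratings back to the ‘non-bland’ group) is not modelled.

```python
# Cards depicting a single character, per Section 2.2.2.
ONE_PERSON_CARDS = {"1", "3BM", "14"}

# Inclusive 'bland' ranges for every dimension except EIR.
BLAND_RANGES = {
    "COM": (2.5, 3.5),
    "AFF": (3.0, 4.0),
    "EIM": (4.0, 4.0),
    "SC": (2.5, 3.5),
    "AGG": (4.0, 4.0),
    "SE": (4.0, 4.0),
    "ICS": (4.5, 5.0),
}


def eir_is_bland(eir, card):
    """EIR of 3 counts on any card; EIR of 2 counts only on one-person cards."""
    if eir == 3:
        return True
    return eir == 2 and card in ONE_PERSON_CARDS


def is_bland_narrative(ratings, card):
    """True when all eight dimension ratings fall inside the 'bland' ranges."""
    in_range = all(
        low <= ratings[dim] <= high for dim, (low, high) in BLAND_RANGES.items()
    )
    return in_range and eir_is_bland(ratings["EIR"], card)


def is_bland_protocol(narratives):
    """narratives: iterable of (card, ratings) pairs; group 1 if any narrative is bland."""
    return any(is_bland_narrative(ratings, card) for card, ratings in narratives)


example = {"COM": 3, "AFF": 4, "EIR": 2, "EIM": 4, "SC": 3, "AGG": 4, "SE": 4, "ICS": 5}
```

With the `example` ratings, the narrative would meet the criteria on Card 1 but not on Card 2, because an EIR rating of 2 is accepted only for one-person cards.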
less meaningful profile. There were significant differences in verbal productivity for Cards 2, 3BM, and 14. The ‘bland’ group's word count was significantly lower than the ‘non-bland’ group for these cards.

Although there were no significant group differences on measures of general distress, results indicated significant group differences for SOM, DEP, and DRG such that those in the ‘bland’ group reported higher rates of Somatization, Depression, and Drug Use. Examination of the PAI subscales reveals that Somatization-Health Concerns is driving SOM's elevation where those in the ‘bland’ group are reporting increased health concerns. Regarding DEP, Depression-Cognitive Features is driving this elevation such that those in the ‘bland’ group are reporting increased cognitive features of depression, and Depression-Affective Features (DEP-A) is trending toward significance (Table 3).

Lastly, we assessed group differences using the PDSQ to see if similar findings emerged. We used scales targeting Drug Use, Depression, and Physical Health. Consistent with the PAI, the ‘bland’ group reported higher rates of Drug Use, Somatization, and Hypochondriasis. Depression was trending toward significance.

2.3 | Discussion

Study 1's first goal was to assess the frequency of ‘bland’ narratives in our database. The second goal was to examine clinical differences between those individuals who produced ‘bland’ narratives and those who did not. ‘Bland’ narratives were not as common as we anticipated at the narrative (22%) or protocol level (18% exhibited one ‘bland’ narrative per protocol).

However, clinical differences were found between patients who produced ‘bland’ narratives versus those who did not. Clinically, patients who produced ‘bland’ narratives tended to produce less verbiage on Cards 2, 3BM, and 14. There was not any significant difference in verbiage between groups on Card 1 (first card administered). This indicates that there may be something about progression of cards and the increasingly emotionally arousing nature of the task that contributes to decreased verbal productivity in the ‘bland’ group. Another possibility is that Card 1 may produce an anhedonic stance toward the task.

Although no significant difference was found between the two groups in terms of general distress, examining self-reported symptoms of psychopathology illuminated that the ‘bland’ group endorsed higher rates of somatization/health concerns, drug use, and features of depression. Taken together, these findings suggest that those who produce ‘bland’ protocols may be more ruminative and preoccupied over their health and self-efficacy and depressed. They may use drugs to avoid focusing on their internal world, be emotionally shut down, and produce narratives that are simplistic and devoid of interpersonal and emotional influences.

The clinical implications of these findings are useful, as this information can help increase our patient understanding and formulate treatment goals when working with patients and scoring a TAT protocol that meets ‘bland’ criteria.

3 | STUDY 2

The aims of Study 2 were to identify the frequency of formal default ratings in our database. In addition, we wanted to assess how percentage of default ratings scattered throughout a TAT protocol impacts clinical findings, using a multi-method approach.

3.1 | Method

3.1.1 | Participants

In Study 2, we used a subset of patients from Study 1. A power analysis was conducted, which determined that we needed at least 85 patients to yield a moderate effect size. A random number generator identified TAT protocols that included Cards 1, 2, 3BM, and 14. The final number of patients for Study 2 was N = 99. The average age was 39.5 years (SD = 16.2), and the average education level was 14.6 years (SD = 2.6). There were slightly more males (n = 52; 52%) than females (n = 47; 47%). Patients were predominately single (64%) with 17% married and 10% divorced. Patients were primarily White (82%) with 6% self-identifying as African American, 3% as Asian, and 4% as Hispanic. The most common referral diagnoses were depressive (38%), bipolar (23%), and anxiety disorders (18%).

3.1.2 | Procedure

We took a more conservative approach in studying ‘bland’ protocols in Study 2 and focused exclusively on the four SCORS-G dimensions that possess formal default ratings (EIM, AGG, SE, and ICS). After TAT protocols were identified using the random number generator, we read through them to ensure that ratings for EIM, AGG, SE, and ICS represented a default rating and were not reflective of a middle rating based on content expressed in the narrative. Ratings that were not reflective of default ratings were excluded. To determine specific cut-offs, we took our sample mean and median into consideration and made sure we had a sufficient number of protocols per group. Ultimately, we examined the utility of two cut scores to identify questionable protocols, specifically eight and nine default ratings per four card protocol. We then converted these raw scores into percentages. Percentage of default ratings per protocol consists of the number of default ratings for EIM, AGG, SE, and ICS divided by 32 [8 SCORS-G dimensions × 4 TAT cards = 32]. Eight default ratings per protocol converts to 25% (e.g., 8 divided by 32 = .25), and nine default ratings per protocol converts to 28.12%.

3.1.3 | Materials and measures

Thematic Apperception Test
Same as Study 1. We also examined TAT word count.
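The conversion from a raw count of default ratings to a percentage, as described in the Study 2 procedure (Section 3.1.2), can be sketched as follows. The denominator of 32 (8 dimensions × 4 cards) comes from the text; the function and constant names are illustrative only, and the grouping helper simply assumes the "≤ cut-off versus > cut-off" split used for the two cut scores.

```python
N_DIMENSIONS = 8   # all SCORS-G dimensions
N_CARDS = 4        # Cards 1, 2, 3BM, and 14
TOTAL_RATINGS = N_DIMENSIONS * N_CARDS  # 32 ratings per protocol


def default_percentage(n_defaults):
    """Percentage of a protocol's 32 ratings that are formal default ratings."""
    return 100.0 * n_defaults / TOTAL_RATINGS


def exceeds_cutoff(n_defaults, cutoff_count):
    """True when a protocol has more default ratings than the cut score (8 or 9)."""
    return n_defaults > cutoff_count


# The two cut scores examined in Study 2
cut_25 = default_percentage(8)     # 25.0
cut_28 = default_percentage(9)     # 28.125, reported in the text as 28.12%
```

Because only EIM, AGG, SE, and ICS carry formal default ratings, at most 16 of the 32 ratings can be defaults, so the percentage tops out at 50%.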
Social Cognition and Object Relations Scale-Global
In Study 2, we examined EIM, AGG, SE, and ICS.

Wechsler Abbreviated Scale of Intelligence-2nd edition
In Study 2, we used VCI and PRI.

Personality Assessment Inventory
Given that Study 2 is a subset of Study 1, we needed to be mindful of the measures and scales we used in analyses (to ensure we had adequate power). As such, for the PAI, we focused on ‘bigger variables’ and opted to configure scales into higher order dimensions. Following the Hopwood and Moser (2011) work, we calculated an Internalizing dimension by averaging ANX, ARD, and DEP full scales and an Externalizing dimension by averaging ANT, ALC, and DRG full scales. To create a reality-impairing dimension, we rationally selected PAR and SCZ full scales and averaged them. We used MCE to denote general distress. We included Borderline Features (BOR) given that the SCORS rating system was initially created to assess presence of personality pathology, particularly Borderline Personality Disorder.

NEO Five-Factor Inventory
In Study 2, we used Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness.

3.2 | Results

3.2.1 | Frequency of default ratings

Our first goal was to identify the frequency of default ratings for EIM, AGG, SE, and ICS in this subset (Table 4). The mean percentage of default ratings was 29.1% (SD = 9.8). This is equivalent to nine to 10 default ratings per protocol. The median and mode were 31.25% (10 default ratings per protocol).

TABLE 4 Frequency of default ratings per four TAT card protocol for Study 2

Number of default ratings   Default percentage   Subset
per protocol                per protocol         frequency
2                           6.25                 2
4                           12.50                5
5                           15.62                7
6                           18.75                5
7                           21.87                9
8                           25.00                14
9                           28.12                6
10                          31.25                15
11                          34.37                7
12                          37.50                13
13                          40.62                5
14                          43.75                9
15                          46.90                1
16                          50.00                1

3.2.2 | Robustness of correlations

Our second goal was to ascertain the extent to which percentage of default ratings per protocol impacts robustness of correlations across tests of intelligence, psychopathology, and normal personality. Tables 5 and 6 illustrate correlations across three groups: total sample, subset of patients who exhibited ≤25% (Table 5) or ≤28.12% (Table 6) default ratings, and subset of patients who exhibited >25% (Table 5) or >28.12% (Table 6) default ratings per protocol.
We also computed double-entry intraclass correlations, which are Pearson correlations between two double-entered profiles and provide an index of global similarity between two profiles (Furr, 2010). In this study, we were interested in comparing each subset with the total sample. First, we examined profile similarities between the total sample and the subset that did not exceed the cut-off. Second, we examined profile similarities between the total sample and the subset that did exceed the cut-off. Stronger associations indicate greater similarity in patterns of correlations across groups. This approach helps determine the degree to which default ratings may interfere with correlational findings.

3.2.3 | Intellectual ability

Total sample
In the total sample, there were significant positive correlations between VCI and SE (p = .03) and AGG and PRI (p = .05) (Tables 5 and 6).

25% cut-off
Significant WASI-II correlations were most prevalent in protocols that exhibited ≤25% of default ratings (Table 5). VCI was significantly positively related to AGG (p = .03), SE (p = .02), and ICS (p < .01), and PRI was significantly positively related to EIM (p = .01), AGG (p < .01), SE (p = .05), and ICS (p = .04).

28.12% cut-off
Significant correlations were most evidenced in protocols that exhibited ≤28.12% of default ratings (Table 6). VCI was significantly positively related to AGG (p = .03), SE (p < .01), and ICS (p < .01), and PRI was significantly positively related to EIM (p < .01), AGG (p < .01), and SE (p = .04).

3.2.4 | Psychopathology

Total sample
Significant PAI correlations were most prevalent in the total sample (Tables 5 and 6). EIM was significantly negatively associated with
TABLE 5 Percentage of default ratings (cut-off: 25%) and correlational differences for WASI-II, PAI, and NEO-FFI

                    EIM                     AGG                     SE                      ICS
                    ALL     ≤25     >25     ALL     ≤25     >25     ALL     ≤25     >25     ALL     ≤25     >25
WASI-II
VCI                 .06     .22     −.11    .07     .33*    −.19    .22*    .35*    .19     .12     .40**   −.14
PRI                 .16     .38*    −.05    .20*    .45**   .01     .18     .30*    .14     .10     .32*    −.12
PAI
Internalizing       −.03    −.07    −.02    −.02    −.06    −.01    −.31**  −.42**  −.26*   −.25*   −.38*   −.18
Externalizing       −.27**  −.24    −.11    −.24*   −.22    −.08    −.21*   −.20    −.05    −.22*   −.17    −.12
Reality-impairing   −.21*   −.26    −.11    −.19    −.23    −.11    −.29**  −.36*   −.18    −.27**  −.46**  .03
MCE                 −.22*   −.25    −.10    −.20    −.23    −.09    −.35**  −.41**  −.25    −.29**  −.42*   −.06
BOR                 −.13    −.16    −.12    −.11    −.13    −.08    −.28**  −.41*   −.14    −.26*   −.43**  −.04
NEO-FFI
N                   −.11    −.15    −.22    .05     −.05    .15     −.21    −.32    −.17    −.15    −.27    −.06
E                   −.07    .08     −.19    −.13    .02     −.25    .35**   .52**   .36*    .36**   .55**   .36*
O                   −.13    −.09    −.12    −.16    −.04    −.26    .16     .30     .20     .00     .27     −.19
A                   .36**   .45*    .20     .30**   .36*    .15     .20     .22     .08     .24*    .26     .14
C                   .04     .04     .17     −.02    −.03    .09     .09     .26     −.06    .11     .17     .15
TAT word count      .01     .28     .13     −.04    .18     .06     .03     .26     .12     −.05    .20     −.04

Note. N = 99; ≤25% = 42; >25% = 57; EIM = Emotional Investment in Values and Moral Standards; AGG = Experience and Management of Aggressive Impulses; SE = Self-Esteem; ICS = Identity and Coherence of Self; WASI-II = Wechsler Abbreviated Scale of Intelligence-2nd Edition; VCI = Verbal Comprehension Index; PRI = Perceptual Reasoning Index; PAI = Personality Assessment Inventory; MCE = Mean Clinical Elevation; BOR = Borderline Features; NEO-FFI = NEO Five Factor Inventory Short Form; N = Neuroticism; E = Extraversion; O = Openness to Immediate Experience; A = Agreeableness; C = Conscientiousness.
*p ≤ .05 (small effect size).
**p < .01.
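The double-entry intraclass correlation described in Section 3.2.2 (Furr, 2010) can be sketched in a few lines: each pair of profiles is entered twice, once in each order, and an ordinary Pearson correlation is computed on the doubled vectors. The function names below are hypothetical, and the example profiles are the EIM columns of Table 5; the source does not state exactly which rows entered the authors' profiles, so this is an illustration of the technique and is not expected to reproduce the published coefficients.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


def double_entry_icc(profile_a, profile_b):
    """Double-entry intraclass correlation between two profiles.

    Every (a, b) pair is entered a second time as (b, a), so the index
    is sensitive to differences in elevation and scatter as well as shape.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    first = list(profile_a) + list(profile_b)
    second = list(profile_b) + list(profile_a)
    return pearson(first, second)


# Illustrative profiles: EIM correlations from Table 5 (rows VCI to TAT word count).
# Assumption: using all 13 rows is our choice, not a detail given in the source.
eim_all = [.06, .16, -.03, -.27, -.21, -.22, -.13, -.11, -.07, -.13, .36, .04, .01]
eim_low = [.22, .38, -.07, -.24, -.26, -.25, -.16, -.15, .08, -.09, .45, .04, .28]

if __name__ == "__main__":
    print(round(double_entry_icc(eim_all, eim_low), 2))
```

Unlike a plain Pearson correlation, this index drops when two profiles share a shape but differ in level, which is why it suits a comparison of whole correlation patterns across groups.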
externalizing (p < .01) and reality-impairing (p = .04) psychopathology as well as MCE (p = .03). AGG was significantly negatively associated with externalizing psychopathology (p = .01). SE and ICS were significantly negatively associated with internalizing (p < .01 and p = .01, respectively), externalizing (p = .04 and p = .03, respectively), and reality-impairing (p < .01 and p < .01, respectively) psychopathology in addition to MCE (p < .01 and p < .01, respectively) and borderline features (p < .01 and p = .01, respectively).

25% cut-off
In the ≤25% cut-off group, SE and ICS were significantly negatively related to internalizing (p < .01 and p = .01, respectively) and reality-impairing (p = .02 and p < .01, respectively) psychopathology, general distress (MCE; p < .01 and p = .01, respectively), and borderline features (p = .01 and p < .01, respectively) (Table 5). One significant negative correlation was evidenced between SE and internalizing psychopathology in the >25% group (p = .05).

28.12% cut-off
Identical findings were evidenced using this cut-off (Table 6). In the ≤28.12% group, SE and ICS were significantly negatively associated with internalizing (p < .01 and p < .01, respectively), reality-impairing psychopathology (p < .01 and p < .01, respectively), general distress (MCE; p < .01 and p < .01, respectively), and borderline features (p = .01 and p < .01, respectively). One significant negative correlation was evidenced between SE and internalizing psychopathology in the >28.12% group (p = .04).

3.2.5 | Normal personality traits

Total sample
EIM, AGG, and ICS were positively associated with Agreeableness (p < .01, p < .01, and p = .04, respectively) (Tables 5 and 6). SE and ICS were positively associated with Extraversion (p < .01 and p < .01, respectively).

25% cut-off
EIM and AGG were significantly positively associated with Agreeableness (p = .01 and p = .04, respectively) in protocols that exhibited ≤25% of default ratings (Table 5). SE and ICS were significantly positively associated with Extraversion in the ≤25% group (p < .01 and p < .01, respectively) and >25% group (p = .02 and p = .02, respectively).

28.12% cut-off
Identical findings were evidenced using this cut-off (Table 6). EIM and AGG were significantly positively associated with Agreeableness (p < .01 and p = .02, respectively) in protocols that exhibited ≤28.12%
TABLE 6 Percentage of default ratings (cut-off: 28.12%) and correlational differences for WASI-II, PAI, and NEO-FFI

                    EIM                     AGG                     SE                      ICS
                    ALL     ≤28.12  >28.12  ALL     ≤28.12  >28.12  ALL     ≤28.12  >28.12  ALL     ≤28.12  >28.12
WASI-II
VCI                 .06     .24     −.17    .07     .31*    −.19    .22*    .37**   .13     .12     .39**   −.17
PRI                 .16     .37**   −.19    .20*    .43**   −.07    .18     .30*    .05     .10     .26     −.11
PAI
Internalizing       −.03    −.09    −.01    −.02    −.09    −.03    −.31**  −.40**  −.29*   −.25*   −.38**  −.15
Externalizing       −.27**  −.24    −.08    −.24*   −.21    −.07    −.21*   −.16    −.12    −.22*   −.15    −.19
Reality-impairing   −.21*   −.28    −.09    −.19    −.27    −.06    −.29**  −.38**  −.14    −.27**  −.46**  .07
MCE                 −.22*   −.27    −.06    −.20    −.26    −.03    −.35**  −.40**  −.18    −.29**  −.42*   −.01
BOR                 −.13    −.16    −.12    −.11    −.15    −.05    −.28**  −.36*   −.26    −.26*   −.42**  −.05
NEO-FFI
N                   −.11    −.11    −.20    .05     −.04    .27     −.21    −.22    −.26    −.15    −.18    −.13
E                   −.07    .02     −.18    −.13    −.01    −.30    .35**   .41**   .46**   .36**   .47**   .39*
O                   −.13    −.07    −.15    −.16    −.06    −.27    .16     .28     .16     .00     .17     −.17
A                   .36**   .46**   .05     .30**   .38*    .02     .20     .24     .01     .24*    .25     .13
C                   .04     −.01    .19     −.02    −.01    −.05    .09     .10     .07     .11     .07     .24
TAT word count      .01     .17     .13     −.04    .10     .02     .03     .13     .12     −.05    .05     −.02

Note. N = 99; ≤28.12% = 48; >28.12% = 51; EIM = Emotional Investment in Values and Moral Standards; AGG = Experience and Management of Aggressive Impulses; SE = Self-Esteem; ICS = Identity and Coherence of Self; WASI-II = Wechsler Abbreviated Scale of Intelligence-2nd Edition; VCI = Verbal Comprehension Index; PRI = Perceptual Reasoning Index; PAI = Personality Assessment Inventory; MCE = Mean Clinical Elevation; BOR = Borderline Features; NEO-FFI = NEO Five Factor Inventory Short Form; N = Neuroticism; E = Extraversion; O = Openness to Immediate Experience; A = Agreeableness; C = Conscientiousness.
*p ≤ .05 (small effect size).
**p < .01.
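When reading Tables 5 and 6, it helps to keep in mind that the total sample (N = 99) is roughly twice the size of each subset (for example, n = 48 in the ≤28.12% group), so a coefficient of the same size can be significant in one column and not in another. The sketch below illustrates this with Fisher's z approximation. This is an assumption made for illustration only: the source does not say which significance test was used, and `approx_two_tailed_p` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def approx_two_tailed_p(r, n):
    """Approximate two-tailed p-value for a Pearson r using Fisher's z.

    Assumption: large-sample normal approximation, not the exact t-test.
    """
    if n <= 3:
        raise ValueError("Fisher's z needs n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# r = .22 is the VCI-SE correlation in the total sample (Table 6).
# The second call asks what the same r would give at the subset size.
p_total = approx_two_tailed_p(0.22, 99)
p_subset = approx_two_tailed_p(0.22, 48)

if __name__ == "__main__":
    print(f"N = 99: p ~ {p_total:.3f}")
    print(f"n = 48: p ~ {p_subset:.3f}")
```

Under this approximation the coefficient falls below .05 at N = 99 but not at n = 48, which is why comparing magnitudes across columns is more informative than counting asterisks.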
of default ratings. SE and ICS were significantly positively associated with Extraversion in the ≤28.12% group (p < .01 and p < .01, respectively) and >28.12% group (p < .01 and p = .02, respectively).

TAT word count
There were no significant relationships between TAT word count and SCORS-G dimensions across the three groups for both cut-offs.

3.2.6 | Double-entry intraclass correlations

Comparing the total sample with those who fell below the 25% cut-off, agreement was high for all dimensions (EIM = .83; AGG = .76; SE = .91; ICS = .81). With the exception of ICS, results were similar for the 28.12% cut-off group (EIM = .87; AGG = .81; SE = .95; ICS = .60). This means that the pattern of correlations between the total sample and those who did not exceed the cut-off demonstrated strong similarity in profiles for most dimensions.
Comparing the total sample and the subset of patients who did exceed the default ratings cut-off indicated differences in the pattern of correlations (with the exception of SE) across both the 25% cut-off (EIM = .64; AGG = .58; SE = .90; ICS = .56) and the 28.12% cut-off (EIM = .26; AGG = .30; SE = .89; ICS = .62). This indicates that the pattern of correlations between the total sample and the subset that exceeds the cut-off is highly dissimilar. Taken together, these results indicate that having 9 (>25%) or more (>28.12%) formal default ratings per protocol can exert a notable influence on the pattern of correlations for EIM, AGG, and ICS in the total sample, though SE does not follow this same pattern.

3.2.7 | Clinical differences

Binary logistic regressions were calculated to assess whether there were clinical differences between patients who produced ≤25% and ≤28.12% default ratings and those who exhibited >25% and >28.12% default ratings. Separate regressions were computed for conceptually based combinations of variables. The following clinical themes were used in separate analyses: intellectual ability, impression management, general distress, higher-order psychopathology, and verbal productivity (mean TAT word count). For ease of comparison and to reduce multicollinearity, these clinical variables were transformed into a similar metric (z-scores; M = 0, SD = 1). With respect to the dependent variable, a coding of 0 was provided to patients who produced ≤25% or ≤28.12% default ratings per protocol and a coding of 1 was provided to patients who produced >25% or >28.12% default ratings per protocol.
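The grouping and standardization steps described above can be sketched as follows. Table 4 pairs 8 default ratings with 25.00%, which implies a denominator of 32 (eight SCORS-G dimensions across four cards), and it reports 9 default ratings as 28.12% (9/32 = 28.125). To avoid rounding problems at that boundary, the sketch classifies protocols by raw default count (8 or 9) instead of by percentage. All function names are hypothetical, and the use of the sample standard deviation for z-scores is an assumption.

```python
from statistics import mean, stdev

# Table 4: number of default ratings per protocol -> number of protocols.
TABLE_4_FREQUENCIES = {
    2: 2, 4: 5, 5: 7, 6: 5, 7: 9, 8: 14, 9: 6,
    10: 15, 11: 7, 12: 13, 13: 5, 14: 9, 15: 1, 16: 1,
}


def default_percentage(n_default, n_cards=4, n_dimensions=8):
    """Percentage of default ratings out of all ratings in a protocol."""
    return 100.0 * n_default / (n_cards * n_dimensions)


def code_group(n_default, max_defaults_in_low_group):
    """Dependent-variable coding: 0 = at or below cut-off, 1 = above it."""
    return 0 if n_default <= max_defaults_in_low_group else 1


def group_sizes(frequencies, max_defaults_in_low_group):
    """Return (n coded 0, n coded 1) for a frequency table."""
    low = sum(f for n, f in frequencies.items()
              if code_group(n, max_defaults_in_low_group) == 0)
    return low, sum(frequencies.values()) - low


def mean_default_percentage(frequencies):
    """Frequency-weighted mean percentage of default ratings."""
    total = sum(frequencies.values())
    return sum(default_percentage(n) * f for n, f in frequencies.items()) / total


def z_scores(values):
    """Standardize predictors to M = 0, SD = 1 (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    print(group_sizes(TABLE_4_FREQUENCIES, 8))   # 25% cut-off
    print(group_sizes(TABLE_4_FREQUENCIES, 9))   # 28.12% cut-off
```

Applied to the Table 4 frequencies, this coding yields the group sizes given in the notes to Tables 5 and 6 (42 and 57 for the 25% cut-off, 48 and 51 for the 28.12% cut-off).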
25% cut-off
Verbal productivity significantly predicted classification of those with >25% default ratings per protocol, χ2(1) = 8.93, p < .01 (Table 7). No significant predictions were found for tests of intellectual ability, χ2(2) = .38, n.s., general distress, χ2(1) = .56, n.s., and impression management, χ2(2) = .27, n.s. Regarding dimensional models of psychopathology, although the overall model did not significantly predict percentage of default ratings per protocol, χ2(3) = 6.94, p = .07, the externalizing dimension was a significant independent predictor (B = −.48, p = .04) such that patients who reported greater externalizing psychopathology were more likely to produce ≤25% default ratings per protocol.

28.12% cut-off
Verbal productivity significantly predicted classification of those with >28.12% default ratings per protocol, χ2(1) = 5.19, p = .02 (Table 8). The dimensional models of psychopathology also significantly predicted classification into the ≤28.12% group, χ2(3) = 8.36, p = .04. In particular, patients who reported greater externalizing psychopathology were more likely to produce ≤28.12% default ratings per protocol (B = −.57, p = .02). No significant predictions were evidenced on tests of intellectual ability, χ2(2) = .72, n.s., general distress, χ2(1) = .02, n.s., and impression management, χ2(2) = .89, n.s.

3.3 | Discussion

Study 2's first objective was to identify the frequency of default ratings per four-card TAT protocol. The second objective was to identify a reasonable cut-off based on our first goal and examine how the percentage of default ratings may impact clinical findings using a multi-method approach. The frequency of default ratings varied considerably. We used these findings to identify reasonable cut-off points and assist in ascertaining our second objective: the extent to which the magnitude of correlations is impacted by the percentage of default ratings per protocol.
The total sample and those who did not exceed the cut-off exhibited the greatest number of significant correlations. The total sample had the most significant correlations; however, this is likely due to the fact that this group had a higher N. In some instances, the correlations for these two groups were similar, but because of the higher N, the correlations were significant in the total group and not in the group that did not exceed the cut-off. In other instances, both groups resulted in significant correlations, with the correlations being consistently larger in the group that did not exceed the cut-off (≤25% and ≤28.12% default ratings per protocol).
On the WASI-II, significant relationships involving patients' verbal comprehension and perceptual reasoning abilities were evidenced in the total sample and groups that did not exceed the cut-off. In some
TABLE 7 Logistic regression results for predicting clinical differences in patients who produced less than and greater than 25% default ratings per four card protocol

Variable entered          B       SE      Wald    p-value   Exp(B)
Intellectual ability (N = 97)
VCI                       −.04    .30     .02     n.s.      .96
PRI                       −.10    .30     .10     n.s.      .91
Constant                  .27     .20     1.74    n.s.      1.31
Impression management (N = 97)
PIM                       −.09    .23     .16     n.s.      .91
NIM                       .02     .24     .01     n.s.      1.03
General distress (N = 75)
Neuroticism               .18     .24     .55     n.s.      1.19
Constant                  .24     .23     1.08    n.s.      1.27
Dimensional models (N = 97)
Internalizing             .55     .36     2.28    n.s.      1.73
Externalizing             −.48    .24     4.07    .04       .62
Reality-impairing         −.44    .36     1.54    n.s.      .64
Verbal productivity (N = 97)
TAT mean word count       −.66    .24     7.68    .01       .51

Note. VCI = Verbal Comprehension Index; PRI = Perceptual Reasoning Index; PIM = Positive Impression Management; NIM = Negative Impression Management.

TABLE 8 Logistic regression results for predicting clinical differences in patients who produced less than and greater than 28.12% default ratings per four card protocol

Variable entered          B       SE      Wald    p-value   Exp(B)
Intellectual ability (N = 97)
VCI                       −.21    .30     .49     n.s.      .81
PRI                       .05     .29     .03     n.s.      1.05
Constant                  .02     .20     .01     n.s.      1.02
Impression management (N = 97)
PIM                       −.11    .23     .23     n.s.      .89
NIM                       .11     .23     .24     n.s.      1.12
General distress (N = 75)
Neuroticism               .04     .23     .02     n.s.      1.04
Constant                  −.08    .23     .12     n.s.      .92
Dimensional models (N = 97)
Internalizing             .59     .36     2.67    n.s.      1.81
Externalizing             −.57    .25     5.30    .02       .56
Reality-impairing         −.40    .36     1.25    n.s.      .67
Verbal productivity (N = 97)
TAT mean word count       −.49    .23     4.71    .03       .61

Note. VCI = Verbal Comprehension Index; PRI = Perceptual Reasoning Index; PIM = Positive Impression Management; NIM = Negative Impression Management.
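The Exp(B) column in Tables 7 and 8 is the exponentiated coefficient, i.e. the odds ratio for membership in the group coded 1 (protocols above the cut-off) per one standard deviation increase in the z-scored predictor. A minimal sketch of that conversion, with hypothetical helper names, is given here; small differences from the tabled values are to be expected because the published B coefficients are rounded to two decimals.

```python
import math


def odds_ratio(b):
    """Convert a logistic regression coefficient B into an odds ratio."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percentage change in the odds per one-unit (here one-SD) increase."""
    return 100.0 * (math.exp(b) - 1.0)


if __name__ == "__main__":
    # Externalizing, 25% cut-off (Table 7): B = -.48, tabled Exp(B) = .62
    print(round(odds_ratio(-0.48), 2))
    print(round(percent_change_in_odds(-0.48), 1))
```

Read this way, the externalizing coefficient at the 25% cut-off corresponds to roughly 38% lower odds of falling in the >25% group for each standard deviation increase in externalizing psychopathology, which is consistent with the direction reported in the text.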
instances, significant relationships were lost in the total sample, and in other instances, using the cut-off resulted in the strengthening of correlations.
On the PAI, SE and ICS were the two dimensions that were most impacted by percentage of default ratings. Although both the total sample and the cut-off groups exhibited significant correlations, using the cut-off (≤25% and ≤28.12%) resulted in the strengthening of correlations for internalizing and reality-impairing psychopathology as well as general distress and borderline features.
On the NEO-FFI, using the ≤25% and ≤28.12% cut-off resulted in the strengthening of correlations between Extraversion, SE, and ICS as well as Agreeableness with EIM. Correlational findings were similar in the total sample and when using the cut-off between Agreeableness, AGG, and ICS (small effect sizes).
Findings from the binary logistic regression indicate that patients with greater rates of externalizing psychopathology are more likely to fall into the ≤25% and ≤28.12% groups. An explanation for this is that externalizing psychopathology is composed of the PAI scales targeting Alcohol/Drug Use and Antisocial features. SCORS-G dimensions tap underlying psychological processes associated with externalizing psychopathology (i.e., morality and compassion, management of aggression, and self-reactivity). These patients may be more likely pulled to produce TAT content with similar themes. Another possibility is that this group is more disinhibited.

4 | GENERAL DISCUSSION

In these studies, we took initial steps toward understanding narrative ‘blandness’ and the extent to which percentage of formal default ratings present in a protocol impacts clinical findings using a multi-method approach. Study 1 informed us that there are select patient characteristics that contribute to them being more likely to produce bland protocols using specified criteria across all eight dimensions (reduced word count, increased self-reported symptoms of Somatization, Drug Use, and aspects of Depression), which is useful clinically. This study also demonstrated that ‘bland’ narratives are not as common as we initially thought. However, it was less informative about how formal default ratings psychometrically impact clinical findings. This contributed to the execution of Study 2 where we took a more conservative approach and studied the percentage of default ratings per protocol. We used the four SCORS-G dimensions that had a formal default rating rule. Our findings suggest that limiting the number of default ratings per protocol within study samples may produce more meaningful and accurate findings that are driven less by default ratings and more by actual narrative content. The inclusion of questionable protocols within a sample seemed to increase error and attenuate associations with other variables (i.e., associations between four SCORS-G dimensions and intellectual ability, SE and ICS with select psychopathology, Agreeableness with EIM, and Extraversion with SE and ICS). Thus, removing these protocols reduces measurement error and helps researchers identify and better characterize meaningful associations.
Similar to Study 1, in Study 2, we found clinical differences between patients who did and did not exceed the cut-off. Clinically, individual differences like intellectual ability, impression management, general distress, internalizing, and reality-impairing psychopathology did not relate to the percentage of default rating group cut-offs. However, externalizing psychopathology was a significant predictor. One reason for this may be that the majority of SCORS-G dimensions are theoretically (Stein & Slavin-Mulford, 2018) and empirically (DeFife, Goldberg, & Westen, 2013) connected to underlying processes associated with externalizing psychopathology.
Given that word count was consistently linked to protocol ‘blandness’ in Study 1 and the percentage of default ratings in Study 2, this brings up a broader topic within the TAT literature regarding how we as clinicians and researchers can maximize protocol richness without significantly disrupting the patient's flow of associations. TAT administration guidelines vary widely (Jenkins, 2008). We use a standard set of prompts during TAT administration (Stein et al., 2012), though there is room to think about a benefit to increasing the standardization of re-prompting.
Based on these findings, we suggest interpreting protocols as follows: ≤25% (8 default ratings) = high likelihood of interpretability; >25% (9 or more) = questionable interpretability of four dimensions; >28.12% (10 or more) = increasing concern that multiple dimensions contain an excess of default ratings. However, it is important to note that even at the ≤25% cut-off it is possible for the 8 default ratings to cluster in two dimensions. As such, researchers and clinicians should be mindful of this when making a final determination. Likewise, protocols above these suggested cut scores should not be automatically thrown out. Instead, scores above the cut score should cue clinicians or researchers to critically examine the narrative and ratings. First, examine EIM, AGG, SE, and ICS to see if one or two dimensions are driving elevations. For example, if EIM and AGG are the dimensions that have all the default ratings, then one can use SE and ICS for interpretation. Second, we do not recommend using the composite rating if default ratings exceed the cut-off. Third, the SCORS-G generates higher-order factors (Stein & Slavin-Mulford, 2018). In a two-factor model, the affective-relational factor (AFF, EIR, EIM, AGG, SE, and ICS) may not be able to be used and the same applies in a three-factor model when SE and ICS form a third factor. Fourth, COM, AFF, EIR, and SC are not impacted by formal default ratings and can be interpreted. Findings from Study 1 can help inform select psychopathology that can be contributing to protocol ‘blandness’.
The number of TAT cards and the specific TAT cards used vary considerably within the SCORS-G field (and TAT field more broadly). The findings from this study only apply to this four card TAT protocol (Cards 1, 2, 3BM, and 14). Cut-off raw scores and the calculation of percentage of default ratings were based on this. Replicating this study with an increasing number of cards is important. This will require researchers to identify reasonable cut-off raw scores and then transform these scores into percentages. For example, in a seven card protocol, one would calculate the number
of default ratings divided by 56 [8 SCORS-G dimensions × 7 TAT cards = 56] = percentage of default ratings per protocol. The maximum number of default ratings using this approach is 28 (seven cards × four dimensions). After this is calculated, researchers and clinicians should review these protocols to ensure ratings reflect true default scores and are not reflective of thematic content before generating a final cut-off score for each protocol. As far as future research, we also encourage others to replicate this study (using this four card TAT protocol) to see if similar findings emerge in addition to examining multiple cut-off points (to fine tune acceptable levels of interpretability). Lastly, examining how other types of narrative data impact default ratings percentages is encouraged. We do not suggest replicating this study with fewer than four cards (Stein & Slavin-Mulford, 2018).
There are some study limitations that are worth discussing. The total sample has approximately twice as many patients as each cut-off group. SCORS-G ratings of TAT narratives are from one dataset. It is not an ethnically diverse sample. A select group of patients are referred to our clinic, are often clinically complex, and have medical comorbidities. Replicating this study with other samples is suggested. We could not explore the nuances of how cut-offs below 25% and above 28.12% impact clinical findings.

ORCID
Michelle B Stein https://orcid.org/0000-0002-4179-0817

REFERENCES

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Cohen, J. (1988). Statistical power analysis in the behavioral sciences (2nd ed.) (pp. 36–37). Hillsdale, NJ: Erlbaum.
Costa, P. T. Jr., & McCrae, R. R. (1989). The NEO–PI/NEO–FFI manual supplement. Odessa, FL: Psychological Assessment Resources.
Costa, P. T. Jr., & McCrae, R. R. (1992a). Multiple uses for longitudinal personality data. European Journal of Personality, 6, 85–102.
Costa, P. T. Jr., & McCrae, R. R. (1992b). NEO PI–R professional manual: Revised NEO Personality Inventory (NEO PI–R) and NEO Five-Factor Inventory (NEO–FFI). Lutz, FL: Psychological Assessment Resources.
Dana, R. H. (2008). Manual for Objective TAT scoring. In S. R. Jenkins (Ed.), A handbook of clinical scoring systems for thematic apperception techniques (pp. 127–147). New York: Lawrence Erlbaum Associates.
DeFife, J. A., Goldberg, M., & Westen, D. (2013). Dimensional assessment of self and interpersonal functioning in adolescents: Implications for DSM-5's general definition of personality disorder. Journal of Personality Disorders, 27, 1–13.
Exner, J. E. (2003). The Rorschach: A comprehensive system. Volume 1: Basic foundations and principles for interpretation (4th ed.). New York: John Wiley.
Furr, R. M. (2010). The double-entry intraclass correlation as an index of profile similarity: Meaning, limitations, and alternatives. Journal of Personality Assessment, 92(1), 1–15. https://doi.org/10.1080/00223890903379134
Hilsenroth, M., Stein, M., & Pinsker, J. (2007). Social Cognition and Object Relations Scale: Global Rating Method (SCORS-G, 2nd ed.). Unpublished manuscript, The Derner Institute of Advanced Psychological Studies, Garden City, NY: Adelphi University.
Hopwood, C. J., & Moser, J. S. (2011). Personality Assessment Inventory internalizing and externalizing structure in college students: Invariance across sex and ethnicity. Personality and Individual Differences, 50, 116–119.
Jenkins, S. R. (2008). A handbook of clinical scoring systems for thematic apperception techniques. New York: Lawrence Erlbaum Associates.
Karon, B. P. (2008). Pathogenesis index. In S. R. Jenkins (Ed.), A handbook of clinical scoring systems for thematic apperception techniques (pp. 347–384). New York: Lawrence Erlbaum Associates.
Luborsky, L., & Crits-Christoph, P. (1998). Understanding transference: The core conflictual relationship theme method (2nd ed.). Washington, DC: American Psychological Organization.
Malone, J. C., Stein, M. B., Slavin-Mulford, J., Bello, I., Sinclair, S. J., & Blais, M. A. (2013). Seeing red: Affect modulation and chromatic color responses on the Rorschach. Bulletin of the Menninger Clinic, 77(1), 70–93. https://doi.org/10.1521/bumc.2013.77.1.70
Morey, L. (1991). Personality assessment inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L. (1996). An interpretative guide to the personality assessment inventory (PAI). Odessa, FL: Psychological Assessment Resources.
Murray, H. A. (1943). Manual for the thematic apperception test. Cambridge, MA: Harvard University Press.
Perkey, H., Sinclair, S. J., Blais, M. A., Stein, M. B., Neal, P., Pierson, A. D., & Slavin-Mulford, J. (2017). External validity of the Psychiatric Diagnostic Screening Questionnaire (PDSQ) in a clinical sample. Psychiatry Research. https://doi.org/10.1016/j.psychres.2017.12.011
Ritzler, B. (2008). The picture projection test. In S. R. Jenkins (Ed.), A handbook of clinical scoring systems for thematic apperception techniques (pp. 229–257). New York: Lawrence Erlbaum Associates.
Ronan, G. F., Gibbs, M. S., Dreer, L. E., & Lombardo, J. A. (2008). Personal problem-solving system-revised. In S. R. Jenkins (Ed.), A handbook of clinical scoring systems for thematic apperception techniques (pp. 181–207). New York: Lawrence Erlbaum Associates.
Rorschach, H. (1942). Psychodiagnostics. Berne, Switzerland: Hans. (Original work published 1921)
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428. https://doi.org/10.1037//0033-2909.86.2.420
Stein, M. B., & Slavin-Mulford, J. (2018). The social cognition and object relations scale-global rating method (SCORS-G): A comprehensive guide for clinicians and researchers. New York, NY: Routledge.
Stein, M. B., Slavin-Mulford, J., Sinclair, S. J., Chung, W. J., Roche, M., Denckla, C., & Blais, M. A. (2016). Extending the use of the SCORS-G composite rating in assessing level of personality organization. Journal of Personality Assessment, 100, 166–175. https://doi.org/10.1080/00223891.2016.1195394
Stein, M. B., Slavin-Mulford, J., Sinclair, S. J., Siefert, C. J., & Blais, M. A. (2012). Exploring the construct validity of the social cognition and object relations scale in a clinical sample. Journal of Personality Assessment, 94, 533–540.
Wechsler, D. (2011). Wechsler abbreviated scale of intelligence-second edition (WASI-II). San Antonio, TX: NCS Pearson.
Westen, D. (1995). Social cognition and object relations scale: Q-sort for projective stories (SCORS–Q). Department of Psychiatry, The Cambridge Hospital and Harvard Medical School, Cambridge, MA: Unpublished manuscript.
Zimmerman, M., & Mattia, J. I. (2001a). The psychiatric diagnostic screening questionnaire: Development, reliability and validity. Comprehensive Psychiatry, 42, 175–189. https://doi.org/10.1053/comp.2001.23126
Zimmerman, M., & Mattia, J. I. (2001b). A self-report scale to help make psychiatric diagnoses: The psychiatric diagnostic screening questionnaire. Archives of General Psychiatry, 58, 787–794. https://doi.org/10.1001/archpsyc.58.8.787
How to cite this article: Stein MB, Calderon S, Ruchensky J, et al. When's a story a story? Determining interpretability of Social Cognition and Object Relations Scale-Global ratings on Thematic Apperception Test narratives. Clin Psychol Psychother. 2020;27:567–580. https://doi.org/10.1002/cpp.2442
