www.elsevier.com/locate/jcbs
PII: S2212-1447(17)30106-0
DOI: https://doi.org/10.1016/j.jcbs.2017.11.004
Reference: JCBS209
To appear in: Journal of Contextual Behavioral Science
Received date: 30 January 2017
Revised date: 23 October 2017
Accepted date: 8 November 2017
Cite this article as: R. Sonia Singh and William H. O’Brien, A Quantitative
Synthesis of Functional Analytic Psychotherapy Single-Subject Research,
Journal of Contextual Behavioral Science,
https://doi.org/10.1016/j.jcbs.2017.11.004
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Running head: QUANTITATIVE SYNTHESIS OF FAP 1
Corresponding author:
R. Sonia Singh
126 Psychology
Bowling Green State University
Bowling Green, OH 43403
rjsingh@bgsu.edu
(832) 530-9465
Highlights
on the principles of behaviorism (e.g., Kohlenberg & Tsai, 1991; Tsai et al., 2009). FAP is
idiographic in nature, meaning that it often focuses on specific behaviors for individual clients
and proposes that the behaviors clients exhibit in sessions with a therapist are an index of
adaptive and problem behaviors that clients display in natural environments. These in-session
behaviors are referred to as Clinically Relevant Behaviors (CRBs). CRBs are divided into three
categories: CRB1s are problematic behaviors, CRB2s are adaptive behaviors, and CRB3s are the
client’s descriptions of the topography and function of his or her behaviors outside of the session.
Further, the FAP therapist acknowledges behavior outside of the context of therapy. For
example, outside of the therapy session problem behaviors (O1s) and outside of the therapy
CRB2s, and CRB3s. This is accomplished by first carefully operationalizing CRB1s and CRB2s.
To decrease the frequency of CRB1s the therapist uses a combination of procedures such as
and verbal redirection (Tsai et al., 2009). In order to do this effectively, the FAP therapist is
trained to be (a) acutely aware of CRB occurrences and (b) consistent in providing in-session
experiences that promote the acquisition, shaping, and maintenance of adaptive changes in
CRB1s and CRB2s. FAP utilizes a system of five rules to guide the therapist: (1) watch for
CRBs; (2) evoke CRBs; (3) reinforce CRB2s; (4) assess therapist impact on client behavior; and
(5) evaluate and generalize (for a detailed description of these five rules see Tsai et al., 2009).
Efficacy of FAP
The empirical evidence regarding the efficacy of FAP is limited (Hayes, Masuda,
Bissett, Luoma, & Guerrero, 2005). Mangabeira, Kanter, and Del Prette (2012) conducted a
qualitative review of FAP publications from 1990 to 2010. The authors reported that the majority
of articles written about FAP were conceptual rather than empirical. Further, their analysis of
empirical studies indicated that a majority used single-subject data or were uncontrolled case
studies. The authors also noted that a number of these single-subject studies did not include
Analysis System of Psychotherapy, and Integrative Behavioral Couple Therapy. The author
noted that FAP did not have any randomized control trials; therefore, he could not include it in
his analyses. Although this meta-analysis was generally unfavorable toward third-wave behavioral
therapies and has received criticism in the field, its observation that FAP lacked randomized
control trials was reasonable. Corrigan (2001) and García (2008) also noted
the lack of randomized control trials and criticized FAP for making claims of efficacy without
Since the aforementioned reviews, one small-sample randomized control trial has been
published by Maitland et al. (2016b). In this study, 11 individuals received a FAP intervention
researchers reported that the FAP group, relative to the control group, reported significantly
increased interpersonal functioning (as measured by fear of intimacy) and lower psychological
symptomology. The researchers concluded this was a modest study and that more research is
principles occurring within the context of the therapeutic relationship to promote adaptive in-
session behavior change that is then intended to generalize to outside of session contexts. FAP
uses five rules to decrease problematic behavior and increase adaptive behavior. The published
empirical evidence examining FAP is limited to one randomized control trial and many single-
subject studies. Additionally, to date, reviews of FAP research have used qualitative methods and
Single-subject research has been extensively used in several fields (e.g., Barlow, Nock, &
Hersen, 2009; Horner et al., 2005; Kazdin, 1982). Proponents of single-subject research suggest
it is one of the initial steps in identifying evidence-based practices (Horner et al., 2005). Further,
when designed appropriately, single-subject research can validly evaluate causal relationships
between interventions and outcomes (Haynes, O’Brien, & Kaholokula, 2011). Another
advantage of single-subject research is that it can be used to examine unique and/or rare
can be quantitatively synthesized (Manolov, Guilera, & Sierra, 2014; Shadish, Hedges, &
Pustejovsky, 2014).
quantitative findings from multiple studies that generate group-based data and inferential
statistical testing. However, single-subject studies typically do not generate group-level data nor
group-level inferential statistical testing; single-subject studies may also include graphs without
graphs using mapping and digitizing technology (e.g., Shadish & Sullivan, 2011). This allows
quantification of published single-subject graphs that do not provide numerical values and
The results section of this paper will provide detailed information about the
to briefly review the unique methodological features of FAP research that influence how these
studies can be quantified, aggregated, and interpreted. First, FAP has been almost exclusively
evaluated using an A/B (baseline/treatment) single-subject design. Second, FAP, like most
in client behavior are less relevant than measures of end-state functioning (i.e., level of
functioning at the conclusion of treatment relative to baseline). Third, FAP data are often
presented in graphs without numerical values for data points. Finally, a majority of FAP studies
should: (a) generate and synthesize A/B or pre-post effect sizes, (b) use mapping technologies
that can reliably and accurately generate quantitative information from graphs that do not
provide data values, and (c) use effect sizes that do not require a large number of data points.
Four well-established single-subject effect size indices are well suited for a FAP quantitative
synthesis: Percentage of non-overlapping data (PND: Scruggs & Mastropieri, 1998), Split
Middle Trend Estimation (SMTE: White, 1974), Swanson’s dsw (Swanson, Hoskyn & Lee, 1999)
and the Reliable Change Index (RCI: Jacobson & Truax, 1991). Each of these metrics provides
some unique information that may be important to assess differences in FAP outcomes.
The PND (Scruggs & Mastropieri, 1998) is a common and well-established method for
evaluating single-subject effects. This metric is often considered one of the standard ways to
assess and aggregate single-subject design research. The SMTE (White, 1974) was developed to
address the challenge of outliers and trends in single-subject data that adversely affect PND
calculations. This metric provides unique information because it accounts for trends and outliers
likelihood of effect.
It should be noted that PND and SMTE do not provide an index of magnitude of effect.
Therefore, researchers developed effect size indices that could be used to supplement the
information provided by the PND and SMTE. Swanson, Hoskyn and Lee (1999) developed dsw in
order to generate an effect size that provides an estimate of treatment outcomes based on end-
state functioning. That is, the effect size is based on the participant’s level of functioning at the
Each of these effect sizes provides different information based on graphed data: (1) PND
offers an overall indication of effect and is often the standard in single-subject design meta-analysis, (2) SMTE
provides a way to account for outliers and trend, and (3) dsw provides magnitude of effect. In
addition to the aforementioned methods that can be used to quantify graphed data, the Reliable
Change Index (RCI) can be used in single-subject studies when questionnaires are used to
interpersonal relationship between therapist and client. Given that FAP has primarily been
evaluated with single-subject studies and that current reviews of FAP are qualitative, there is a
need to better understand the effectiveness of this therapy using a quantitative approach.
SMTE, Swanson’s dsw, and RCI are well suited for aggregating data from FAP studies. There
were three principal aims of the current study. The first aim was to conduct a methodological
review of FAP single-subject studies. This included reviewing the demographic characteristics of
participants, length of treatment, and type of FAP therapy provided. The second aim was to
evaluate and synthesize treatment effects using different quantitative indicators of outcomes. The
third aim was to examine the extent to which the effectiveness of FAP varied as a function of
Method
Selection Criteria
Several databases were used to find articles for the current study including PsycINFO
(1872 to present), Psychology and Behavioral Sciences Collection (1930s to present), and ERIC
(1966 to present). The following search terms were used: Functional Analytic Psychotherapy,
FAP treatment, FAP single-subject design, FAP single-case design, and FAP case study. The
search for relevant articles and studies occurred between February 2015 and January 2017.
To be included in this study, articles had to meet the following criteria: (a) the study used
single-subject methods, (b) a FAP based treatment was provided for individual clients, (c) the
study contained data that could be coded (graphs, pre- and post-measures), and (d) the study was
published in a peer-reviewed journal or as a doctoral dissertation. The first author also posted
requests on FAP-related listservs to obtain unpublished manuscripts that were appropriate for the current
study. Studies that met these inclusion criteria were then reviewed for the quantitative synthesis.
Article Coding
Each qualifying article was examined and coded for the following information: number
psychological wellness (e.g., Beck Depression Inventory), number of treatment sessions, length
representation of treatment. Very few studies provided information on CRB3s; for this reason,
only CRB1s and CRB2s were coded. Additionally, O1s and O2s were coded and included in the
current study when they were reported instead of CRB1s and CRB2s.
Given that the current study includes a mixture of CRBs and Os, in the following sections
we use the term Target Behavior 1 (TB1) to refer to problem behaviors (CRB1s and O1s) and
Target Behavior 2 (TB2) to refer to adaptive behaviors (CRB2s and O2s). If a participant had
several TB1s or TB2s reported in a single study, the individual TB1s and TB2s were averaged so
that each participant contributed only one TB1 and/or TB2 to the quantitative synthesis across
studies.
The first author and trained assistants independently coded articles using a coding form
that included article information (e.g., authors, year published, title, journal of publication,
affiliation of authors), participant information (e.g., number of participants per study, participant
design, number of baseline and treatment sessions, modality of therapy used), and effect sizes.
Disagreements that occurred between raters were resolved through consensus. The
methodological characteristics of all studies were double coded by the first author and a trained
assistant. Inter-rater reliability for overall methodological coding was excellent (κ = .92). Inter-
rater reliability varied from 0.77 to 1.0 for different portions of coding (e.g., title, design,
participants, diagnoses).
A subset of studies (70%) was double coded for PND and SMTE. A high degree of
reliability was found for both PND and SMTE measurements. The average ICC for PND
was 0.98 with a 95% confidence interval that ranged from 0.97 to 0.99 (F(32, 47) = 171.75,
p < .001). The average ICC for SMTE was 0.95 with a 95% confidence interval that ranged from
0.91 to 0.98 (F(23, 34) = 39.53, p < .001). RCI and dsw were not double coded because Excel
Regarding the assessment of overall methodology, the authors used the Single-Case
This system is a 26-item review checklist that assesses the title, abstract, introduction, methods,
results, discussion, and documentation of single-subject design research. Tate et al. (2016)
suggest this method be used for the development, replication, and evaluation of single-subject design
research.
Using the search terms “Functional Analytic Psychotherapy”, “FAP treatment”, “FAP
single-subject design”, “FAP single-case design”, and “FAP case study,” 179 studies were
initially identified using a review of titles. These 179 studies represented a mixture of narrative
case studies, theoretical articles, literature reviews, and empirical studies. The authors reviewed
the abstracts of all 179 studies to determine eligibility for the current study.
From the abstract review of the 179 studies, the authors excluded 138 because these
articles were conceptual reviews, theoretical articles, or studies about measurement of FAP. The
remaining 41 studies were fully reviewed. Upon full text review, 23 studies were excluded
because they were narrative reviews, did not contain data that could be used to calculate any
effect size, or included data not specifically related to the current study (e.g., therapist training
outcomes, group-based outcomes). If data were reported but not in a manner in which the authors
could calculate effect sizes (e.g., mean scores), the authors contacted the authors of the original
publications. No additional studies were added using this approach because the authors of the
studies in question did not respond or reported that the original data were unavailable
(destroyed or was no longer in their possession). This left a total of 18 studies for quantitative
synthesis.
The authors also used the invisible college approach and placed requests on FAP listservs
and social media pages for articles relevant to this quantitative synthesis. Two additional
studies were collected using this strategy. The authors used an ancestry approach and examined
citations from the FAP articles to identify any studies that might have been missed in the search
strategy. No additional articles were located using this method. Finally, the descendency
approach was used to identify any additional articles that referenced the original FAP book by
Kohlenberg and Tsai (1991). No additional articles were located with these approaches. Thus, 20
qualifying studies were located using all of the aforementioned search and selection strategies.
Graph Digitization and Data Reduction. WebPlotDigitizer was used to generate data
from graphs without raw values. WebPlotDigitizer is a program that can be used to upload
graphs for mapping and obtaining values based on points on the graph. Once a graph was
uploaded, the researchers assigned a 10-point value to the x-axis and y-axis. After this, the
researchers identified each baseline and treatment point in each graph. After identifying each
point, WebPlotDigitizer generated X and Y coordinate values. To ensure that values were
accurately extracted by WebPlotDigitizer, the researchers entered the values from the
program into Plotly, a web-based graphing program. The researchers then compared the Plotly
graphs with the original graphs to ensure that digitization was accurate.
targeted for reduction during treatment (i.e., TB1), the researcher identified the lowest data point
that occurred during the baseline phase of the study. Next, the researcher determined how many
data points in the treatment phase fell below the lowest baseline point. Finally, the researcher
calculated the PND by dividing the total number of points below the lowest baseline point by the
When a graph presented behaviors targeted for increases during treatment (i.e., TB2), the
researcher identified the highest data point that occurred during the baseline phase of the study.
Next, the researcher determined how many data points in the treatment phase fell above the
highest baseline point. Then, the researcher calculated the PND by dividing the total number of
points above the highest baseline point by the total number of intervention points and
PND values can vary from 0% to 100%. Scruggs and Mastropieri (1998) recommend that
PND scores be classified as follows: PND < 50%: unreliable treatment; PND 50% – 70%:
questionable effectiveness; PND 70% – 90%: fairly effective; and PND > 90%: highly effective.
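The PND procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `pnd` and `classify_pnd` are hypothetical, the data are invented, and because the published bands share their endpoints (70% and 90%), the sketch assumes a boundary value falls into the lower band.

```python
def pnd(baseline, treatment, target="TB1"):
    """Percentage of non-overlapping data for one A/B graph.

    target="TB1": behavior targeted for reduction (count points below the
    lowest baseline point). target="TB2": behavior targeted for increase
    (count points above the highest baseline point).
    """
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    if target == "TB1":
        cutoff = min(baseline)
        non_overlapping = sum(1 for y in treatment if y < cutoff)
    elif target == "TB2":
        cutoff = max(baseline)
        non_overlapping = sum(1 for y in treatment if y > cutoff)
    else:
        raise ValueError("target must be 'TB1' or 'TB2'")
    return 100.0 * non_overlapping / len(treatment)


def classify_pnd(value):
    """Scruggs and Mastropieri (1998) bands; boundary handling is an assumption."""
    if value < 50:
        return "unreliable treatment"
    if value <= 70:
        return "questionable effectiveness"
    if value <= 90:
        return "fairly effective"
    return "highly effective"


# Invented example: a problem behavior (TB1) observed over 4 baseline
# and 5 treatment sessions.
score = pnd([8, 7, 9, 8], [6, 5, 7, 4, 3], target="TB1")
label = classify_pnd(score)
```

In this invented example, four of the five treatment points fall below the lowest baseline point (7), so the PND is 80% and falls in the "fairly effective" band.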
calculated in four steps: (a) the baseline phase was divided into halves, (b) the median point on
the y-axis in each half of the baseline phase was identified, (c) a straight line connecting the
median points of each baseline half was drawn and extended into the treatment phase, and (d) the
number of data points in the treatment phase that fell above or below this line were counted and
divided by the total number of data points in the treatment phase. For example, if the treatment
target was a TB1, the number falling below the line were counted; if the treatment target was a
TB2, the number falling above the line were counted. The proportion of data points that fell
above or below the celeration line were converted into a percentage so that they could be
compared to the PND. Finally, a binomial calculation was performed to evaluate the probability
A minimum of four baseline points are required to adequately create a celeration line.
Therefore, SMTE was not calculated for any graph with fewer than four baseline data points.
Given that the null hypothesis for SMTE would be less than 50%, the authors applied the PND
classification recommendations to SMTE: SMTE < 50%: unreliable treatment; SMTE 50% – 70%:
questionable effectiveness; SMTE 70% – 90%: fairly effective; and SMTE > 90%: highly effective.
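The four SMTE steps and the binomial calculation can be illustrated with the following Python sketch. Several details are assumptions, not taken from the article: sessions are numbered consecutively from 1, the middle baseline point is left out of both halves when the baseline has an odd number of points, each half's median is placed at the median session number of that half, and the binomial test is one-sided with a chance probability of .5. The function name `smte` and the data are hypothetical.

```python
from math import comb
from statistics import median


def smte(baseline, treatment, target="TB1"):
    """Split-middle trend estimation for one A/B graph.

    Returns (percent of treatment points on the improved side of the
    celeration line, one-sided binomial probability under p = .5).
    """
    if target not in ("TB1", "TB2"):
        raise ValueError("target must be 'TB1' or 'TB2'")
    n_base = len(baseline)
    if n_base < 4:
        raise ValueError("SMTE needs at least four baseline points")
    if not treatment:
        raise ValueError("treatment phase is empty")

    # Steps (a) and (b): halve the baseline and find each half's median.
    # Assumption: with an odd count, the middle point is dropped.
    half = n_base // 2
    x1 = median(range(1, half + 1))
    y1 = median(baseline[:half])
    x2 = median(range(n_base - half + 1, n_base + 1))
    y2 = median(baseline[n_base - half:])

    # Step (c): celeration line through the two medians, extended forward.
    slope = (y2 - y1) / (x2 - x1)

    # Step (d): count treatment points below (TB1) or above (TB2) the line.
    hits = 0
    for offset, y in enumerate(treatment, start=1):
        projected = y1 + slope * (n_base + offset - x1)
        if (target == "TB1" and y < projected) or (target == "TB2" and y > projected):
            hits += 1

    n = len(treatment)
    percent = 100.0 * hits / n
    # Probability of observing `hits` or more such points by chance.
    p_value = sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n
    return percent, p_value


# Invented example: a rising baseline followed by a falling treatment phase.
percent, p_value = smte([10, 12, 11, 13], [9, 8, 10, 7, 6], target="TB1")
```

Because the line is projected from the baseline trend, a treatment point is judged against where the behavior was heading rather than against a single extreme baseline value, which is the property that distinguishes SMTE from PND.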
Swanson’s dsw. Swanson’s dsw was calculated by: (a) forming a baseline mean using the
last three data points during the baseline phase, (b) forming a treatment outcome mean using the
last three data points in the treatment phase, (c) computing a difference score by subtracting the
baseline mean from the treatment mean, and (d) dividing the difference score by the pooled
standard deviation corrected for correlation (the correlation was between the last three baseline
data points and the last three treatment data points). If there were fewer than three points in the
baseline phase or treatment phase, then only two points in baseline and two points in treatment
were used to calculate this statistic. The formula for calculating Swanson’s dsw is as follows:
As is evident in this formula, higher dsw values indicate larger treatment effects. For
example, when dsw = 1, it indicates that the level of functioning at the conclusion of treatment is one
standard deviation higher than the level of functioning at the conclusion of baseline, and when
dsw = 2, it indicates that the level of functioning at the conclusion of treatment is two standard
deviations higher than the level of functioning at the conclusion of baseline. Swanson and
Sachse-Lee (2000) argued that dsw should be interpreted in the same way that Cohen’s d is
that effect sizes using d could be classified as “small” (.20), “medium” (.50) and “large” (.80).
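The dsw steps listed above can be sketched in Python as follows. The exact form of the correlation correction is not recoverable from this text, so the sketch assumes the denominator is the pooled standard deviation divided by the square root of 2(1 − r), a form commonly used for repeated-measures effect sizes; that choice, the equal-weight pooling of the two standard deviations, the function names, and the data are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def swanson_dsw(baseline, treatment):
    """End-state effect size from the last points of each phase."""
    shortest = min(len(baseline), len(treatment))
    if shortest < 2:
        raise ValueError("each phase needs at least two data points")
    # Use the last three points, or the last two if a phase is shorter.
    k = 3 if shortest >= 3 else 2
    base, treat = baseline[-k:], treatment[-k:]

    difference = mean(treat) - mean(base)
    pooled_sd = sqrt((stdev(base) ** 2 + stdev(treat) ** 2) / 2)
    r = pearson_r(base, treat)
    if r >= 1:
        raise ValueError("correction is undefined when r = 1")
    # Assumed correction: pooled SD divided by sqrt(2 * (1 - r)).
    return difference / (pooled_sd / sqrt(2 * (1 - r)))


# Invented example: an adaptive behavior (TB2) that rises during treatment.
d_sw = swanson_dsw([1, 2, 4, 3], [5, 6, 8, 7, 9])
```

Note that with only two points per phase the correlation can only be +1 or −1, so the two-point fallback is defined only in the latter case under this assumed correction.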
Reliable Change Index. The RCI is a standardized score used to assess change in an
individual’s score on a survey measure and uses the participant’s pre- and post-treatment scores,
standard deviation, and reliability coefficients. Given that the studies had small sample sizes, the
standard deviations and reliability coefficients from large sample validation studies were used to
calculate RCIs. The formulas for calculating RCI and the standard error of the difference are
provided below.

\[ RCI = \frac{X_2 - X_1}{S_{diff}} \]

\[ S_{diff} = \sqrt{2 (S_E)^2} \]
In the above formulas, X2 is the post-treatment score, and X1 is pre-treatment score. The
standard error of the difference (Sdiff) is the square root of the standard error of measurement
(SE) squared and multiplied by two. The standard error of measurement is the standard deviation
multiplied by the square root of 1 minus the reliability coefficient for a particular measure.
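The definitions above translate directly into a short calculation. The Python sketch below follows them step by step; the function names are hypothetical, and the scores, standard deviation, and reliability coefficient in the example are invented for illustration rather than taken from any study in this synthesis.

```python
from math import sqrt


def reliable_change_index(pre, post, sd, reliability):
    """RCI = (post - pre) / Sdiff, using normative SD and reliability."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    # Standard error of measurement: SD * sqrt(1 - reliability).
    se = sd * sqrt(1 - reliability)
    # Standard error of the difference: sqrt(2 * SE^2).
    s_diff = sqrt(2 * se ** 2)
    return (post - pre) / s_diff


def is_statistically_reliable(rci, criterion=1.96):
    """True when the RCI exceeds the +/-1.96 criterion in either direction."""
    return abs(rci) > criterion


# Invented example: a symptom score drops from 30 to 18 on a measure whose
# validation sample had SD = 10 and reliability = .90.
rci = reliable_change_index(pre=30, post=18, sd=10, reliability=0.90)
reliable = is_statistically_reliable(rci)
```

Here the standard error of the difference is about 4.47, so a 12-point drop yields an RCI of about −2.68, which exceeds the 1.96 criterion in absolute value.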
being zero. Larger RCIs indicate greater treatment effects. When the RCI is greater than +/-1.96
(the 95% confidence interval around the null of zero), then it is labelled “statistically reliable”
because the magnitude of difference is greater than what would be expected to occur by chance
or passage of time. If the RCI was less than +/-1.96, then it is labelled as “not statistically
reliable.” Individual RCI scores were aggregated by calculating an overall mean RCI for all
single-subject studies included in the current study. The RCI was only calculated for studies in
Graph data for each TB1 and TB2 were converted to PND, SMTE, and Swanson’s dsw. A
RCI score was calculated for each self-report inventory for each participant where pre- and post-
treatment data were provided. In studies where multiple TB1, TB2, or questionnaires were
collected on a single participant, an average PND, SMTE, dsw, and RCI were calculated for that
participant. Additionally, when examining studies with multiple phases, only the first baseline
The overall mean PND, SMTE, dsw, and RCI were calculated across studies for each
outcome variable. The mean was used because it was not possible to use the Hedges-
Pustejovsky-Shadish (Shadish et al., 2014) combined effect size calculation which requires three
or more participants per study (only three studies had three or more participants). Follow-up
analyses were then conducted to determine whether there were significant differences in
Results
between 1994 and 2017. The two remaining studies are currently unpublished. These articles
were produced by 43 authors. The articles were published in 9 different journals, with most
appearing in the International Journal of Behavioral Consultation and Therapy, Clinical
Case Studies, and The Psychological Record. The methodological characteristics of studies are
Participant Characteristics. There were a total of 37 participants across all studies (19
males, 15 females, and 3 participants whose gender was not reported). The average number of
participants per article was 2 (Range 1 – 5). However, it should be noted that two studies utilized
data from the same participant resulting in the final count of 36 participants (Busch et al., 2009;
Kanter et al., 2006). This participant was a 24/25-year-old African American female and her data
were combined in reporting participant characteristics and effect size calculation for TB1.
Participants varied in age from 7 to 72 (M = 28.69, SD = 14.3); the age of three participants
was unknown. Information regarding ethnicity was not provided for 21 participants. For the 14
participants whose ethnicities were reported, 10 were Caucasian, two were biracial, one was
Diagnoses were provided for 18 participants. The most common diagnoses were: mood
disorders (n = 6), personality disorders (n = 3), co-morbid mood and personality disorders (n =
3), co-morbid mood, personality and substance disorders (n = 2), co-morbid mood, posttraumatic
stress disorder, and substance use disorder (n = 1), co-morbid mood, anxiety, and personality (n
= 1), co-morbid mood and psychotic disorder (n = 1), and co-morbid personality and psychotic
disorder (n = 1).
Targets of Treatment. A variety of TB1s and TB2s were targeted for treatment (see
Table 2). For review in the current study, TB1s and TB2s were categorized based on subscales of
in Table 2, the most common TB1s were: problematic disclosure, problematic emotional
expression, and conflict. The most common TB2s were: effective disclosure, adaptive emotional
during baseline was 6 (Range 2 – 12) and the average number of observations during treatment
was 11 (Range 4 – 25). Most studies provided FAP-Alone (n = 14). In other instances, FAP was
Behavioral Activation (n = 2), Cognitive Therapy (n = 1), and Child Behavior Analytic Therapy
(n = 1). Based on this information, two categories of studies were formed: (a) “FAP-Alone”
treatment and (b) “FAP-Enhanced” treatment. Seven studies utilized case study approaches, 10
utilized A/B or A/A+B designs, two used a multiple baseline design, and one used a reversal
design.
None of the studies met all 26 criteria of the SCRIBE methodology. SCRIBE scores ranged
from 2 to 17, with a mean score of 11.95. Scores varied because the studies included case
studies, A/B designs, and more sophisticated single-subject designs. The overall grade for each
Tables 3 and 4 provide a summary of PND, SMTE, and Swanson’s dsw effect sizes for TB1s
and TB2s. Table 5 provides a summary of RCI scores. None of the effect size metrics showed
evidence of significant skew or kurtosis when tested against the standard errors of skewness and
kurtosis. Therefore, it was appropriate to average the measures and conduct
statistical testing on them when relevant and applicable. The mean PNDs for TB1s and
TB2s were 58.70% (n = 21, SD = 40.76, 95% CI = 41.31 – 76.07) and 79.39% (n =
24, SD = 31.71, 95% CI = 67.67 – 92.51), respectively. Using Scruggs and Mastropieri’s (1998)
classification, the mean PND for TB1s was classified as “questionably effective” with the 95%
confidence interval ranging from “ineffective” to “fairly effective”. The mean PND for TB2s
was classified as “fairly effective” with a 95% confidence interval ranging from “questionably
The overall mean SMTEs for TB1s and TB2s were 69.43% (n = 16, SD =
36.28, 95% CI = 50.70 – 88.17) and 80.66% (n = 18, SD = 30.29, 95% CI = 69.27 – 96.85), respectively. The
mean SMTE for TB1s fell into the upper range of the “questionably effective” classification with
the 95% confidence interval ranging from “questionably effective” to the upper end of “fairly
effective.” Of the 15 SMTE analyses that could be conducted for TB1s for all participants, 7
(47%) were significant. The mean SMTE for TB2s fell into the “fairly effective” classification
with a 95% confidence interval ranging from the upper end of “questionably effective” to
“highly effective.” Out of the 18 SMTE analyses that could be conducted for TB2s, 15 (83%)
The overall mean Swanson’s dsw values for TB1s and TB2s were 1.33 (n = 21, SD =
0.87, 95% CI = 0.95 – 1.71) and 1.85 (n = 24, SD = 0.97, 95% CI = 1.49 – 2.25), respectively. Of the 21
Swanson’s dsw values for TB1s, 71% (n = 15) were classified as large, 10% (n = 2) as medium, and
19% (n = 4) as small. Of the 24 Swanson’s dsw values for TB2s, 83% (n = 20) were large, 4% (n = 1)
was medium, and 13% (n = 3) were small. Taken together, these results indicated that both TB1s
and TB2s reliably changed from pre-treatment to post-treatment (none of the 95% confidence
intervals contained zero) and that for a majority of studies, the effect sizes were large. Finally,
the average effect size for TB2s was higher than the average effect size for TB1s.
RCI scores were divided into symptom-based RCI scores and quality of life-based RCI
scores. The symptom-based RCIs are analogous to TB1s and were expected to show a decrease
with FAP. Self-report survey data from seven participants reported in seven studies were used to
calculate the average symptom-based RCI. The quality of life-based RCIs are analogous to TB2s
and were expected to increase with FAP. Self-report survey data from ten participants reported in
eight studies were used to calculate the overall quality of life-based RCI. The means for
symptom-based RCIs and quality of life-based RCIs were respectively 5.36 (n = 7, SD = 3.57,
95% CI = 3.09 – 7.61) and 2.93 (n = 10, SD = 3.16, 95% CI = 1.36 – 4.51). This indicated
that both sets of RCIs were large and positive. Further, these RCIs were statistically reliable
given that the 95% confidence intervals did not include zero.
Given the variation in the metrics across studies, analyses were conducted to evaluate the
extent to which outcomes differed as a function of gender, ethnicity, and age. There were no
significant relationships observed between any of these demographic characteristics and any
sessions and whether FAP-Alone outcomes differed from FAP-Enhanced outcomes. Results
indicated that number of sessions was not significantly associated with any outcome measure
using PND, SMTE, dsw, or RCI. In order to compare FAP-Alone and FAP-Enhanced outcomes,
independent t-tests were conducted using the PNDs, SMTEs, and dsw as dependent variables.
Results indicated that the mean FAP-Alone PND for TB1s (M = 71.98, SD = 37.68) was
significantly higher (t(20) = 2.62, p < .05) than the mean FAP-Enhanced PND (M = 30.24,
SD = 33.42). Cohen’s d was 1.17, indicating a large effect. All other t-tests were non-significant.
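The reported Cohen's d can be checked from the two group means and standard deviations given above. The sketch below is an illustrative calculation only: group sizes are not stated in this passage, so it assumes the two variances are pooled with equal weight, and the function name `cohens_d` is hypothetical.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference with equal-weight pooled SD (assumption)."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# FAP-Alone versus FAP-Enhanced PND for TB1s, values as reported in the text.
d = cohens_d(71.98, 37.68, 30.24, 33.42)
```

Under this pooling assumption the result rounds to 1.17, which agrees with the value reported in the text.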
Failsafe Calculations. Noting that journals are biased toward publishing significant
findings, Rosenthal (1979) developed what has been termed a “failsafe number.” Rosenthal’s
failsafe number is the number of non-significant studies (or hypothesis tests) stored in file
drawers that would be needed to raise the overall p-value in a conventional meta-analysis that
aggregates group-level data to > .05. Because the current study is aggregating single-subject
data, Rosenthal’s failsafe calculation cannot be used. However, Orwin (1983) and Wolf (1986)
In this formula, Nfs is the failsafe number, No is the number of observed effect sizes, do is
the average d observed across studies, and dc is the criterion. Orwin (1983) and Wolf (1986)
recommend using Cohen’s (1988) effect size classification scheme for small or medium effects
(small d = .2, medium d = .5) to set the value of dc. Their argument is that using criterion effect
of as “clinically non-significant.”
The average Swanson’s dsw obtained in this study is analogous to the average observed effect size (do) in Orwin (1983) and
Wolf’s (1986) failsafe equations. As such, their equation can be used as a heuristic technique to
estimate failsafe numbers for the current synthesis of single-subject data. For TB1s, the average dsw
was 1.33 based on 21 effect sizes. Thus, the failsafe number compared against hypothetical small
For TB2s, the average dsw was 1.85 based on 24 effect sizes. Thus, the failsafe numbers compared
against hypothetical small and medium file drawer effect sizes are:
These failsafe numbers are addressing the following question: “How many unpublished
post-treatment are needed to reduce the overall Swanson’s dsw to the small (.2) or medium (.5)
classification level?” As is evident in these calculations, the failsafe findings suggest that the
FAP outcomes found in this quantitative synthesis are quite robust when contrasted against the .2
and .5 criteria.
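The failsafe arithmetic can be illustrated with Orwin's (1983) formula, in which the failsafe number equals the number of observed effect sizes multiplied by the difference between the observed and criterion effect sizes, divided by the criterion. The Python sketch below applies that formula to the mean dsw values and counts reported above; the function name is hypothetical, and the resulting numbers are computed here from those inputs rather than quoted from the article.

```python
def orwin_failsafe_n(n_observed, d_observed, d_criterion):
    """Orwin's failsafe N: n_observed * (d_observed - d_criterion) / d_criterion."""
    if d_criterion <= 0:
        raise ValueError("criterion effect size must be positive")
    return n_observed * (d_observed - d_criterion) / d_criterion


# Mean dsw and number of effect sizes as reported for each target behavior.
reported = {"TB1": (21, 1.33), "TB2": (24, 1.85)}
criteria = {"small": 0.2, "medium": 0.5}

failsafe = {
    (behavior, size): orwin_failsafe_n(n, d, criterion)
    for behavior, (n, d) in reported.items()
    for size, criterion in criteria.items()
}

for (behavior, size), value in failsafe.items():
    print(f"{behavior} vs. {size} criterion: {value:.2f}")
```

With these inputs the formula gives roughly 119 (small criterion) and 35 (medium criterion) null effect sizes for TB1s, and 198 and 65 for TB2s.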
Discussion
The current study assessed the methodology of the FAP outcome studies by examining
and reporting the methods of all studies included. In order to better understand the effects of
FAP, overall effect sizes were calculated using PND, SMTE, dsw, and RCIs. Variation in effect
The current review located 18 published FAP studies with outcome data and two
unpublished studies with outcome data for a total of 20 studies reviewed. A majority of the
studies used an A/B design and one used a reversal design. Participants varied in age, ranging
participants were Caucasian. However, 13 out of the 20 studies did not report ethnicity. Few
The SCRIBE method (Tate et al., 2016) was used to evaluate the methodological rigor of
FAP studies. None of the studies met all 26 criteria designed by Tate et al. (2016) to assess the
methodological rigor of single-subject studies. This indicated that the FAP literature can be
improved with more rigorous design and treatment outcome evaluation methods. A majority of
the studies provided basic information related to the SCRIBE criteria (e.g., background
information, aims, study design, and description of intervention). However, some of the more
sophisticated criteria were not met by the studies in the current review (e.g., statement of adverse
events, availability of study protocol, or explicit statement of whether or not any funding sources
In summary, the FAP literature has some positive methodological features. First, there is
variation in targets of treatment, with measures covering depression, anxiety, personality
disorders, and several other interpersonal issues. Second, there is good variation in age, gender,
and ethnicity. Finally, articles came from a diversity of researchers from several different
Despite the above-mentioned methodological strengths, most FAP outcome studies are
limited by the use of designs that provide only weak causal inference, uncertain construct
validity, and questionable generalizability. First, the extensive use of A/B designs provides weak
evidence of causality. Several other well-known causal inference threats could produce A/B
(e.g., non-contingent but affirming therapist responding during sessions) hampers the construct
validity of FAP studies because one cannot infer that it is FAP techniques per se that are
responsible for A/B changes. Finally, the near exclusive use of single-subject investigations
Based on our quantitative analysis of outcomes, there is evidence that there were reliable
varied as a function of outcome metric used, target of treatment, and treatment type. For PND
and SMTE analyses, the overall mean effect sizes fell into the “questionably effective” to “fairly
effective” classifications. In contrast, the Swanson’s dsw analyses indicated that pre-treatment
to post-treatment differences were consistently large and reliable. Similarly, the mean RCIs were
large and consistently classified as “statistically reliable.” In terms of treatment targets, the
pre-treatment to post-treatment differences tended to be larger for TB2s relative to TB1s. Finally,
outcomes from FAP-Alone interventions were more favorable than those from FAP-Enhanced interventions
for TB1s.
address three important questions about the FAP literature. These questions are: (a) is there
differences; (b) are the pre-treatment to post-treatment differences greater than what would have
been expected to have occurred by chance or the passage of time; and (c) can the pre-treatment
Regarding the first question, the results of this quantitative synthesis indicate that across
studies, metrics, and targets of treatment, there is evidence that TB1s reliably declined from
pre-treatment. The clinical significance of treatment effects varied as a function of the metric used to
quantify outcomes. Specifically, PND and SMTE analyses yielded more conservative estimates
The differences between PND, SMTE, and dsw can be attributed to an important difference
in how these metrics are calculated. The PND and SMTE are derived from comparisons between
baseline and measures collected in the early, middle, and end points of therapy whereas dsw is
derived from comparisons between baseline and the end-of-treatment measurement. As such, the
PND and SMTE include data points that were collected before the intervention was completed.
This would inevitably reduce estimates of effectiveness given that behavior change is expected
PND and SMTE have been used in many single-subject meta-analytic reviews. The
popularity of these metrics likely arises from their ease of calculation (simply counting data
points in graphs that fall above and below some reference) and need for visual inspection only.
Alternatively, Swanson’s dsw requires that the study author provide numerical data for each point
on a graph (which very rarely occurs) or the use of digitizing and mapping technology in order to
Given that PND and SMTE are derived from less relevant data in the FAP literature and
that tools for mapping and digitizing data are now more readily available, we recommend that future
single-subject quantitative syntheses use dsw metrics for evaluating treatment outcomes where (a) end-state
functioning is the focus of treatment and (b) behavior change is expected to occur gradually
across time.
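To make the contrast between the metrics concrete, here is a minimal sketch of PND as it is usually defined (the percentage of treatment-phase points that fall beyond the most extreme baseline point in the therapeutic direction), next to a simple end-state comparison. The data, the function names, and the use of the final treatment point as an end-state summary are hypothetical. The end-state function is not Swanson's dsw, whose exact computation is not reproduced here.

```python
from typing import Sequence


def pnd(baseline: Sequence[float], treatment: Sequence[float],
        increase_is_improvement: bool = True) -> float:
    """Percentage of treatment points that do not overlap the baseline range."""
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    if increase_is_improvement:
        reference = max(baseline)
        non_overlapping = sum(1 for x in treatment if x > reference)
    else:
        reference = min(baseline)
        non_overlapping = sum(1 for x in treatment if x < reference)
    return 100.0 * non_overlapping / len(treatment)


def end_state_change(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """Illustrative end-state summary: last treatment point minus the baseline mean."""
    return treatment[-1] - sum(baseline) / len(baseline)


if __name__ == "__main__":
    # Hypothetical adaptive behavior that increases gradually across sessions.
    baseline = [2, 3, 2, 4]
    treatment = [3, 4, 5, 7, 8, 9]
    print(f"PND: {pnd(baseline, treatment):.1f}%")
    print(f"End-state change: {end_state_change(baseline, treatment):.2f}")
```

With gradual change, the early treatment points still overlap the baseline, so PND is pulled down even though the end-state difference is large. This is the pattern the paragraph above describes.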
For the second question, some of the results of this review indicate that the pre-treatment
to post-treatment differences are greater than what would have occurred due to the passage of
time. This position is primarily based on RCI, dsw, and SMTE data. Specifically, the RCI
calculation takes into account the standard error of measurement which is an index of the amount
of change in a score that would be expected to occur by chance and/or with repeated
administration across time. Most of the RCIs in our analyses exceeded the 95% confidence
interval by a large margin. The dsw findings also support this conclusion. Correcting the
serial dependency and trends in the dsw calculation. This is important because serial dependency
and trends would be the principal mechanism through which non-treatment related factors (e.g.,
TB2s. Finally, the SMTEs corrected for pre-treatment trends. Taken together, the RCI, dsw, and
SMTE outcomes support an argument that the pre-treatment to post-treatment changes were not
attributed to FAP. As noted earlier, the absence of placebo comparison conditions (e.g., an
assignment to a placebo control group using a group design) makes it impossible to attribute the
pre-treatment to post-treatment differences to FAP. Several other factors may have promoted
changes in participant behavior (e.g., therapist attention and empathy) that were not explicitly
part of the FAP intervention. Finally, a number of the classic causal inference threats could
account for some of the pre-post differences. The more salient of these threats would be: history,
In examination of the potential moderators, one interesting finding was that FAP-Alone
outperformed FAP-Enhanced interventions based on the PND effect sizes for TB1s. This finding
is logical if one considers the nature of FAP sessions and therapist-client interactions.
Specifically, in FAP-Alone, the therapist will engage in actions that are systematically,
interventions, the therapist was providing more didactic material and possibly focused on other
behaviors that were not directly related to interpersonal interactions. Thus, in any given session,
there would be fewer opportunities for the client to emit target behaviors. Similarly, the therapist
would have fewer opportunities to provide systematic consequences for the targeted behaviors.
Another consistent finding was that there were larger effects observed for TB2s relative
to TB1s. This finding is congruent with FAP principles and learning theory. Specifically, FAP
emphasizes the use of reinforcement to promote acquisition of adaptive TB2s in session. While
extinction and punishment can be used to suppress TB1s, these techniques are not preferred
because they may have an adverse impact on the therapeutic relationship. Instead, the therapist
aims to increase TB2s with the notion that this increase in adaptive behavior will simultaneously
be plausible to argue that the direct and more frequent reinforcement of TB2s would yield a
larger treatment effect. Further, it may be that it is more challenging for therapists to address
TB1s (e.g., re-direction, selective ignoring, blocking) than it is to reinforce TB2s. Previous
research has reported a component-process analysis of reinforcing TB2s (Haworth et al., 2015), and
further research may benefit from exploring this process for blocking TB1s.
Limitations
One major consideration of the current findings is the “file drawer” problem. The failsafe
question for this literature is: “How many times have FAP researchers initiated a single-subject
treatment study but failed to report or publish the result because the client did not respond to
treatment?” The failsafe analyses in this paper indicate that a substantial number of single-
subject studies with nonresponsive clients would be needed to reduce the average Swanson’s dsw
to small or medium effect size classifications. However, given that there is ample evidence of a
research, it is reasonable to argue that there are at least some studies with nonresponsive clients
in the file drawers of FAP researchers. Adding data from these studies would reduce the
magnitude and reliability of the overall effect sizes reported in this paper. Thus, it is likely that
the findings reported in this quantitative synthesis overestimate the effectiveness of FAP to some
Conclusion
The current study is a quantitative analysis of the existing FAP single-subject design
treatment literature. It provides an estimate of the efficacy of FAP based on the currently
available single-subject studies. These results indicate that FAP may be associated with reliable
treatment effects for a variety of behaviors based on pre-post comparisons. As such, FAP may be
study are due solely to FAP due to the limitations of the research designs used. It is also difficult
to assess how many unpublished, failed trials exist that may nullify the results presented in this
paper.
There remains a clear need for more systematic and methodologically rigorous FAP
research. Most importantly, the authors recommend that researchers conduct more randomized
controlled trials so that stronger causal statements can be made about FAP effectiveness. Although
the turn-by-turn coding of the FAP Rating System (FAPRS; Callaghan & Follette, 2008) may be
cumbersome for randomized controlled trials, the recent development of treatment adherence
measures (e.g., Maitland et al., 2016a; Maitland et al., 2016b) and self-report measures targeting
FAP-specific constructs (e.g., Darrow, Callaghan, Bonow, & Follette, 2014; Leonard et al.,
If FAP researchers continue to utilize single-subject design studies, then it may also be
beneficial to use guidelines established for strong methodological rigor (e.g., Tate et al., 2016).
Moreover, the use of multiple baseline designs, A/B/C designs, or reversal/withdrawal designs
could more fully assess the causal effects of FAP. In addition to stronger methodology, the
authors recommend that researchers place a greater emphasis on collecting and reporting
participant demographic characteristics in order to more clearly understand the efficacy of FAP
and the different populations with which it may be effective. Finally, the use of placebo
comparison conditions would permit a better evaluation of the specific effects of FAP
References
Barkham, M., Margison, F., Leach, C., Lucock, M., Mellor-Clark, J., Evans, C., . . . McGrath, G.
(2001). Service profiling and outcomes benchmarking using the CORE-OM: Toward
Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies
for studying behavior change (3rd ed.). Boston, MA: Allyn and Bacon.
515.
Baruch, D. E., Kanter, J. W., Busch, A. B., & Juskiewicz, K. (2009). Enhancing the therapy
Bond, F. W., Hayes, S. C., Baer, R. A., Carpenter, K. M., Guenole, N., Orcutt, H. K., ... & Zettle,
Busch, A. M., Kanter, J. W., Callaghan, G. M., Baruch, D. E., Weeks, C. E., & Berlin, K. S.
Callaghan, G. M. (2006). The Functional Idiographic Assessment Template (FIAT) System: For
357-398. doi:10.1037/h0100160
Callaghan, G. M., & Follette, W. C. (2008). Coding Manual for the Functional Analytic
doi:10.1037/h0100649
Callaghan, G. M., Summers, C. J., & Weidman, M. (2003). The treatment of histrionic and
Cattivelli, R., Tirelli, V., Berardo, F., & Perini, S. (2012). Promoting appropriate behavior in
doi:10.1037/h0100933
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Corrigan, P. W. (2001). Getting ahead of the data: A threat to some behavior therapies. The
Darrow, S. M., Callaghan, G. M., Bonow, J. T., & Follette, W. C. (2014). The Functional
Haynes, S. N., O’Brien, W. H., & Kaholokula, J. K. (2011). Behavioral Assessment and Case
Hayes, S. C., Masuda, A., Bissett, R., Luoma, J., & Guerrero, L. F. (2005). DBT, FAP, and ACT:
How empirically oriented are the new behavior therapy technologies? Journal of Behavior
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S. L., & Wolery, M. (2005). The use of
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining
Psychology, 59,12-19.
Kanter, J. W., Rusch, L. C., Busch, A. M., & Sedivy, S. K. (2009). Validation of the Behavioral
36-42.
Kanter, J., Landes, S., Busch, A., Rusch, L., Brown, K., Baruch, D., & Holman, G. (2006). The
Kanter, J. W., Parker, C., & Kohlenberg, R. J. (2001). Finding the self: A behavioral measure and
its clinical implications. Psychotherapy: Theory, Research and Practice, 38, 198-211.
Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings.
Kohlenberg, R. J., & Tsai, M. (1991). Functional Analytic Psychotherapy: A guide for creating
Kohlenberg, R., & Tsai, M. (1994). Improving cognitive therapy for depression with functional
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief
doi:10.1046/j.1525-1497.2001.016009606.x
Lambert, M. J., Burlingame, G. M., Umphress, V., Hansen, N. B., Vermeersch, D. A., Clouse, G.
C., & Yanchar, S. C. (1996). The reliability and validity of the outcome questionnaire.
Landes, S. J., Kanter, J. W., Weeks, C. E., & Busch, A. M. (2013). The impact of the active
Leonard, R. C., Knott, L. E., Lee, E. B., Singh, S., Smith, A. H., Kanter, J., … Wetterneck, C.
T. (2014). The development of the functional analytic psychotherapy intimacy scale. The
Lizarazo, N. E., Muñoz-Martínez, A. M., Santos, M. M., & Kanter, J. W. (2015). A within-
doi:10.1007/s40732-015-0122-7
Maitland, D. W. M., Kanter, J. W., Tsai, M., Kuczynski, A. M., Manbeck, K. E., & Kohlenberg,
637. doi:10.1007/s40732-016-0198-8
Maitland, D. W. M., Petts, R. A., Knott, L. E., Briggs, C. A., Moore, J. A., & Gaynor, S. T.
watchful waiting: Enhancing social connectedness and reducing anxiety and avoidance.
Manolov, R., Guilera, G., & Sierra, V. (2014). Weighting strategies in the meta-analysis of
Manduchi, K., & Schoendorff, B. (2012). First steps in FAP: Experiences of beginning
72-77. doi:10.1037/h0100940
Mangabeira, V., Kanter, J., & Del Prette, G. (2012). Functional analytic psychotherapy (FAP): A
Manos, R. C., Kanter, J. W., Rusch, L. C., Turner, L. B., Roberts, N. A., & Busch, A. M. (2009).
Integrating functional analytic psychotherapy and behavioral activation for the treatment
doi:10.1177/1534650109332484
McCarthy-Larzelere, M., Diefenbach, G. J., Williamson, D. A., Netemeyer, R. G., Bentz, B. G.,
doi:10.1177/10731911010080020
doi:10.1037/h0100942
Meyer, T. J., Miller, M. L., Metzger, R. L., & Borkovec, T. D. (1990). Development and
validation of the Penn State Worry Questionnaire. Behaviour Research and Therapy,
Mundt, J. C., Marks, I. M., Shear, M. K., & Greist, J. M. (2002). The Work and Social
Oshiro, C. K. B., Kanter, J., & Meyer, S. B. (2012). A single-case experimental demonstration of
functional analytic psychotherapy with two clients with severe interpersonal problems.
doi:10.1037/h0100945
Öst, L. G. (2008). Efficacy of the third wave of behavioral therapies: a systematic review and
doi:10.1016/j.brat.2007.12.005
Pedersen, E. R., Callaghan, G. M., Prins, A., Nguyen, H. V., & Tsai, M. (2012). Functional
stress disorder: Theory and application in a single case design. International Journal of
Rosenthal, R. (1979). The “file drawer problem” and tolerance of null results. Psychological
Scruggs, T. E., & Mastropieri, M. A. (1998). Summarizing single-subject research: Issues and
Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta-analysis of
Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess
Singh, S., & O’Brien, W. H. (2016). Functional analytic psychotherapy for nursing home
Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the quality of
marriage and similar dyads. Journal of Marriage and the Family, 15-28.
Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for
166(10), 1092.
Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1999). Dimensions of the Beck Depression
117-128. doi:10.1002/(SICI)1097-4679(199901)55:1
Swanson, H. L., Hoskyn, M., & Lee, C. (1999). Interventions for students with learning
Tate, R. L., Perdices, M., Rosenkoetter, U., McDonald, S., Togher, L., Shadish, W., ... &
9.
Tsai, M., Kohlenberg, R. J., Kanter, J. W., Kohlenberg, B., Follette, W. C., & Callaghan, G. M.
Villas-Bôas, A., Meyer, S. B., & Kanter, J. W. (2016). The effects of analyses of contingencies
0195-y
Virella, B., Arbona, C., & Novy, D. M. (1994). Psychometric properties and factor structure of
White, O. R. (1974). The “split middle”: A “quickie” method of trend estimation. University of
Center.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. London: Sage.
Xavier, R. N., Kanter, J. W., & Meyer, S. B. (2012). Transitional probability analysis of two
Table 1
Brief description of studies used in quantitative synthesis
Author | Participant Description (Age) | Study Design | Treatment | Baseline | Treatment | Total Sessions | SCRIBE Score
Baruch, Kanter,
Busch, & 1 Male (21) Case Study FAP-Enhanced ACT 37 9
Juskiewicz (2009)
Callaghan,
Summers, & 1 Male (30) A/B Design FAP 23 13
Weidman (2003)
1 Unknown 2 6 8
Cattivelli Multiple
2 Unknown FAP 2 6 8 2
(Unpublished) Baseline
3 Unknown 2 8 10
1 Male (12) 4 11 15
Cattivelli, Tirelli, 2 Male (11) 7 9 16
Multiple
Berardo, & Perini 3 Male (12) FAP 7 8 15 14
Baseline
(2012) 4 Male (13) 6 14 20
5 Male (15) 8 18 26
Ferro-Garcia,
Lopez-Bermudez,
1 Female (24) Case Study FAP 7 16 23 9
& Valero-Aguayo
(2012)
Kohlenberg, &
1 Male (35) A/B Design FAP-Enhanced CT 8 7 15 12
Tsai (1994)
6 4 10
1 Female (44)
Landes, Kanter, 6 7 13
2 Female (20)
Weeks, & Busch A/A+B Design FAP 10 4 14 17
3 Male (28)
(2013) 7 7 14
4 Male (26)
4 10 14
Lizarazo, Muñoz- 1 Male (25)
5 14 19
Martínez, Santos, 2 Female (47) A/A+B Design FAP 18
6 10 16
& Kanter (2015) 3 Female (21)
Manduchi, &
1 Female (36) Case Study FAP 52 10
Schoendorff (2012)
Manos et al.,
1 Female (22) Case Study FAP-Enhanced BA 8 13
(2009)
McCluskey 2
1 Male (25) A/B Design FAP-Enhanced BA 10 10 3
(Unpublished) 0
2
Oshiro, Kanter, & 1 Female (46) Reversal 11 9 0
FAP 16
Meyer (2012) 2 Male (18) Design 12 8 2
0
Pedersen,
Callaghan, Prins,
1 Female (41) Case Study FAP 14
Nguyen, & Tsai
(2012)
1 Male (72) 2 4 6
Singh & O'Brien
2 Male (52) A/B Design FAP 2 4 6 15
(2016)
3 Female (31) 2 4 6
3
Villas-Bôas,
1 Female (38) A/B/BC/B2/ 5 33 8
Meyer, & Kanter FAP 14
2 Female (32) BC2 5 28 3
(2016)
3
1
Xavier, Kanter, & 1 Female (10) FAP-Enhanced 7
Case Study 11
Meyer (2012) 1 Male (7) Child Therapy 3
1
Table 2
Description of Diagnoses, TB1s, and TB2s per study
Authors P DSM Disorder CRB1 Dimension CRB2 Dimension
Disclosure Disclosure
Dysthymia
Baruch et al. (2009) 1 Emotional Emotional
Psychotic Symptoms
Expression Expression
Disclosure
Major Depressive Disorder
Busch et al. (2009) 1 Conflict Emotional
Histrionic Personality Disorder
Expression
Assertion of Needs Assertion of Needs
Bidirectional Bidirectional
Narcissistic Personality Disorder Communication Communication
Callaghan, et al. (2003) 1
Histrionic Personality Disorder Disclosure Disclosure
Emotional Emotional
Expression Expression
Cattivelli et al. (2012) 1 No description
Disclosure
Ferro-Garcia et al. (2012) 1 Major Depressive Disorder Disclosure Emotional
Expression
Major Depressive Disorder Conflict
Kanter et al. (2006) 1 Disclosure
Histrionic Personality Disorder Disclosure
Major Depressive Disorder Bidirectional
Bidirectional
2 Personality Disorder, NOS Communication
Communication
Past polysubstance dependence Conflict
Kohlenberg & Tsai
1 Depression No description
(1994)
Major Depressive Disorder
Landes et al. (2013) 1 Generalized Anxiety Disorder Assertion of Needs Assertion of Needs
Depressive Personality Disorder
Major Depressive Disorder
Avoidant Personality Disorder Bidirectional Bidirectional
2
Obsessive Compulsive Personality Disorder Communication Communication
Depressive Personality Disorder
Major Depressive Disorder
Past alcohol abuse Emotional Emotional
3
Avoidant, Depressive, and Borderline Expression Expression
Personality Disorder
Emotional Emotional
Major Depressive Disorder
4 Expression Expression
Depressive Personality Disorder
Assertion of Needs Assertion of Needs
Lizarazo et al. (2015) 1 Borderline Personality Disorder Disclosure Disclosure
Bidirectional Bidirectional
2
Communication Communication
3 Disclosure Disclosure
Bidirectional
Lopez (2002) 1 Communication Disclosure
Conflict
Bidirectional
Communication Disclosure
Manduchi & Schoendorff Obsessive Compulsive Personality Disorder
1 Disclosure Emotional
(2012) Borderline Personality Disorder
Emotional Expression
Expression
Conflict
Disclosure
Disclosure
Manos et al. (2009) 1 Emotional
Emotional
Expression
Expression
Disclosure Disclosure
McClafferty (2012) 2 Depression Emotional Emotional
Expression Expression
Emotional Emotional
3 Major Depressive Disorder
Expression Expression
Villas-Bôas et al. (2016) 1 Conflict Conflict
Table 3
Effect Size Calculation per Participant for Graphical Data for Target Behaviors-1
Author | Participant Description (Age) | Baseline Points for Calculation | Treatment Points for Calculation | Swanson’s d | PND | SMTE
Cattivelli (Unpublished) | 1 Unknown | 2 | 6 | 1.63 | 83.33% |
Cattivelli (Unpublished) | 2 Unknown | 2 | 6 | 1.70 | 100% |
Cattivelli (Unpublished) | 3 Unknown | 4 | 6 | 1.56 | 100% | 100%*
Table 4
Effect Size Calculation per Participant for Graphical Data for Target Behaviors-2
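Author | Participant Description (Age) | Baseline Points for Calculation | Treatment Points for Calculation | Swanson’s d | PND | SMTE
Cattivelli (Unpublished) | 1 Gender/Age Unknown | 2 | 6 | 1.71 | 100% |
Cattivelli (Unpublished) | 2 Gender/Age Unknown | 2 | 6 | 1.73 | 100% |
Cattivelli (Unpublished) | 3 Gender/Age Unknown | 4 | 6 | 0.08 | 100% |
Cattivelli et al. (2012) | 1 Male (12) | 4 | 11 | 1.70 | 100% | 100%*
Cattivelli et al. (2012) | 2 Male (11) | 7 | 9 | 2.06 | 100% | 100%*
Cattivelli et al. (2012) | 3 Male (12) | 7 | 8 | 1.95 | 100% | 100%*
Cattivelli et al. (2012) | 4 Male (13) | 6 | 14 | 2.43 | 100% | 100%*
Cattivelli et al. (2012) | 5 Male (15) | 8 | 18 | 3.33 | 100% | 100%*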
Table 5
Reliable Change Scores
Author | P | Measure | RCI Score
Symptom-Based Measures
Baruch et al. (2009) | 1 | Beck Depression Inventory-II | 2.33*
Busch et al. (2009) | 1 | Beck Depression Inventory-II | 3.96*
Callaghan et al. (2003) | 1 | Beck Depression Inventory-II | 0.93
Lopez (2002) | 1 | State-Trait Anxiety Inventory | 5.62*
Lopez (2002) | 1 | Penn State Worry Questionnaire | 6.21*
Manduchi & Schoendorff (2012) | 1 | Beck Depression Inventory-II | 2.57*
Manduchi & Schoendorff (2012) | 1 | Worry Domains Questionnaire | 4.39*
McClafferty (2012) | 1 | Patient Health Questionnaire-9 | 11.00*
McClafferty (2012) | 1 | Generalized Anxiety Disorder-7 | 11.88*
McCluskey (Unpublished) | 1 | Beck Depression Inventory-II | 4.66*
Total Participants | 7
Mean | 5.36*
Confidence Interval | (3.09 – 7.62)
Quality of Life Measures
Baruch et al. (2009) | 1 | Outcome Questionnaire – 45 | 2.58*
Ferro-Garcia et al. (2012) | 1 | Experience of Self Scale | 6.00*
Ferro-Garcia et al. (2012) | 1 | Acceptance & Action Questionnaire-Spanish | 1.98*
Lopez (2002) | 1 | Acceptance & Action Questionnaire-II | 5.13*
Manduchi & Schoendorff (2012) | 1 | Acceptance & Action Questionnaire-II | 3.84*
Manos et al. (2009) | 1 | Behavioral Activation for Depression Scale | -0.60
Manos et al. (2009) | 1 | Dyadic Adjustment Scale | -0.37
McClafferty (2012) | 1 | Work and Social Adjustment Scale | 7.16*
McClafferty (2012) | 1 | CORE Outcome Measure | 9.01*
Singh & O’Brien (2016) | 1 | Functional Idiographic Assessment Template – Short Form | 2.79*
Singh & O’Brien (2016) | 1 | Acceptance & Action Questionnaire-II | 1.54
Singh & O’Brien (2016) | 2 | Functional Idiographic Assessment Template – Short Form | -2.79*
Singh & O’Brien (2016) | 2 | Acceptance & Action Questionnaire-II | -1.02
Singh & O’Brien (2016) | 3 | Functional Idiographic Assessment Template – Short Form | 5.09*
Singh & O’Brien (2016) | 3 | Acceptance & Action Questionnaire-II | 3.33*
McCluskey (Unpublished) | 1 | Acceptance & Action Questionnaire-II | 3.33*
Total Participants | 10
Mean | 2.94*
Confidence Interval | (1.35 – 4.51)
Note. * denotes statistically reliable RCI score.
Table 6
Reliability and Standard Deviations Used for Reliable Change Index Calculation
Measure | Citation | Cronbach’s α | SD
BDI-II | Steer, Ball, Ranieri, & Beck (1999) | 0.93 | 11.46
STAI-State | Virella, Arbona, & Novy (1994) | 0.91 | 9.46
STAI-Trait | Virella, Arbona, & Novy (1994) | 0.86 | 8.88
PSWQ | Meyer, Miller, Metzger, & Borkovec (1990) | 0.97 | 13.80
Worry Domains Questionnaire | McCarthy-Larzelere et al. (2001) | 0.94 | 19.52
PHQ-9 | Kroenke, Spitzer, & Williams (2001) | 0.89 | 6.10
GAD-7 | Spitzer, Kroenke, Williams, & Löwe (2006) | 0.89 | 3.41
Outcome Questionnaire – 45 | Lambert et al. (1996) | 0.93 | 24.14
Experience of Self Scale | Kanter, Parker, & Kohlenberg (2001) | 0.91 | 1.33
AAQ-Spanish | Barraca (2004) | 0.74 | 8.42
AAQ–II | Bond et al. (2011) | 0.88 | 7.97
BADS | Kanter, Rusch, Busch, & Sedivy (2009) | 0.92 | 20.15
Dyadic Adjustment Scale | Spanier (1976) | 0.96 | 28.30
Work and Social Adjustment Scale | Mundt, Marks, Shear, & Greist (2002) | 0.80 | 6.40
CORE Outcome Measure | Barkham et al. (2001) | 0.94 | 0.75
FIAT-Q-SF | Darrow, Callaghan, Bonow, & Follette (2014) | 0.85 | 18.31
Note. BDI-II: Beck Depression Inventory-II
STAI: State-Trait Anxiety Inventory
PSWQ: Penn State Worry Questionnaire
PHQ-9: Patient Health Questionnaire-9
GAD-7: Generalized Anxiety Disorder Assessment-7
AAQ: Acceptance & Action Questionnaire
FIAT-Q-SF: Functional Idiographic Assessment Template-Questionnaire-Short Form
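The reliabilities and standard deviations in Table 6 feed the Reliable Change Index. The sketch below assumes the standard Jacobson and Truax (1991) formulation: standard error of measurement SE = SD * sqrt(1 - r), standard error of the difference S_diff = sqrt(2 * SE^2), and RCI = (post - pre) / S_diff. The client scores and function names are hypothetical, and the sign convention may differ from the one used in Table 5.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SE = SD * sqrt(1 - r), where r is the measure's reliability coefficient."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return sd * math.sqrt(1.0 - reliability)


def reliable_change_index(pre: float, post: float, sd: float, reliability: float) -> float:
    """RCI = (post - pre) / S_diff, with S_diff = sqrt(2 * SE^2)."""
    se = standard_error_of_measurement(sd, reliability)
    s_diff = math.sqrt(2.0 * se ** 2)
    return (post - pre) / s_diff


def is_reliable(rci: float, z_crit: float = 1.96) -> bool:
    """Change is treated as reliable when |RCI| exceeds the 95% criterion."""
    return abs(rci) > z_crit


if __name__ == "__main__":
    # SD and reliability are the BDI-II values listed in Table 6.
    # The pre and post scores are invented for illustration only.
    rci = reliable_change_index(pre=30, post=10, sd=11.46, reliability=0.93)
    print(f"RCI = {rci:.2f}, reliable: {is_reliable(rci)}")
```

Because a drop in a symptom score yields a negative value here, a report that codes improvement as positive would flip the sign. The comparison against 1.96 uses the absolute value, so it is unaffected.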
Figure 1. Flow Chart of Study Selection
Ineligible: Conceptual Reviews (n = 75); Theoretical articles (n = 56); Measurement articles (n = 7)
Studies included in quantitative synthesis (n = 20)