
Author’s Accepted Manuscript

A Quantitative Synthesis of Functional Analytic Psychotherapy Single-Subject Research

R. Sonia Singh, William H. O’Brien

www.elsevier.com/locate/jcbs

PII: S2212-1447(17)30106-0
DOI: https://doi.org/10.1016/j.jcbs.2017.11.004
Reference: JCBS209
To appear in: Journal of Contextual Behavioral Science
Received date: 30 January 2017
Revised date: 23 October 2017
Accepted date: 8 November 2017
Cite this article as: R. Sonia Singh and William H. O’Brien, A Quantitative
Synthesis of Functional Analytic Psychotherapy Single-Subject Research,
Journal of Contextual Behavioral Science,
https://doi.org/10.1016/j.jcbs.2017.11.004
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Running head: QUANTITATIVE SYNTHESIS OF FAP 1

A Quantitative Synthesis of Functional Analytic Psychotherapy Single-Subject Research

R. Sonia Singh, M.A.

William H. O’Brien, Ph.D.

Bowling Green State University

Corresponding author:
R. Sonia Singh
126 Psychology
Bowling Green State University
Bowling Green, OH 43403
rjsingh@bgsu.edu
(832) 530-9465

Highlights

• The current study attempts to summarize FAP single-subject research.
• There is great variation of demographics and treatment targets in FAP research.
• Effect sizes were “questionably – fairly effective,” “significant,” or “large.”

Functional Analytic Psychotherapy (FAP) is a contextual behavioral psychotherapy based

on the principles of behaviorism (e.g., Kohlenberg & Tsai, 1991; Tsai et al., 2009). FAP is

idiographic in nature, meaning that it often focuses on specific behaviors for individual clients

and proposes that the behaviors clients exhibit in sessions with a therapist are an index of

adaptive and problem behaviors that clients display in natural environments. These in-session

behaviors are referred to as Clinically Relevant Behaviors (CRBs). CRBs are divided into three

categories: CRB1s are problematic behaviors, CRB2s are adaptive behaviors, and CRB3s are the

client’s descriptions of the topography and function of his or her behaviors outside of the session.

Further, the FAP therapist acknowledges behavior outside of the context of therapy. For

example, problem behaviors that occur outside of the therapy session (O1s) and adaptive

behaviors that occur outside of the therapy session (O2s) can be targeted for change.

Well-established behavioral principles are used to promote in-session changes in CRB1s,

CRB2s, and CRB3s. This is accomplished by first carefully operationalizing CRB1s and CRB2s.

Following operationalization, the therapist uses reinforcement-based techniques (e.g., social

attention, verbal reinforcement, non-verbal reinforcement) to increase the frequency of CRB2s.

To decrease the frequency of CRB1s the therapist uses a combination of procedures such as

differential reinforcement of other/incompatible behavior, selective non-reinforcement/ignoring,

and verbal redirection (Tsai et al., 2009). In order to do this effectively, the FAP therapist is

trained to be (a) acutely aware of CRB occurrences and (b) consistent in providing in-session

experiences that promote the acquisition, shaping, and maintenance of adaptive changes in

CRB1s and CRB2s. FAP utilizes a system of five rules to guide the therapist: (1) watch for

CRBs; (2) evoke CRBs; (3) reinforce CRB2s; (4) assess therapist impact on client behavior; and

(5) evaluate and generalize (for a detailed description of these five rules see Tsai et al., 2009).

Efficacy of FAP

The empirical evidence regarding the efficacy of FAP is limited (Hayes, Masuda,

Bissett, Luoma, & Guerrero, 2005). Mangabeira, Kanter, and Del Prette (2012) conducted a

qualitative review of FAP publications from 1990 to 2010. The authors reported that the majority

of articles written about FAP were conceptual rather than empirical. Further, their analysis of

empirical studies indicated that a majority used single-subject data or were uncontrolled case

studies. The authors also noted that a number of these single-subject studies did not include

statistical analyses or graphical representation of treatment effects.

Öst (2008) conducted a meta-analysis of third-wave behavioral therapies including FAP,

Acceptance and Commitment Therapy, Dialectical Behavior Therapy, Cognitive Behavioral

Analysis System of Psychotherapy, and Integrative Behavioral Couple Therapy. The author

noted that FAP did not have any randomized control trials; therefore, he could not include it in

his analyses. Although this meta-analysis was generally unfavorable toward third-wave behavioral

therapies and has received criticism in the field, its observation that FAP lacked randomized

control trials was understandable and reasonable. Corrigan (2001) and García (2008) also noted

the lack of randomized control trials and criticized FAP for making claims of efficacy without

high-quality empirical evidence such as randomized control trials.

Since the aforementioned reviews, one small-sample randomized control trial has been

published by Maitland et al. (2016b). In this study, 11 individuals received a FAP intervention

while another 11 individuals participated in a “watchful waiting” control condition. The

researchers reported that the FAP group, relative to the control group, reported significantly

increased interpersonal functioning (as measured by fear of intimacy) and lower psychological

symptomology. The researchers concluded this was a modest study and that more research is

needed (Maitland et al., 2016b).

In summary, FAP is a contextual behavioral therapy that uses well-established behavioral

principles occurring within the context of the therapeutic relationship to promote adaptive in-

session behavior change that is then intended to generalize to outside of session contexts. FAP

uses five rules to decrease problematic behavior and increase adaptive behavior. The published

empirical evidence examining FAP is limited to one randomized control trial and many single-subject

studies. Additionally, reviews of FAP research to date have used qualitative methods, and

there has been no quantitative synthesis of this literature.

Quantitative Synthesis of Single-Subject Research: Treatment Effect Indices

Single-subject research has been extensively used in several fields (e.g., Barlow, Nock, &

Hersen, 2009; Horner et al., 2005; Kazdin, 1982). Proponents of single-subject research suggest

it is one of the initial steps in identifying evidence-based practices (Horner et al., 2005). Further,

when designed appropriately, single-subject research can validly evaluate causal relationships

between interventions and outcomes (Haynes, O’Brien, & Kaholokula, 2011). Another

advantage of single-subject research is that it can be used to examine unique and/or rare

phenomena as well as idiosyncratic responses to interventions. Importantly, single-subject data

can be quantitatively synthesized (Manolov, Guilera, & Sierra, 2014; Shadish, Hedges, &

Pustejovsky, 2014).

In most applications, meta-analytic techniques are designed to aggregate and analyze

quantitative findings from multiple studies that generate group-based data and inferential

statistical testing. However, single-subject studies typically do not generate group-level data or

group-level inferential statistical testing; single-subject studies may also include graphs without

numerical data. Therefore, a different approach is required.

Researchers have developed ways to generate quantitative data from single-subject

graphs using mapping and digitizing technology (e.g., Shadish & Sullivan, 2011). This allows

researchers to quantify published single-subject graphs that do not provide numerical values and

to aggregate the resulting effect sizes.

The results section of this paper will provide detailed information about the

methodological characteristics of FAP treatment outcome studies. However, here it is important



to briefly review the unique methodological features of FAP research that influence how these

studies can be quantified, aggregated, and interpreted. First, FAP has been almost exclusively

evaluated using an A/B (baseline/treatment) single-subject design. Second, FAP, like most

psychotherapies, is focused on treatment outcomes. Thus, trends and session-by-session changes

in client behavior are less relevant than measures of end-state functioning (i.e., level of

functioning at the conclusion of treatment relative to baseline). Third, FAP data are often

presented in graphs without numerical values for data points. Finally, a majority of FAP studies

have a small number of data points.

The aforementioned characteristics of FAP research suggest that a quantitative synthesis

should: (a) generate and synthesize A/B or pre-post effect sizes, (b) use mapping technologies

that can reliably and accurately generate quantitative information from graphs that do not

provide data values, and (c) use effect sizes that do not require a large number of data points.

Four well-established single-subject effect size indices are well suited for a FAP quantitative

synthesis: Percentage of non-overlapping data (PND: Scruggs & Mastropieri, 1998), Split

Middle Trend Estimation (SMTE: White, 1974), Swanson’s dsw (Swanson, Hoskyn & Lee, 1999)

and the Reliable Change Index (RCI: Jacobson & Truax, 1991). Each of these metrics provides

some unique information that may be important to assess differences in FAP outcomes.

The PND (Scruggs & Mastropieri, 1998) is a common and well-established method for

evaluating single-subject effects. This metric is often considered one of the standard ways to

assess and aggregate single-subject design research. The SMTE (White, 1974) was developed to

address the challenge of outliers and trends in single-subject data that adversely affect PND

calculations. This metric provides unique information because it accounts for trends and outliers

in data. Additionally, a nonparametric binomial test of significance can be used to determine

likelihood of effect.

It should be noted that PND and SMTE do not provide an index of magnitude of effect.

Therefore, researchers developed effect size indices that could be used to supplement the

information provided by the PND and SMTE. Swanson, Hoskyn and Lee (1999) developed dsw in

order to generate an effect size that provides an estimate of treatment outcomes based on end-

state functioning. That is, the effect size is based on the participant’s level of functioning at the

conclusion of treatment relative to baseline.

Each of these effect sizes provides different information based on graphed data: (1) PND

offers overall effect and it is often the standard in single-subject design meta-analysis, (2) SMTE

provides a way to assess for outliers and trend, and (3) dsw provides magnitude of effect. In

addition to the aforementioned methods that can be used to quantify graphed data, the Reliable

Change Index (RCI) can be used in single-subject studies when questionnaires are used to

evaluate treatment outcomes (Jacobson & Truax, 1991).

Summary and Aims of the Present Investigation

FAP is a contextual behavioral therapy that applies behavioral principles to the

interpersonal relationship between therapist and client. Given that FAP has primarily been

evaluated with single-subject studies and that current reviews of FAP are qualitative, there is a

need to better understand the effectiveness of this therapy using a quantitative approach.

An examination of single-subject quantitative synthesis techniques indicated that PND,

SMTE, Swanson’s dsw, and RCI are well suited for aggregating data from FAP studies. There

were three principal aims of the current study. The first aim was to conduct a methodological

review of FAP single-subject studies. This included reviewing the demographic characteristics of

participants, length of treatment, and type of FAP therapy provided. The second aim was to

evaluate and synthesize treatment effects using different quantitative indicators of outcomes. A

third aim was to examine the extent to which the effectiveness of FAP varied as a function of

participant characteristics, treatment characteristics, or target characteristics.

Method
Selection Criteria

Several databases were used to find articles for the current study including PsycINFO

(1872 to present), Psychology and Behavioral Sciences Collection (1930s to present), and ERIC

(1966 to present). The following search terms were used: Functional Analytic Psychotherapy,

FAP treatment, FAP single-subject design, FAP single-case design, and FAP case study. The

search for relevant articles and studies occurred between February 2015 and January 2017.

To be included in this study, articles had to meet the following criteria: (a) the study used

single-subject methods, (b) a FAP based treatment was provided for individual clients, (c) the

study contained data that could be coded (graphs, pre- and post-measures), and (d) the study was

published in a peer-reviewed journal or as a doctoral dissertation. The first author also solicited

FAP-related listservs for unpublished manuscripts that were appropriate for the current

study. Studies that met these inclusion criteria were then reviewed for the quantitative synthesis.

Article Coding

Each qualifying article was examined and coded for the following information: number

of participants, age, gender, ethnicity, FAP-related self-report measures, self-report measures of

psychological wellness (e.g., Beck Depression Inventory), number of treatment sessions, length

of treatment, relevant treatment statistics (pre- and post-assessments), and graphical

representation of treatment. Very few studies provided information on CRB3s; for this reason,

only CRB1s and CRB2s were coded. Additionally, O1s and O2s were coded and included in the

current study when they were reported instead of CRB1s and CRB2s.

Given that the current study includes a mixture of CRBs and Os, in the following sections

we use the term Target Behavior 1 (TB1) to refer to problem behaviors (CRB1s and O1s) and

Target Behavior 2 (TB2) to refer to adaptive behaviors (CRB2s and O2s). If a participant had

several TB1s or TB2s reported in a single study, the individual TB1s and TB2s were averaged so

that each participant contributed only one TB1 and/or TB2 to the quantitative synthesis across

studies.

The first author and trained assistants independently coded articles using a coding form

that included article information (e.g., authors, year published, title, journal of publication,

affiliation of authors), participant information (e.g., number of participants per study, participant

demographics, participant diagnoses), methodology information (e.g., type of single-subject

design, number of baseline and treatment sessions, modality of therapy used), and effect sizes.

Disagreements that occurred between raters were resolved through consensus. The

methodological characteristics of all studies were double coded by the first author and the trained

assistant. Inter-rater reliability for overall methodological coding was excellent (κ = .92). Inter-

rater reliability varied from 0.77 to 1.0 for different portions of coding (e.g., title, design,

participants, diagnoses).
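
The κ statistic reported above is an agreement index corrected for chance. As a minimal sketch, assuming two raters assigned one categorical code per study, Cohen's kappa can be computed as follows. The function name and the example codes are hypothetical and are not taken from the study's coding form.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each coded the same items once."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical design codes for six studies from two coders.
coder_1 = ["AB", "AB", "case", "case", "AB", "reversal"]
coder_2 = ["AB", "AB", "case", "AB", "AB", "reversal"]
kappa = cohens_kappa(coder_1, coder_2)
```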

A subset of studies (70%) were double coded for PND and SMTE. A high degree of

reliability was found for both PND and SMTE measurements. The average ICC for PND

was 0.98 with a 95% confidence interval that ranged from 0.97 to 0.99 (F(32,47) = 171.75,

p < .001). The average ICC for SMTE was 0.95 with a 95% confidence interval that ranged from

0.91 to 0.98 (F(23,34) = 39.53, p < .001). RCI and dsw were not double coded because Excel

formulas were used to calculate these values.

Regarding the assessment of overall methodology, the authors used the Single-Case

Reporting Guideline in BEhavioural Interventions (SCRIBE) developed by Tate et al. (2016).

This system is a 26-item review checklist that assesses the title, abstract, introduction, methods,

results, discussion, and documentation of single-subject design research. Tate et al. (2016)

suggest this method be used for development, replication, and evaluation of single-subject design

research.

Using the search terms “Functional Analytic Psychotherapy”, “FAP treatment”, “FAP

single-subject design”, “FAP single-case design”, and “FAP case study,” 179 studies were

initially identified using a review of titles. These 179 studies represented a mixture of narrative

case studies, theoretical articles, literature reviews, and empirical studies. The authors reviewed

the abstracts of all 179 studies to determine eligibility for the current study.

From the abstract review of the 179 studies, the authors excluded 138 because these

articles were conceptual reviews, theoretical articles, or studies about measurement of FAP. The

remaining 41 studies were fully reviewed. Upon full text review, 23 studies were excluded

because they were narrative reviews, did not contain data that could be used to calculate any

effect size, or included data not specifically related to the current study (e.g., therapist training

outcomes, group based outcomes). If data were reported but not in a manner in which the authors

could calculate effect sizes (e.g., mean scores), the authors contacted the authors of the

publications. No additional studies were added using this approach because the authors of the

studies in question did not respond or reported that the original data were unavailable

(destroyed or no longer in their possession). This left a total of 18 studies for quantitative

synthesis.

The authors also used the invisible college approach and placed requests on FAP

listservs and social media pages for articles relevant to this quantitative synthesis. Two additional

studies were collected using this strategy. The authors used an ancestry approach and examined

citations from the FAP articles to identify any studies that might have been missed in the search

strategy. No additional articles were located using this method. Finally, the descendency

approach was used to identify any additional articles that referenced the original FAP book by

Kohlenberg and Tsai (1991). No additional articles were located with these approaches. Thus, 20

qualifying studies were located using all of the aforementioned search and selection strategies.

Figure 1 provides visual representation of the article inclusion process.

Effect Size Calculation

Graph Digitization and Data Reduction. WebPlotDigitizer was used to generate data

from graphs without raw values. WebPlotDigitizer is a program that can be used to upload

graphs for mapping and obtaining values based on points on the graph. Once a graph was

uploaded, the researchers assigned a 10-point value to the x-axis and y-axis. After this, the

researchers identified each baseline and treatment point in each graph. After identifying each

point, WebPlotDigitizer generated X and Y coordinate values. In order to ensure that values were

appropriately extrapolated from WebPlotDigitizer, the researchers entered the values from the

program into Plotly, a web-based graphing program. The researchers then compared the Plotly

graphs with the original graphs to assure that digitization was accurate.
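
The mapping from points on a graph image to data values can be illustrated with a simple linear axis calibration. The sketch below is a generic illustration and is not WebPlotDigitizer's code; it assumes unrotated axes with linear scales, and the class name, function name, and pixel positions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AxisCalibration:
    """Two reference marks on one axis: a pixel position and its known data value."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation (or extrapolation) between the two reference marks.
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def digitize(points_px: Sequence[Tuple[float, float]],
             x_axis: AxisCalibration,
             y_axis: AxisCalibration) -> List[Tuple[float, float]]:
    """Convert clicked pixel coordinates into (x, y) data coordinates."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points_px]


# Hypothetical calibration: both axes run from 0 to 10.
# Image y coordinates grow downward, so the y-axis origin has the larger pixel value.
x_axis = AxisCalibration(pixel_a=100, value_a=0, pixel_b=500, value_b=10)
y_axis = AxisCalibration(pixel_a=400, value_a=0, pixel_b=50, value_b=10)
data_points = digitize([(300, 225), (100, 400)], x_axis, y_axis)
```

Re-plotting the converted values and comparing them with the original figure, as the researchers did, is a direct check on the calibration.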

Percentage of Non-Overlapping Data (PND). When a graph presented behaviors

targeted for reduction during treatment (i.e., TB1), the researcher identified the lowest data point

that occurred during the baseline phase of the study. Next, the researcher determined how many

data points in the treatment phase fell below the lowest baseline point. Finally, the researcher

calculated the PND by dividing the total number of points below the lowest baseline point by the

total number of intervention points and multiplying the result by 100.

When a graph presented behaviors targeted for increases during treatment (i.e., TB2), the

researcher identified the highest data point that occurred during the baseline phase of the study.

Next, the researcher determined how many data points in the treatment phase fell above the

highest baseline point. Then, the researcher calculated the PND by dividing the total number of

points above the highest baseline point by the total number of intervention points and

multiplying the result by 100.

PND values can vary from 0% to 100%; Scruggs and Mastropieri (1998) recommend that

PND scores be classified as follows: PND < 50%: unreliable treatment; PND 50% – 70%:

questionable effectiveness; PND 70% – 90%: fairly effective; and PND > 90%: highly effective.
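
The PND procedure described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' own computation; the function names and example series are hypothetical, and the handling of values that fall exactly on 70% is an assumption because the published ranges share that boundary.

```python
from typing import Sequence


def pnd(baseline: Sequence[float], treatment: Sequence[float], target: str) -> float:
    """Percentage of non-overlapping data for one baseline/treatment series.

    target="TB1": behavior targeted for reduction (count points below the baseline minimum).
    target="TB2": behavior targeted for increase (count points above the baseline maximum).
    """
    if len(baseline) == 0 or len(treatment) == 0:
        raise ValueError("both phases need at least one data point")
    if target == "TB1":
        lowest = min(baseline)
        non_overlapping = sum(1 for y in treatment if y < lowest)
    elif target == "TB2":
        highest = max(baseline)
        non_overlapping = sum(1 for y in treatment if y > highest)
    else:
        raise ValueError("target must be 'TB1' or 'TB2'")
    return 100.0 * non_overlapping / len(treatment)


def classify_pnd(value: float) -> str:
    """Labels from Scruggs and Mastropieri (1998) as quoted in the text.

    Assumption: a value of exactly 70 is treated as "fairly effective".
    """
    if value < 50:
        return "unreliable treatment"
    if value < 70:
        return "questionable effectiveness"
    if value <= 90:
        return "fairly effective"
    return "highly effective"


# Hypothetical TB1 series: three of five treatment points fall below the baseline minimum of 6.
tb1_pnd = pnd([8, 7, 9, 6], [5, 6, 4, 3, 7], "TB1")
tb1_label = classify_pnd(tb1_pnd)
```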

Split Middle Trend Estimation (SMTE). Split-middle trend estimation was

calculated in four steps: (a) the baseline phase was divided into halves, (b) the median point on

the y-axis in each half of the baseline phase was identified, (c) a straight line connecting the

median points of each baseline half was drawn and extended into the treatment phase, and (d) the

number of data points in the treatment phase that fell above or below this line was counted and

divided by the total number of data points in the treatment phase. For example, if the treatment

target was a TB1, the number falling below the line was counted; if the treatment target was a

TB2, the number falling above the line was counted. The proportion of data points that fell

above or below the celeration line was converted into a percentage so that it could be

compared to the PND. Finally, a binomial calculation was performed to evaluate the probability

of obtaining each outcome.

A minimum of four baseline points is required to adequately create a celeration line.

Therefore, SMTE was not calculated for any graph with fewer than four baseline data points.

Given that the null hypothesis for SMTE would be less than 50%, the authors applied

classification recommendations similar to those for PND: SMTE < 50%: unreliable treatment; SMTE 50% – 70%:

questionable effectiveness; SMTE 70% – 90%: fairly effective; and SMTE > 90%: highly effective.
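
The four SMTE steps above can be sketched as follows. Several details are assumptions rather than facts from the text: sessions are numbered consecutively from 1, the middle point of an odd-length baseline is left out of both halves, each half-median is placed at the median session number of its half, and the binomial calculation is a one-sided upper-tail probability with p = .5. All function names are hypothetical.

```python
from math import comb
from statistics import median
from typing import Sequence, Tuple


def split_middle_line(baseline: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the celeration line through the two half-medians."""
    n = len(baseline)
    if n < 4:
        raise ValueError("at least four baseline points are required")
    half = n // 2
    # Session numbers and values for the first and second halves of baseline.
    x1 = median(range(1, half + 1))
    y1 = median(baseline[:half])
    x2 = median(range(n - half + 1, n + 1))
    y2 = median(baseline[n - half:])
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def smte(baseline: Sequence[float], treatment: Sequence[float],
         target: str) -> Tuple[float, float]:
    """Return (percentage of treatment points on the improved side, binomial p)."""
    if target not in ("TB1", "TB2"):
        raise ValueError("target must be 'TB1' or 'TB2'")
    if len(treatment) == 0:
        raise ValueError("the treatment phase needs at least one data point")
    slope, intercept = split_middle_line(baseline)
    first_session = len(baseline) + 1
    hits = 0
    for offset, y in enumerate(treatment):
        projected = slope * (first_session + offset) + intercept
        # TB1 improves by falling below the line, TB2 by rising above it.
        if (target == "TB1" and y < projected) or (target == "TB2" and y > projected):
            hits += 1
    n = len(treatment)
    # Probability of at least `hits` points on the improved side if each side is equally likely.
    p_value = sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n
    return 100.0 * hits / n, p_value


# Hypothetical TB1 series with a rising baseline trend.
percentage, p_value = smte([10, 12, 11, 13], [9, 8, 14, 7], "TB1")
```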

Swanson’s dsw. Swanson’s dsw was calculated by: (a) forming a baseline mean using the

last three data points during the baseline phase, (b) forming a treatment outcome mean using the

last three data points in the treatment phase, (c) computing a difference score by subtracting the

baseline mean from the treatment mean, and (d) dividing the difference score by the pooled

standard deviation corrected for correlation (the correlation was between the last three baseline

data points and the last three treatment data points). If there were fewer than three points in the

baseline phase or treatment phase, then only two points in baseline and two points in treatment

were used to calculate this statistic. The formula for calculating Swanson’s dsw is as follows:

$$d_{sw} = \frac{M_t - M_b}{SD / \sqrt{2(1 - r)}}$$

As is evident in this formula, higher dsw values indicate larger treatment effects. For

example, when dsw = 1, the level of functioning at the conclusion of treatment is one

standard deviation higher than the level of functioning at the conclusion of baseline, and when

dsw = 2, the level of functioning at the conclusion of treatment is two standard

deviations higher than the level of functioning at the conclusion of baseline. Swanson and

Sachse-Lee (2000) argued that dsw should be interpreted in the same way that Cohen’s d is

interpreted in the psychotherapy meta-analytic literature. Specifically, Cohen (1988) suggested

that effect sizes using d could be classified as “small” (.20), “medium” (.50) and “large” (.80).
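
The steps for Swanson's dsw can be sketched as below. This is an illustrative sketch, not the authors' Excel formulas. It assumes that the pooled standard deviation is computed from the sample variances of the two sets of end-of-phase points and that r is the Pearson correlation between those two sets in session order. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a set of points has no variability")
    return sxy / sqrt(sxx * syy)


def swanson_dsw(baseline: Sequence[float], treatment: Sequence[float], k: int = 3) -> float:
    """dsw from the last k points of each phase: (Mt - Mb) / (SD / sqrt(2 * (1 - r)))."""
    if k < 2 or len(baseline) < k or len(treatment) < k:
        raise ValueError("each phase needs at least k points, with k of 2 or more")
    b, t = list(baseline[-k:]), list(treatment[-k:])
    # Assumed pooling: equal-sized sets, so the pooled variance is the mean of the two variances.
    pooled_sd = sqrt((variance(b) + variance(t)) / 2)
    r = pearson_r(b, t)
    if r >= 1:
        raise ValueError("the formula is undefined when r equals 1")
    return (mean(t) - mean(b)) / (pooled_sd / sqrt(2 * (1 - r)))


# Hypothetical TB2 series: the last three baseline points are 2, 4, 3
# and the last three treatment points are 7, 6, 8.
d = swanson_dsw([1, 5, 2, 4, 3], [5, 6, 7, 6, 8])
```

Note that with only two points per phase the Pearson correlation is always +1 or -1 (or undefined), so the two-point fallback described above cannot be computed with this formula when r = 1.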

Reliable Change Index. The RCI is a standardized score used to assess change in an

individual’s score on a survey measure and uses the participant’s pre- and post-treatment scores,

standard deviation, and reliability coefficients. Given that the studies had small sample sizes, the

standard deviations and reliability coefficients from large sample validation studies were used to

calculate RCIs. The formulas for calculating RCI and the standard error of the difference are

provided below.

$$RC = \frac{X_2 - X_1}{S_{diff}}, \qquad S_{diff} = \sqrt{2(S_E)^2}, \qquad S_E = SD\sqrt{1 - r}$$

In the above formulas, X2 is the post-treatment score, and X1 is the pre-treatment score. The

standard error of the difference (Sdiff) is the square root of two times the squared standard error

of measurement (SE). The standard error of measurement is the standard deviation

multiplied by the square root of 1 minus the reliability coefficient for a particular measure.

The RCI can range from -∞ to +∞ with no pre-treatment to post-treatment difference

being zero. Larger RCIs indicate greater treatment effects. When the absolute value of the RCI exceeds 1.96

(the 95% confidence interval around the null of zero), the change is labelled “statistically reliable”

because the magnitude of difference is greater than what would be expected to occur by chance

or passage of time. If the absolute value of the RCI is less than 1.96, the change is labelled “not statistically

reliable.” Individual RCI scores were aggregated by calculating an overall mean RCI for all

single-subject studies included in the current study. The RCI was only calculated for studies in

which self-report inventory data were provided.
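
The RCI formulas can be sketched directly in code. The function names are hypothetical, and the scores, standard deviation, and reliability coefficient in the example are invented for illustration; they do not come from any study in the synthesis.

```python
from math import sqrt


def reliable_change_index(pre: float, post: float, sd: float, reliability: float) -> float:
    """RCI = (X2 - X1) / Sdiff, with Sdiff = sqrt(2 * SE**2) and SE = SD * sqrt(1 - r)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be at least 0 and below 1")
    se = sd * sqrt(1 - reliability)   # standard error of measurement
    s_diff = sqrt(2 * se ** 2)        # standard error of the difference
    return (post - pre) / s_diff


def is_statistically_reliable(rci: float, critical: float = 1.96) -> bool:
    """True when the absolute RCI exceeds the critical value."""
    return abs(rci) > critical


# Hypothetical symptom measure: SD = 10 and reliability = .92 from a validation sample,
# with a client score that drops from 30 to 18.
rci = reliable_change_index(pre=30, post=18, sd=10, reliability=0.92)
reliable = is_statistically_reliable(rci)
```

In this hypothetical case SE is about 2.83 and Sdiff is 4, so the RCI is about -3.0, which exceeds 1.96 in absolute value.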

Effect Size Aggregation and Analysis



Graph data for each TB1 and TB2 were converted to PND, SMTE, and Swanson’s dsw. An

RCI score was calculated for each self-report inventory for each participant where pre- and post-

treatment data were provided. In studies where multiple TB1, TB2, or questionnaires were

collected on a single participant, an average PND, SMTE, dsw, and RCI were calculated for that

participant. Additionally, when examining studies with multiple phases, only the first baseline

and treatment phase were used to calculate the effect size.

The overall mean PND, SMTE, dsw, and RCI were calculated across studies for each

outcome variable. The mean was used because it was not possible to use the Hedges-

Pustejovsky-Shadish (Shadish et al., 2014) combined effect size calculation which requires three

or more participants per study (only three studies had three or more participants). Follow-up

analyses were then conducted to determine whether there were significant differences in

outcomes due to participant characteristics, treatment characteristics, or targets of treatment.
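
The two-stage averaging described above (within participant first, then across participants) can be sketched as follows. This is a minimal illustration under the assumption that each record pairs a participant identifier with one effect size; the function name, identifiers, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def aggregate_effect_sizes(
    records: Iterable[Tuple[str, float]]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Average multiple effect sizes per participant, then summarize across participants."""
    by_participant = defaultdict(list)
    for participant_id, value in records:
        by_participant[participant_id].append(value)
    # Stage 1: one value per participant, so nobody contributes more than once.
    per_participant = {pid: mean(values) for pid, values in by_participant.items()}
    # Stage 2: unweighted mean and standard deviation across participants.
    values = list(per_participant.values())
    summary = {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
    }
    return per_participant, summary


# Hypothetical PND values: participant P1 has two TB2 targets, P2 and P3 have one each.
per_participant, summary = aggregate_effect_sizes(
    [("P1", 60.0), ("P1", 80.0), ("P2", 100.0), ("P3", 40.0)]
)
```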

Results

Methodological Characteristics of Studies

Publication Characteristics. Out of the 20 studies located, 18 studies were published

between 1994 and 2017. The two remaining studies are currently unpublished. These articles

were produced by 43 authors. The articles were published in 9 different journals with most

appearing in the International Journal of Behavioral Consultation and Therapy, Clinical

Case Studies, and The Psychological Record. The methodological characteristics of studies are

reported in Tables 1 and 2.

Participant Characteristics. There were a total of 37 participants across all studies (19

males, 15 females, and 3 participants whose gender was not reported). The average number of

participants per article was 2 (Range 1 – 5). However, it should be noted that two studies utilized

data from the same participant resulting in the final count of 36 participants (Busch et al., 2009;

Kanter et al., 2006). This participant was a 24/25-year-old African American female and her data

were combined in reporting participant characteristics and effect size calculation for TB1.

Participants varied in age from 7 to 72 (M = 28.69, SD = 14.3) and the age of three participants

was unknown. Information regarding ethnicity was not provided for 21 participants. For the 14

participants whose ethnicities were reported, 10 were Caucasian, two were biracial, one was

African American, and one was Latin.

Diagnoses were provided for 18 participants. The most common diagnoses were: mood

disorders (n = 6), personality disorders (n = 3), co-morbid mood and personality disorders (n =

3), co-morbid mood, personality and substance disorders (n = 2), co-morbid mood, posttraumatic

stress disorder, and substance use disorder (n = 1), co-morbid mood, anxiety, and personality (n

= 1), co-morbid mood and psychotic disorder (n = 1), and co-morbid personality and psychotic

disorder (n = 1).

Targets of Treatment. A variety of TB1s and TB2s were targeted for treatment (see

Table 2). For review in the current study, TB1s and TB2s were categorized based on subscales of

the Functional Idiographic Assessment Template-Questionnaire (Callaghan, 2006). As detailed

in Table 2, the most common TB1s were: problematic disclosure, problematic emotional

expression, and conflict. The most common TB2s were: effective disclosure, adaptive emotional

expression, and bidirectional communication.

Intervention Characteristics and Study Design. The average number of observations

during baseline was 6 (Range 2 – 12) and the average number of observations during treatment

was 11 (Range 4 – 25). Most studies provided FAP-Alone (n = 14). In other instances, FAP was

combined with Acceptance and Commitment Therapy (n = 1), Cognitive-Behavior Therapy (n = 1),



Behavioral Activation (n = 2), Cognitive-Therapy (n = 1), and Child Behavior Analytic Therapy

(n = 1). Based on this information, two categories of studies were formed: (a) “FAP-Alone”

treatment and (b) “FAP-Enhanced” treatment. Seven studies utilized case study approaches, 10

utilized A/B or A/A+B designs, two used a multiple baseline design, and one used a reversal

design.

The SCRIBE review indicated that none of the studies met all 26 criteria of the

SCRIBE checklist. SCRIBE scores ranged from 2 to 17, with a mean score of 11.95.

Scores varied because the studies included case

studies, A/B designs, and more sophisticated single-subject designs. The overall grade for each

study is presented in Table 1.

Analysis of FAP Effects

Tables 3 and 4 provide a summary of PND, SMTE, and Swanson’s dsw effect sizes for TB1s

and TB2s. Table 5 provides a summary of RCI scores. None of the effect size metrics showed

evidence of significant skew or kurtosis utilizing both standard error of skewness and standard

error of kurtosis for significance testing. Therefore, it was appropriate to average and conduct

statistical testing on measures when relevant and applicable. The mean PND for all TB1s and

TB2s were respectively 58.70% (n = 21, SD = 40.76, 95% CI = 41.31 – 76.07) and 79.39% (n =

24, SD = 31.71, 95% CI = 67.67 – 92.51). Using Scruggs and Mastropieri’s (1998)

classification, the mean PND for TB1s was classified as “questionably effective” with the 95%

confidence interval ranging from “ineffective” to “fairly effective”. The mean PND for TB2s

was classified as “fairly effective” with a 95% confidence interval ranging from “questionably

effective” to “highly effective.”



The overall mean SMTEs for TB1s and TB2s were, respectively, 69.43% (n = 16, SD =

36.28, 95% CI = 50.70 – 88.17) and 80.66% (n = 18, SD = 30.29, 95% CI = 69.27 – 96.85). The

mean SMTE for TB1s fell into the upper range of the “questionably effective” classification with

the 95% confidence interval ranging from “questionably effective” to the upper end of “fairly

effective.” Of the 15 SMTE analyses that could be conducted for TB1s for all participants, 7

(47%) were significant. The mean SMTE for TB2s fell into the “fairly effective” classification

with a 95% confidence interval ranging from the upper end of “questionably effective” to

“highly effective.” Out of the 18 SMTE analyses that could be conducted for TB2s, 15 (83%)

were statistically significant.

The overall mean Swanson’s dsw values for TB1s and TB2s were, respectively, 1.33 (n = 21, SD =

0.87, 95% CI = 0.95 – 1.71) and 1.85 (n = 24, SD = 0.97, 95% CI = 1.49 – 2.25). Of the 21

Swanson’s dsw values for TB1s, 71% (n = 15) were classified as large, 10% (n = 2) as medium, and

19% (n = 4) as small. Of the 24 Swanson’s dsw values for TB2s, 83% (n = 20) were classified as

large, 4% (n = 1) as medium, and 13% (n = 3) as small. Taken together, these results indicated

that TB1s reliably decreased and TB2s reliably increased from pre-treatment to post-treatment

(none of the 95% confidence intervals contained zero) and that, for a majority of studies, the

effect sizes were large. Finally, the average effect size for TB2s was higher than the average

effect size for TB1s.

RCI scores were divided into symptom-based RCI scores and quality of life-based RCI

scores. The symptom-based RCIs are analogous to TB1s and were expected to show a decrease

with FAP. Self-report survey data from seven participants reported in seven studies were used to

calculate the average symptom-based RCI. The quality of life-based RCIs are analogous to TB2s

and were expected to increase with FAP. Self-report survey data from ten participants reported in

eight studies were used to calculate the overall quality of life-based RCI. The means for

symptom-based RCIs and quality of life-based RCIs were, respectively, 5.36 (n = 7, SD = 3.57,

95% CI = 3.09 – 7.61) and 2.93 (n = 10, SD = 3.16, 95% CI = 1.36 – 4.51). This indicated that

both sets of RCIs were large and positive. Further, these RCIs were statistically reliable

given that the 95% confidence intervals did not include zero.
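For readers unfamiliar with the metric, the sketch below shows one common way an RCI is computed, following the Jacobson and Truax (1991) formulation. It is illustrative only: the formula is not restated in this section, the scores, standard deviation, and reliability are hypothetical, and the sign convention (positive values indicate improvement) is an assumption chosen to match the positive RCIs reported above.

```python
import math


def reliable_change_index(pre: float, post: float, sd: float, reliability: float,
                          higher_is_better: bool = False) -> float:
    """Jacobson-Truax style RCI, signed so that positive means improvement.

    sd          : standard deviation of the measure in a reference sample
    reliability : reliability coefficient of the measure (e.g., test-retest)
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    sem = sd * math.sqrt(1.0 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2.0 * sem ** 2)        # standard error of a difference score
    change = post - pre if higher_is_better else pre - post
    return change / s_diff


# Hypothetical symptom measure on which lower scores are better.
rci = reliable_change_index(pre=20.0, post=11.0, sd=5.0, reliability=0.84)
is_reliable = abs(rci) > 1.96  # conventional 95% criterion
print(round(rci, 2), is_reliable)
```

Because the denominator is built from the standard error of measurement, an RCI beyond 1.96 in absolute value reflects change larger than would be expected from measurement error alone.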

Analysis of Variation in FAP Effects

Given the variation in the metrics across studies, analyses were conducted to evaluate the

extent to which outcomes differed as a function of gender, ethnicity, and age. There were no

significant relationships observed between any of these demographic characteristics and any

outcome measure using PND, SMTE, dsw, or RCI.

A second set of analyses examined whether outcomes varied as a function of number of

sessions and whether FAP-Alone outcomes differed from FAP-Enhanced outcomes. Results

indicated that number of sessions was not significantly associated with any outcome measure

using PND, SMTE, dsw, or RCI. In order to compare FAP-Alone and FAP-Enhanced outcomes,

independent t-tests were conducted using the PNDs, SMTEs, and dsw as dependent variables.

Results indicated that the mean FAP-Alone PND for TB1s (M = 71.98, SD = 37.68) was

significantly higher (t(20) = 2.62, p < .05) than the mean FAP-Enhanced PND (M = 30.24,

SD = 33.42). Cohen’s d was 1.17, indicating a large effect. All other t-tests were non-significant.
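As a check on the effect size, the reported means and standard deviations can be converted to Cohen's d. The sketch below assumes an equal-weight pooled standard deviation (the square root of the average of the two variances), because the group sizes are not reported in this section; the function name is hypothetical.

```python
import math


def cohens_d_equal_weight(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using the root mean square of the two SDs as the denominator.

    This equal-weight pooling is an assumption; a sample-size-weighted pooled SD
    would need the group sizes, which are not given here.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# TB1 PND summary statistics reported above (FAP-Alone vs. FAP-Enhanced).
d = cohens_d_equal_weight(mean_a=71.98, sd_a=37.68, mean_b=30.24, sd_b=33.42)
print(round(d, 2))
```

Under this pooling assumption, the result rounds to the reported value of 1.17.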

Failsafe Calculations. Noting that journals are biased toward publishing significant

findings, Rosenthal (1979) developed what has been termed a “failsafe number.” Rosenthal’s

failsafe number is the number of non-significant studies (or hypothesis tests) stored in file

drawers that would be needed to raise the overall p-value in a conventional meta-analysis that

aggregates group-level data to > .05. Because the current study is aggregating single-subject

data, Rosenthal’s failsafe calculation cannot be used. However, Orwin (1983) and Wolf (1986)

developed a failsafe calculation that is based on d as follows:

Nfs = No(do – dc)/dc.

In this formula, Nfs is the failsafe number, No is the number of observed effect sizes, do is

the average d observed across studies, and dc is the criterion. Orwin (1983) and Wolf (1986)

recommend using Cohen’s (1988) effect size classification scheme for small or medium effects

(small d = .2, medium d = .5) to set the value of dc. Their argument is that criterion effect

sizes of .2 or .5 represent minimal to modest responsiveness to a treatment, which would be

thought of as “clinically non-significant.”

The average Swanson’s dsw obtained in this study is analogous to do in Orwin’s (1983) and

Wolf’s (1986) failsafe equation. As such, their equation can be used as a heuristic technique to

estimate failsafe numbers for the current synthesis of single-subject data. For TB1s, the average

dsw was 1.33 based on 21 effect sizes. Thus, the failsafe numbers compared against hypothetical

small and medium file drawer effect sizes are:

Small: Nfs = 21 (1.33 - .2)/.2 = 118.65 and

Medium: Nfs =21(1.33 - .5)/.5 = 34.86.

For TB2s, the average dsw was 1.85 based on 24 effect sizes. Thus, the failsafe numbers compared

against hypothetical small and medium file drawer effect sizes are:

Small: Nfs = 24 (1.85 - .2)/.2 = 198 and

Medium: Nfs = 24(1.85 - .5)/.5 = 64.8.
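The failsafe formula can also be evaluated programmatically. The following is a minimal sketch that applies the Orwin (1983) expression as defined above to the number of effect sizes and the mean dsw values reported for TB1s and TB2s; the function and variable names are hypothetical.

```python
def failsafe_n(n_observed: int, d_observed: float, d_criterion: float) -> float:
    """Orwin-style failsafe N: Nfs = No * (do - dc) / dc."""
    if d_criterion <= 0:
        raise ValueError("the criterion effect size must be positive")
    return n_observed * (d_observed - d_criterion) / d_criterion


# (number of effect sizes, mean dsw) as reported in this section.
inputs = {"TB1": (21, 1.33), "TB2": (24, 1.85)}
criteria = {"small": 0.2, "medium": 0.5}

results = {
    (target, label): failsafe_n(n, d, dc)
    for target, (n, d) in inputs.items()
    for label, dc in criteria.items()
}
for key, value in results.items():
    print(key, round(value, 2))
```

Because the criterion appears in the denominator, the failsafe number grows quickly as the criterion effect size shrinks, which is why the small-effect criterion yields much larger values than the medium-effect criterion.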

These failsafe numbers address the following question: “How many unpublished

FAP single-subject treatment outcome studies demonstrating no improvement from baseline to

post-treatment are needed to reduce the overall Swanson’s dsw to the small (.2) or medium (.5)

classification level?” As is evident in these calculations, the failsafe findings suggest that the

FAP outcomes found in this quantitative synthesis are quite robust when contrasted against the .2

and .5 criteria.

Discussion

The current study assessed the methodology of the FAP outcome studies by examining

and reporting the methods of all studies included. In order to better understand the effects of

FAP, overall effect sizes were calculated using PND, SMTE, dsw, and RCIs. Variation in effect

sizes was also evaluated.

The current review located 18 published FAP studies with outcome data and two

unpublished studies with outcome data for a total of 20 studies reviewed. Half of the

studies used an A/B or A/A+B design and one used a reversal design. Participants varied in age, ranging

from childhood to late adulthood. A majority of participants were male. A majority of

participants were Caucasian. However, 13 out of the 20 studies did not report ethnicity. Few

studies reported DSM diagnoses.

The SCRIBE method (Tate et al., 2016) was used to evaluate the methodological rigor of

FAP studies. None of the studies met all 26 criteria designed by Tate et al. (2016) to assess the

methodological rigor of single-subject studies. This indicated that the FAP literature can be

improved with more rigorous design and treatment outcome evaluation methods. A majority of

the studies provided basic information related to the SCRIBE criteria (e.g., background

information, aims, study design, and description of intervention). However, some of the more

sophisticated criteria were not met by the studies in the current review (e.g., statement of adverse

events, availability of study protocol, or explicit statement of whether or not any funding sources

were provided for the study).


In summary, the FAP literature has some positive methodological features. First, there is

variation in targets of treatment, with measures covering depression, anxiety, personality

disorders, and several other interpersonal issues. Second, there is good variation in age, gender,

and ethnicity. Finally, articles came from a diversity of researchers from several different

continents including North America, South America, and Europe.

Despite the above-mentioned methodological strengths, most FAP outcome studies are

limited by the use of designs that provide only weak causal inference, uncertain construct

validity, and questionable generalizability. First, the extensive use of A/B designs provides weak

evidence of causality. Several well-known threats to causal inference could produce A/B

changes in behavior, such as regression to the mean, maturation, and reactivity to

observation/repeated measurement. Second, the absence of placebo comparison conditions

(e.g., non-contingent but affirming therapist responding during sessions) hampers the construct

validity of FAP studies because one cannot infer that it is FAP techniques per se that are

responsible for A/B changes. Finally, the near exclusive use of single-subject investigations

constrains generalizability to other persons and contexts.

Based on our quantitative analysis of outcomes, there is evidence that there were reliable

differences from pre-treatment to post-treatment. Further, the magnitude of observed differences

varied as a function of outcome metric used, target of treatment, and treatment type. For PND

and SMTE analyses, the overall mean effect sizes fell into the “questionably effective” to “fairly

effective” classifications. In contrast, the Swanson’s dsw analyses indicated that pre-treatment

to post-treatment differences were consistently large and reliable. Similarly, the mean RCIs were

also large and consistently classified as “statistically reliable.” In terms of treatment targets, the

pre-treatment to post-treatment differences tended to be larger for TB2s relative to TB1s. Finally,

outcomes from FAP-Alone interventions were more favorable than those from FAP-Enhanced interventions

for TB1s.

The quantitative outcomes, combined with methodological considerations, can be used to

address three important questions about the FAP literature. These questions are: (a) is there

evidence of statistically reliable and clinically significant pre-treatment to post-treatment

differences; (b) are the pre-treatment to post-treatment differences greater than what would have

been expected to have occurred by chance or the passage of time; and (c) can the pre-treatment

to post-treatment differences be unambiguously attributed to FAP?

Regarding the first question, the results of this quantitative synthesis indicate that across

studies, metrics, and targets of treatment, there is evidence that TB1s reliably declined from

pre-treatment to post-treatment and that TB2s reliably increased from pre-treatment to

post-treatment. The clinical significance of treatment effects varied as a function of the metric used to

quantify outcomes. Specifically, PND and SMTE analyses yielded more conservative estimates

of FAP effects relative to Swanson’s dsw.

The differences between PND, SMTE, and dsw can be attributed to an important difference

in how these metrics are calculated. The PND and SMTE are derived from comparisons between

baseline and measures collected in the early, middle, and end points of therapy whereas dsw is

derived from comparisons between baseline and the end-of-treatment measurement. As such, the

PND and SMTE include data points that were collected before the intervention was completed.

This would inevitably reduce estimates of effectiveness given that behavior change is expected

to occur gradually as a function of therapist reinforcement and shaping.
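The effect of including early-treatment data points can be illustrated with a small sketch. The series below is hypothetical and describes a behavior expected to increase (a TB2); PND is computed in the usual way as the percentage of treatment-phase points that exceed the most extreme baseline point. The function name `pnd_increase` is an assumption introduced for illustration.

```python
from typing import Sequence


def pnd_increase(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """Percentage of treatment points above the highest baseline point."""
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    ceiling = max(baseline)
    non_overlapping = sum(1 for value in treatment if value > ceiling)
    return 100.0 * non_overlapping / len(treatment)


# Hypothetical TB2 frequencies: change is gradual, so early treatment
# sessions still overlap with the baseline range.
baseline = [2, 3, 2, 4]
treatment = [3, 4, 4, 5, 6, 7, 8, 8]

pnd_all = pnd_increase(baseline, treatment)        # all treatment sessions
pnd_late = pnd_increase(baseline, treatment[-3:])  # final three sessions only
print(pnd_all, pnd_late)
```

In this made-up series, the PND across all treatment sessions is 62.5%, whereas the final three sessions do not overlap with baseline at all. This mirrors the argument that gradual change lowers PND relative to an end-state comparison.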

PND and SMTE have been used in many single-subject meta-analytic reviews. The

popularity of these metrics likely arises from their ease of calculation (simply counting data

points in graphs that fall above and below some reference) and need for visual inspection only.

In contrast, Swanson’s dsw requires that the study author provide numerical data for each point

on a graph (which very rarely occurs) or the use of digitizing and mapping technology in order to

generate values for data points on a graph.

Given that PND and SMTE are derived from less relevant data in the FAP literature and

that tools for mapping and digitizing data are now more readily available, we recommend that future

single-subject quantitative syntheses use dsw metrics for evaluating treatment outcomes where (a) end-state

functioning is the focus of treatment and (b) behavior change is expected to occur gradually

across time.

For the second question, some of the results of this review indicate that the pre-treatment

to post-treatment differences are greater than what would have occurred due to the passage of

time. This position is primarily based on RCI, dsw, and SMTE data. Specifically, the RCI

calculation takes into account the standard error of measurement which is an index of the amount

of change in a score that would be expected to occur by chance and/or with repeated

administration across time. Most of the RCIs in our analyses exceeded the 95% confidence

interval by a large margin. The dsw findings also support this conclusion. Correcting the

standardized pre-treatment-post-treatment difference for correlation reduces the influence of

serial dependency and trends in the dsw calculation. This is important because serial dependency

and trends would be the principal mechanisms through which non-treatment-related factors (e.g.,

regression, maturation) would produce pre-treatment-post-treatment differences in TB1s and

TB2s. Finally, the SMTEs corrected for pre-treatment trends. Taken together, the RCI, dsw, and

SMTE outcomes support an argument that the pre-treatment to post-treatment changes were not

exclusively due to random variation, passage of time, or trends.


Regarding the final question, it is unlikely the differences can be unambiguously

attributed to FAP. As noted earlier, the absence of placebo comparison conditions (e.g., an

ABAB design varying contingent and non-contingent therapist responding or random

assignment to a placebo control group using a group design) makes it impossible to attribute the

pre-treatment to post-treatment differences to FAP. Several other factors may have promoted

changes in participant behavior (e.g., therapist attention and empathy) that were not explicitly

part of the FAP intervention. Finally, a number of the classic causal inference threats could

account for some of the pre-post differences. The more salient of these threats would be: history,

maturation, regression to the mean, and reactivity to observation/measurement.

In the examination of potential moderators, one interesting finding was that FAP-Alone

outperformed FAP-Enhanced interventions based on the PND effect sizes for TB1s. This finding

is logical if one considers the nature of FAP sessions and therapist-client interactions.

Specifically, in FAP-Alone, the therapist will engage in actions that are systematically,

consistently, and directly targeting specific behaviors. It is possible that, in the FAP-Enhanced

interventions, the therapist was providing more didactic material and possibly focusing on other

behaviors that were not directly related to interpersonal interactions. Thus, in any given session,

there would be fewer opportunities for the client to emit target behaviors. Similarly, the therapist

would have fewer opportunities to provide systematic consequences for the targeted behaviors.

Another consistent finding was that there were larger effects observed for TB2s relative

to TB1s. This finding is congruent with FAP principles and learning theory. Specifically, FAP

emphasizes the use of reinforcement to promote acquisition of adaptive TB2s in session. While

extinction and punishment can be used to suppress TB1s, these techniques are not preferred

because they may have an adverse impact on the therapeutic relationship. Instead, the therapist

aims to increase TB2s with the notion that this increase in adaptive behavior will simultaneously

reduce TB1s as in differential reinforcement of other/incompatible behavior. As such, it would

be plausible to argue that the direct and more frequent reinforcement of TB2s would yield a

larger treatment effect. Further, it may be that it is more challenging for therapists to address

TB1s (e.g., re-direction, selective ignoring, blocking) than it is to reinforce TB2s. Previous

research has reported a component-process analysis of reinforcing TB2s (Haworth et al., 2015), and

further research may benefit from exploring this process for blocking TB1s.

Limitations

One major consideration of the current findings is the “file drawer” problem. The failsafe

question for this literature is: “How many times have FAP researchers initiated a single-subject

treatment study but failed to report or publish the result because the client did not respond to

treatment?” The failsafe analyses in this paper indicate that a substantial number of single-

subject studies with nonresponsive clients would be needed to reduce the average Swanson’s dsw

to small or medium effect size classifications. However, given that there is ample evidence of a

bias toward publishing significant results relative to non-significant results in psychological

research, it is reasonable to argue that there are at least some studies with nonresponsive clients

in the file drawers of FAP researchers. Adding data from these studies would reduce the

magnitude and reliability of the overall effect sizes reported in this paper. Thus, it is likely that

the findings reported in this quantitative synthesis overestimate the effectiveness of FAP to some

extent. However, the precise amount of overestimation cannot be calculated.

Conclusion

The current study is a quantitative analysis of the existing FAP single-subject design

treatment literature. It provides an estimate of the efficacy of FAP based on the currently

available single-subject studies. These results indicate that FAP may be associated with reliable

treatment effects for a variety of behaviors based on pre-post comparisons. As such, FAP may be

a promising approach that is based on an innovative application of behavioral principles to the

therapeutic relationship. However, it is difficult to assess whether the changes in the participants

in the reviewed studies are due solely to FAP, given the limitations of the research designs used. It is also difficult

to assess how many unpublished, failed trials exist that may nullify the results presented in this

paper.

There remains a clear need for more systematic and methodologically rigorous FAP

research. Most importantly, the authors recommend that researchers conduct more randomized

control trials so that stronger causal statements can be made about FAP effectiveness. Although

the turn-by-turn coding of the FAP rating system (FAPRS: Callaghan & Follette, 2008) may be

cumbersome for randomized control trials, the recent development of treatment adherence

measures (e.g., Maitland et al., 2016a; Maitland et al., 2016b) and self-report measures targeting

FAP-specific constructs (e.g., Darrow, Callaghan, Bonow, & Follette, 2014; Leonard et al.,

2014) may assist with the development of randomized control trials.

If FAP researchers continue to utilize single-subject design studies, then it may also be

beneficial to use guidelines established for strong methodological rigor (e.g., Tate et al., 2016).

Moreover, the use of multiple baseline designs, A/B/C designs, or reversal/withdrawal designs

could more fully assess the causal effects of FAP. In addition to stronger methodology, the

authors recommend that researchers place a greater emphasis on collecting and reporting

participant demographic characteristics in order to more clearly understand the efficacy of FAP

and the different populations with which it may be effective. Finally, the use of placebo

comparison conditions would permit a better evaluation of the specific effects of FAP

techniques relative to nonspecific supportive listening and responding.


References

Barkham, M., Margison, F., Leach, C., Lucock, M., Mellor-Clark, J., Evans, C., . . . McGrath, G.

(2001). Service profiling and outcomes benchmarking using the CORE-OM: Toward

practice-based evidence in the psychological therapies. Journal of Consulting and

Clinical Psychology, 69(2), 184-196. doi:10.1037/0022-006X.69.2.184

Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies

for studying behavior change (3rd ed.). Boston, MA: Allyn and Bacon.

Barraca, J. (2004). Spanish Adaptation of the Acceptance and Action Questionnaire

(AAQ). International Journal of Psychology and Psychological Therapy, 4, 505-

515.

Baruch, D. E., Kanter, J. W., Busch, A. B., & Juskiewicz, K. (2009). Enhancing the therapy

relationship in Acceptance and Commitment Therapy for psychotic symptoms. Clinical

Case Studies, 8, 241-257.

Bond, F. W., Hayes, S. C., Baer, R. A., Carpenter, K. M., Guenole, N., Orcutt, H. K., ... & Zettle,

R. D. (2011). Preliminary psychometric properties of the Acceptance and Action

Questionnaire–II: A revised measure of psychological inflexibility and experiential

avoidance. Behavior Therapy, 42(4), 676-688.

Busch, A. M., Kanter, J. W., Callaghan, G. M., Baruch, D. E., Weeks, C. E., & Berlin, K. S.

(2009). A micro-process analysis of functional analytic psychotherapy's mechanism of

change. Behavior Therapy, 40(3), 280-290. doi:10.1016/j.beth.2008.07.003

Callaghan, G. M. (2006). The Functional Idiographic Assessment Template (FIAT) System: For

use with interpersonally-based interventions including Functional Analytic


Psychotherapy (FAP) and FAP-Enhanced treatments. The Behavior Analyst Today, 7,

357-398. doi:10.1037/h0100160

Callaghan, G. M., & Follette, W. C. (2008). Coding Manual for the Functional Analytic

Psychotherapy Rating Scale (FAPRS). The Behavior Analyst Today, 9, 57-97.

doi:10.1037/h0100649

Callaghan, G. M., Summers, C. J., & Weidman, M. (2003). The treatment of histrionic and

narcissistic personality disorder behaviors: A single-subject demonstration of clinical

improvement using functional analytic psychotherapy. Journal of Contemporary

Psychotherapy, 33(4), 321-339. doi:10.1023/B:JOCP.0000004502.55597.81

Cattivelli, R., Tirelli, V., Berardo, F., & Perini, S. (2012). Promoting appropriate behavior in

daily life contexts using functional analytic psychotherapy in early-adolescent children.

International Journal of Behavioral Consultation and Therapy, 7(2-3), 25-32.

doi:10.1037/h0100933

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:

Lawrence Erlbaum Associates.

Corrigan, P. W. (2001). Getting ahead of the data: A threat to some behavior therapies. The

Behavior Therapist, 24(9), 189-193.

Darrow, S. M., Callaghan, G. M., Bonow, J. T., & Follette, W. C. (2014). The Functional

Idiographic Assessment Template-Questionnaire (FIAT-Q): Initial psychometric

properties. Journal of Contextual Behavioral Science, 3(2), 124–135.

Ferro-Garcia, R., Lopez-Bermudez, M. A., & Valero-Aguayo, L. (2012). Treatment of a disorder

of self through functional analytic psychotherapy. International Journal of Behavioral

Consultation and Therapy, 7(2-3), 45-51. doi:10.1037/h0100936


García, R. F. (2008). Recent Studies in Functional Analytic Psychotherapy. International

Journal of Behavioral Consultation and Therapy. 4(2), 239-249. doi:10.1037/h0100846

Haynes, S. N., O’Brien, W. H., & Kaholokula, J. K. (2011). Behavioral Assessment and Case

Formulation. Hoboken, N.J.: John Wiley & Sons.

Hayes, S. C., Masuda, A., Bissett, R., Luoma, J., & Guerrero, L. F. (2005). DBT, FAP, and ACT:

How empirically oriented are the new behavior therapy technologies? Behavior

Therapy, 35, 35–54. doi:10.1016/S0005-7894(04)80003-0

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S. L., & Wolery, M. (2005). The use of

single-subject research to identify evidence-based practice in special education.

Exceptional Children, 71, 165–179.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining

meaningful change in psychotherapy research. Journal of Consulting and Clinical

Psychology, 59,12-19.

Kanter, J. W., Rusch, L. C., Busch, A. M., & Sedivy, S. K. (2009). Validation of the Behavioral

Activation for Depression Scale (BADS) in a community sample with elevated

depressive symptoms. Journal of Psychopathology and Behavioral Assessment, 31(1),

36-42.

Kanter, J., Landes, S., Busch, A., Rusch, L., Brown, K., Baruch, D., & Holman, G. (2006). The

effect of contingent reinforcement on target variables in outpatient psychotherapy for

depression: A successful and unsuccessful case using functional analytic psychotherapy.

Journal of Applied Behavior Analysis, 39(4), 463-467. doi:10.1901/jaba.2006.21-06

Kanter, J. W., Parker, C. & Kohlenberg, R. J. (2001). Finding the self: A behavioral measure and

its clinical implications. Psychotherapy: Theory, Research and Practice, 38, 198-211.

Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings.

New York: Oxford University Press, Inc.

Kohlenberg, R. J., & Tsai, M. (1991). Functional Analytic Psychotherapy: A guide for creating

intense and curative therapeutic relationships. New York: Plenum.

Kohlenberg, R., & Tsai, M. (1994). Improving cognitive therapy for depression with functional

analytic-psychotherapy – theory and case study. Behavior Analyst, 17(2), 305-319.

Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief

depression severity measure. Journal of General Internal Medicine, 16(9), 606-613.

doi:10.1046/j.1525-1497.2001.016009606.x

Lambert, M. J., Burlingame, G. M., Umphress, V., Hansen, N. B., Vermeersch, D. A., Clouse, G.

C., & Yanchar, S. C. (1996). The reliability and validity of the outcome questionnaire.

Clinical Psychology & Psychotherapy, 3(4), 249-258.

Landes, S. J., Kanter, J. W., Weeks, C. E., & Busch, A. M. (2013). The impact of the active

components of Functional Analytic Psychotherapy on idiographic target behaviors.

Journal of Contextual Behavioral Science, 2(1), 49-57. doi:10.1016/j.jcbs.2013.03.004

Leonard, R. C., Knott, L. E., Lee, E. B., Singh, S., Smith, A. H., Kanter, J., … Wetterneck, C.

T. (2014). The development of the functional analytic psychotherapy intimacy scale. The

Psychological Record, 64(4), 647-657.

Lizarazo, N. E., Muñoz-Martínez, A. M., Santos, M. M., & Kanter, J. W. (2015). A within-

subjects evaluation of the effects of Functional Analytic Psychotherapy on in-session and

out-of-session client behavior. The Psychological Record, 65(3), 463-474.

doi:10.1007/s40732-015-0122-7

Lopez, F. J. C. (2002). Jealousy: A case of application of functional analytic psychotherapy.

Apuntes de Psicologia, 20(3), 347-368.

Maitland, D. W. M., Kanter, J. W., Tsai, M., Kuczynski, A. M., Manbeck, K. E., & Kohlenberg,

R. J. (2016b). Preliminary findings on the effects of online Functional Analytic

Psychotherapy training on therapist competency. The Psychological Record, 66(4), 627-637.

doi:10.1007/s40732-016-0198-8

Maitland, D. W. M., Petts, R. A., Knott, L. E., Briggs, C. A., Moore, J. A., & Gaynor, S. T.

(2016a). A randomized controlled trial of Functional Analytic Psychotherapy versus

watchful waiting: Enhancing social connectedness and reducing anxiety and avoidance.

Behavior Analysis: Research and Practice, 16(3), 103-122. doi:10.1037/bar0000051

Manolov, R., Guilera, G., & Sierra, V. (2014). Weighting strategies in the meta-analysis of

single-case studies. Behavior Research Methods, 46, 1152-1166.

Manduchi, K., & Schoendorff, B. (2012). First steps in FAP: Experiences of beginning

functional analytic psychotherapy therapist with an obsessive-compulsive personality

disorder client. International Journal of Behavioral Consultation and Therapy, 7(2-3),

72-77. doi:10.1037/h0100940

Mangabeira, V., Kanter, J., & Del Prette, G. (2012). Functional analytic psychotherapy (FAP): A

review of publications from 1990 to 2010. International Journal of Behavioral

Consultation and Therapy, 7(2-3), 78-89. doi:10.1037/h0100941

Manos, R. C., Kanter, J. W., Rusch, L. C., Turner, L. B., Roberts, N. A., & Busch, A. M. (2009).

Integrating functional analytic psychotherapy and behavioral activation for the treatment

of relationship distress. Clinical Case Studies, 8(2), 122-138.

doi:10.1177/1534650109332484

McCarthy-Larzelere, M., Diefenbach, G. J., Williamson, D. A., Netemeyer, R. G., Bentz, B. G.,

& Manguno-Mire, G. M. (2001). Psychometric properties and factor structure of the

worry domains questionnaire. Assessment, 8(2), 177-191.

doi:10.1177/10731911010080020

McClafferty, C. (2012). Expanding the cognitive behavioural therapy traditions: An application

of functional analytic psychotherapy treatment in a case study of depression.

International Journal of Behavioral Consultation and Therapy, 7(2-3), 90-95.

doi:10.1037/h0100942

Meyer, T. J., Miller, M. L., Metzger, R. L., & Borkovec, T. D. (1990). Development and

validation of the Penn State Worry Questionnaire. Behaviour Research and Therapy,

28(6), 487-495. doi:10.1016/0005-7967(90)90135-6

Mundt, J. C., Marks, I. M., Shear, M. K., & Greist, J. M. (2002). The Work and Social

Adjustment Scale: A simple measure of impairment in functioning. The British Journal

of Psychiatry, 180(5), 461-464.

Orwin, R. G. (1983). A failsafe N for effect size in meta-analysis. Journal of Educational

Statistics, 8, 157-159. doi:10.2307/1164923

Oshiro, C. K. B., Kanter, J., & Meyer, S. B. (2012). A single-case experimental demonstration of

functional analytic psychotherapy with two clients with severe interpersonal problems.

International Journal of Behavioral Consultation and Therapy, 7(2-3), 111-116.

doi:10.1037/h0100945

Öst, L. G. (2008). Efficacy of the third wave of behavioral therapies: a systematic review and

meta-analysis. Behaviour Research and Therapy, 46(3), 296-321.

doi:10.1016/j.brat.2007.12.005

Pedersen, E. R., Callaghan, G. M., Prins, A., Nguyen, H. V., & Tsai, M. (2012). Functional

analytic psychotherapy as an adjunct to cognitive-behavioral treatments for posttraumatic

stress disorder: Theory and application in a single case design. International Journal of

Behavioral Consultation and Therapy, 7(2-3), 125-134. doi:10.1037/h0100947

Rosenthal, R. (1979). The “file drawer problem” and tolerance of null results. Psychological

Bulletin, 86(3), 638-641. doi:10.1037/0033-2909.86.3.638

Scruggs, T. E., & Mastropieri, M. A. (1998). Summarizing single-subject research: Issues and

applications. Behavior Modification, 22, 221-242.

Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta-analysis of

single-case designs with a standardized mean difference statistic: A primer and

applications. Journal of School Psychology, 52(2), 123. doi:10.1016/j.jsp.2013.11.005

Shadish, W. R., & Sullivan, K.J. (2011). Characteristics of single-case designs used to assess

intervention effects in 2008. Behavior Research Methods, 43, 971-980.

Singh, S., & O’Brien, W. H. (2016). Functional analytic psychotherapy for nursing home

residents: A single-subject investigation of session-by-session changes. Journal of

Contemporary Psychotherapy, doi:10.1007/s10879-016-9352-5

Spanier, G. B. (1976). Measuring dyadic adjustment: New scales for assessing the quality of

marriage and similar dyads. Journal of Marriage and the Family, 15-28.

Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for

assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine,

166(10), 1092.

Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1999). Dimensions of the Beck Depression

Inventory-II in clinically depressed outpatients. Journal of Clinical Psychology, 55(1),

117-128. doi:10.1002/(SICI)1097-4679(199901)55:1

Swanson, H. L., Hoskyn, M., & Lee, C. (1999). Interventions for students with learning

disabilities: A meta-analysis of treatment outcomes. New York: Guilford.

Tate, R. L., Perdices, M., Rosenkoetter, U., McDonald, S., Togher, L., Shadish, W., ... &

Sampson, M. (2016). The Single-Case Reporting Guideline In BEhavioural Interventions

(SCRIBE) 2016: Explanation and elaboration. Archives of Scientific Psychology, 4(1), 1-9.

Tsai, M., Kohlenberg, R. J., Kanter, J. W., Kohlenberg, B., Follette, W. C., & Callaghan, G. M.

(2009). A guide to functional analytic psychotherapy: Awareness, courage, love, and

behaviorism. New York, NY US: Springer Science + Business Media.

Villas-Bôas, A., Meyer, S. B., & Kanter, J. W. (2016). The effects of analyses of contingencies

on clinically relevant behaviors and out-of-session changes in functional analytic

psychotherapy. The Psychological Record, 66(4), 599-609. doi:10.1007/s40732-016-0195-y

Virella, B., Arbona, C., & Novy, D. M. (1994). Psychometric properties and factor structure of

the Spanish version of the State-Trait Anxiety Inventory. Journal of Personality

Assessment, 63(3), 401-412. doi:10.1207/s15327752jpa6303_1


White, O. R. (1974). The “split middle”: A “quickie” method of trend estimation. University of

Washington, Experimental Education Unit, Child Development and Mental Retardation

Center.

Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. London: Sage.

Xavier, R. N., Kanter, J. W., & Meyer, S. B. (2012). Transitional probability analysis of two

child behavior analytic therapy cases. International Journal of Behavioral Consultation

and Therapy, 7(2-3), 182-188. doi:10.1037/h0100954


Table 1
Brief description of studies used in quantitative synthesis
| Author | Participant Description (Age) | Study Design | Treatment | Baseline Sessions | Treatment Sessions | Total Sessions | SCRIBE Score |
|---|---|---|---|---|---|---|---|
| Baruch, Kanter, Busch, & Juskiewics (2009) | 1 Male (21) | Case Study | FAP-Enhanced ACT | | | 37 | 9 |
| Busch et al. (2009) | 1 Female (25) | A/B Design | FAP-Enhanced CBT | 5 | 15 | 20 | 15 |
| Callaghan, Summers, & Weidman (2003) | 1 Male (30) | A/B Design | FAP | | | 23 | 13 |
| Cattivelli (Unpublished) | 1 Unknown | Multiple Baseline | FAP | 2 | 6 | 8 | 2 |
| | 2 Unknown | | | 2 | 6 | 8 | |
| | 3 Unknown | | | 2 | 8 | 10 | |
| Cattivelli, Tirelli, Berardo, & Perini (2012) | 1 Male (12) | Multiple Baseline | FAP | 4 | 11 | 15 | 14 |
| | 2 Male (11) | | | 7 | 9 | 16 | |
| | 3 Male (12) | | | 7 | 8 | 15 | |
| | 4 Male (13) | | | 6 | 14 | 20 | |
| | 5 Male (15) | | | 8 | 18 | 26 | |
| Ferro-Garcia, Lopez-Bermudez, & Valero-Aguayo (2012) | 1 Female (24) | Case Study | FAP | 7 | 16 | 23 | 9 |
| Kanter et al. (2006) | 1 Female (24) | A/B Design | FAP | 12 | 8 | 20 | 13 |
| | 2 Male (42) | | | 8 | 4 | 12 | |
| Kohlenberg & Tsai (1994) | 1 Male (35) | A/B Design | FAP-Enhanced CT | 8 | 7 | 15 | 12 |
| Landes, Kanter, Weeks, & Busch (2013) | 1 Female (44) | A/A+B Design | FAP | 6 | 4 | 10 | 17 |
| | 2 Female (20) | | | 6 | 7 | 13 | |
| | 3 Male (28) | | | 10 | 4 | 14 | |
| | 4 Male (26) | | | 7 | 7 | 14 | |
| Lizarazo, Muñoz-Martínez, Santos, & Kanter (2015) | 1 Male (25) | A/A+B Design | FAP | 4 | 10 | 14 | 18 |
| | 2 Female (47) | | | 5 | 14 | 19 | |
| | 3 Female (21) | | | 6 | 10 | 16 | |
| Lopez (2002) | 1 Male (31) | A/B Design | FAP | 3 | 32 | 35 | 12 |
| Manduchi & Schoendorff (2012) | 1 Female (36) | Case Study | FAP | | | 52 | 10 |
| Manos et al. (2009) | 1 Female (22) | Case Study | FAP-Enhanced BA | | | 8 | 13 |
| McClafferty (2012) | 1 Male (35) | Case Study | FAP-Enhanced BA | | | 30 | 9 |
| McCluskey (Unpublished) | 1 Male (25) | A/B Design | FAP-Enhanced BA | 10 | 10 | 20 | 3 |
| Oshiro, Kanter, & Meyer (2012) | 1 Female (46) | Reversal Design | FAP | 11 | 9 | 20 | 16 |
| | 2 Male (18) | | | 12 | 8 | 20 | |
| Pedersen, Callaghan, Prins, Nguyen, & Tsai (2012) | 1 Female (41) | Case Study | FAP | | | | 14 |
| Singh & O'Brien (2016) | 1 Male (72) | A/B Design | FAP | 2 | 4 | 6 | 15 |
| | 2 Male (52) | | | 2 | 4 | 6 | |
| | 3 Female (31) | | | 2 | 4 | 6 | |
| Villas-Bôas, Meyer, & Kanter (2016) | 1 Female (38) | A/B/BC/B2/BC2 | FAP | 5 | 33 | 38 | 14 |
| | 2 Female (32) | | | 5 | 28 | 33 | |
| Xavier, Kanter, & Meyer (2012) | 1 Female (10) | Case Study | FAP-Enhanced Child Therapy | | | 17 | 11 |
| | 1 Male (7) | | | | | 31 | |

Table 2
Description of Diagnoses, TB1s, and TB2s per study
| Authors | P | DSM Disorder | CRB1 Dimension | CRB2 Dimension |
|---|---|---|---|---|
| Baruch et al. (2009) | 1 | Dysthymia; Psychotic Symptoms | Disclosure; Emotional Expression | Disclosure; Emotional Expression |
| Busch et al. (2009) | 1 | Major Depressive Disorder; Histrionic Personality Disorder | Conflict | Disclosure; Emotional Expression |
| Callaghan et al. (2003) | 1 | Narcissistic Personality Disorder; Histrionic Personality Disorder | Assertion of Needs; Bidirectional Communication; Disclosure; Emotional Expression | Assertion of Needs; Bidirectional Communication; Disclosure; Emotional Expression |
| Cattivelli et al. (2012) | 1 | No description | | |
| Ferro-Garcia et al. (2012) | 1 | Major Depressive Disorder | Disclosure | Disclosure; Emotional Expression |
| Kanter et al. (2006) | 1 | Major Depressive Disorder; Histrionic Personality Disorder | Conflict; Disclosure | Disclosure |
| | 2 | Major Depressive Disorder; Personality Disorder, NOS; Past polysubstance dependence | Bidirectional Communication; Conflict | Bidirectional Communication |
| Kohlenberg & Tsai (1994) | 1 | Depression | No description | |
| Landes et al. (2013) | 1 | Major Depressive Disorder; Generalized Anxiety Disorder; Depressive Personality Disorder | Assertion of Needs | Assertion of Needs |
| | 2 | Major Depressive Disorder; Avoidant Personality Disorder; Obsessive Compulsive Personality Disorder; Depressive Personality Disorder | Bidirectional Communication | Bidirectional Communication |
| | 3 | Major Depressive Disorder; Past alcohol abuse; Avoidant, Depressive, and Borderline Personality Disorder | Emotional Expression | Emotional Expression |
| | 4 | Major Depressive Disorder; Depressive Personality Disorder | Emotional Expression; Assertion of Needs | Emotional Expression; Assertion of Needs |
| Lizarazo et al. (2015) | 1 | Borderline Personality Disorder | Disclosure | Disclosure |
| | 2 | | Bidirectional Communication | Bidirectional Communication |
| | 3 | | Disclosure | Disclosure |
| Lopez (2002) | 1 | | Bidirectional Communication; Conflict | Disclosure |
| Manduchi & Schoendorff (2012) | 1 | Obsessive Compulsive Personality Disorder; Borderline Personality Disorder | Bidirectional Communication; Disclosure; Emotional Expression; Conflict | Disclosure; Emotional Expression |
| Manos et al. (2009) | 1 | | Disclosure; Emotional Expression | Disclosure; Emotional Expression |
| McClafferty (2012) | 2 | Depression | Disclosure; Emotional Expression | Disclosure; Emotional Expression |
| Oshiro et al. (2012) | 1 | Borderline Personality Disorder; Schizophrenia | No description | |
| Pedersen et al. (2012) | 1 | Posttraumatic Stress Disorder; Alcohol Dependence; Dysthymia | Disclosure; Emotional Expression | Disclosure; Emotional Expression |
| Singh & O'Brien (2016) | 1 | Major Depressive Disorder | Disclosure; Emotional Expression | Disclosure; Emotional Expression |
| | 2 | Major Depressive Disorder | Disclosure | Disclosure |
| | 3 | Major Depressive Disorder | Emotional Expression | Emotional Expression |
| Villas-Bôas et al. (2016) | 1 | | Conflict | Conflict |
| | 2 | | Assertion of Needs | Assertion of Needs |
| Xavier et al. (2012) | 1 | No description | | |
| Cattivelli (Unpublished) | 1 | No description | | |
| McCluskey (Unpublished) | 1 | No description | | |

Table 3
Effect Size Calculation per Participant for Graphical Data for Target Behaviors-1
| Author | Participant Description (Age) | Baseline Points for Calculation | Treatment Points for Calculation | Swanson’s d | PND | SMTE |
|---|---|---|---|---|---|---|
| Cattivelli (Unpublished) | 1 Unknown | 2 | 6 | 1.63 | 83.33% | |
| | 2 Unknown | 2 | 6 | 1.70 | 100% | |
| | 3 Unknown | 4 | 6 | 1.56 | 100% | 100%* |
| Busch et al. (2009) | 1 Female (24/25) | 6/9 | 9/14 | 0.84 | 0% | 88.89%* |
| Kanter et al. (2006) | 2 Male (42) | 6 | 3 | 0.14 | 0% | 100% |
| Kohlenberg & Tsai (1994) | 1 Male (35) | 8 | 5 | 1.19 | 60% | 20% |
| Landes et al. (2013) | 1 Female (20) | 6 | 7 | 2.76 | 100% | 100%* |
| Lizarazo et al. (2015) | 1 Male (25) | 4 | 10 | 0.26 | 60% | 20% |
| | 2 Female (47) | 5 | 14 | 2.99 | 92.31% | 100%*† |
| | 3 Female (21) | 6 | 10 | 0.48 | 0% | 60% |
| Lopez (2002) | 1 Male (31) | 3 | 32 | 1.22 | 16.67% | 79.16%* |
| Oshiro et al. (2012) | 1 Female (46) | 4 | 5 | 2.04 | 100% | 100%* |
| | 2 Male (18) | 4 | 4 | 1.56 | 100% | 100%† |
| Pedersen et al. (2012) | 1 Female (41) | 3 | 4 | 0.40 | 75% | |
| Singh & O'Brien (2016) | 1 Male (72) | 2 | 4 | 1.30 | 100% | |
| | 2 Male (52) | 2 | 4 | 1.56 | 25% | |
| | 3 Female (31) | 2 | 4 | 1.47 | 100% | |
| Villas-Bôas et al. (2016) | 1 Female (38) | 5 | 8 | 0.18 | 16.67% | 16.67% |
| | 2 Female (32) | 5 | 7 | 2.09 | 85.71% | 100%* |
| Xavier et al. (2012) | 1 Female (10) | 4 | 5 | 2.47 | 60% | 40% |
| | 2 Male (7) | 4 | 6 | 0.05 | 16.67% | 16.67% |
| Total Participants | | | | 21 | 21 | 16 |
| Means | | | | 1.33 | 58.70% | 69.43% |
| Confidence Intervals | | | | (0.95 – 1.71) | (41.31 – 76.07) | (50.70 – 88.17%) |
Note. * p < .05, † p < .01
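
The PND column reports the percentage of nonoverlapping data described by Scruggs and Mastropieri (1998): the share of treatment-phase points that exceed the most extreme baseline point in the therapeutic direction. The following is a minimal sketch of that calculation, not the authors' own procedure. The function name `pnd`, the `increase_is_improvement` flag, and the example values are illustrative assumptions and are not data from the reviewed studies.

```python
def pnd(baseline, treatment, increase_is_improvement=True):
    """Percentage of nonoverlapping data (PND) for one participant.

    Counts treatment-phase points that fall beyond the most extreme
    baseline point in the direction of improvement, then expresses
    that count as a percentage of all treatment-phase points.
    """
    if not baseline or not treatment:
        raise ValueError("Both phases need at least one data point.")

    if increase_is_improvement:
        # Target behavior should rise: compare against the highest baseline point.
        threshold = max(baseline)
        non_overlapping = sum(1 for x in treatment if x > threshold)
    else:
        # Target behavior should fall: compare against the lowest baseline point.
        threshold = min(baseline)
        non_overlapping = sum(1 for x in treatment if x < threshold)

    return 100.0 * non_overlapping / len(treatment)


# Hypothetical session counts of a problem behavior (lower is better).
example_baseline = [5, 7, 6]
example_treatment = [4, 3, 6, 2]
example_pnd = pnd(example_baseline, example_treatment, increase_is_improvement=False)
print(f"PND = {example_pnd:.2f}%")
```

In this hypothetical case three of the four treatment points fall below the lowest baseline point, so PND is 75%. Because the threshold is set by a single baseline point, one extreme baseline observation can pull PND to 0% even when the standardized mean difference is sizeable.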

Table 4
Effect Size Calculation per Participant for Graphical Data for Target Behaviors-2
| Author | Participant Description (Age) | Baseline Points for Calculation | Treatment Points for Calculation | Swanson’s d | PND | SMTE |
|---|---|---|---|---|---|---|
| Busch et al. (2009) | 1 Female (25) | 6 | 9 | 0.62 | 77.78% | 100%* |
| Cattivelli (Unpublished) | 1 Gender/Age Unknown | 2 | 6 | 1.71 | 100% | |
| | 2 Gender/Age Unknown | 2 | 6 | 1.73 | 100% | |
| | 3 Gender/Age Unknown | 4 | 6 | 0.08 | 100% | |
| Cattivelli et al. (2012) | 1 Male (12) | 4 | 11 | 1.70 | 100% | 100%* |
| | 2 Male (11) | 7 | 9 | 2.06 | 100% | 100%* |
| | 3 Male (12) | 7 | 8 | 1.95 | 100% | 100%* |
| | 4 Male (13) | 6 | 14 | 2.43 | 100% | 100%* |
| | 5 Male (15) | 8 | 18 | 3.33 | 100% | 100%* |
| Landes et al. (2013) | 1 Female (44) | 6 | 4 | 1.99 | 62.50% | 100%† |
| | 2 Male (28) | 10 | 4 | 2.50 | 0% | 100%† |
| | 3 Male (26) | 7 | 7 | 2.30 | 100% | 100%* |
| Lizarazo et al. (2015) | 1 Male (25) | 4 | 10 | 2.77 | 80% | 80%* |
| | 2 Female (47) | 5 | 14 | 1.70 | 76.92% | 84.60%* |
| | 3 Female (21) | 6 | 10 | 0.07 | 20% | 20% |
| Oshiro et al. (2012) | 1 Female (46) | 4 | 5 | 2.17 | 100% | 100%* |
| | 2 Male (18) | 4 | 4 | 1.54 | 100% | 100%† |
| Singh & O'Brien (2016) | 1 Male (72) | 2 | 4 | 1.72 | 100% | |
| | 2 Male (52) | 2 | 4 | 2.47 | 25% | |
| | 3 Female (31) | 2 | 4 | 1.17 | 100% | |
| Villas-Bôas et al. (2016) | 1 Female (38) | 5 | 8 | 2.77 | 66.67% | 100%* |
| | 2 Female (32) | 5 | 7 | 3.28 | 100% | 57.14% |
| Xavier et al. (2012) | 1 Female (10) | 4 | 5 | 2.61 | 80% | 20% |
| | 2 Male (7) | 4 | 6 | 0.07 | 33.33% | 33.33% |
| Total Participants | | | | 24 | 24 | 18 |
| Means | | | | 1.85 | 79.39% | 80.66% |
| Confidence Intervals | | | | (1.49 – 2.24) | (67.67 – 92.51%) | (69.27 – 96.85%) |
Note. * p < .05, † p < .01

Table 5
Reliable Change Scores
| Author | P | Measure | RCI Score |
|---|---|---|---|
| Symptom-Based Measures | | | |
| Baruch et al. (2009) | 1 | Beck Depression Inventory-II | 2.33* |
| Busch et al. (2009) | 1 | Beck Depression Inventory-II | 3.96* |
| Callaghan et al. (2003) | 1 | Beck Depression Inventory-II | 0.93 |
| Lopez (2002) | 1 | State-Trait Anxiety Inventory | 5.62* |
| | | Penn State Worry Questionnaire | 6.21* |
| Manduchi & Schoendorff (2012) | 1 | Beck Depression Inventory-II | 2.57* |
| | | Worry Domains Questionnaire | 4.39* |
| McClafferty (2012) | 1 | Patient Health Questionnaire-9 | 11.00* |
| | | Generalized Anxiety Disorder-7 | 11.88* |
| McCluskey (Unpublished) | 1 | Beck Depression Inventory-II | 4.66* |
| Total Participants | | | 7 |
| Mean | | | 5.36* |
| Confidence Interval | | | (3.09 – 7.62) |
| Quality of Life Measures | | | |
| Baruch et al. (2009) | 1 | Outcome Questionnaire – 45 | 2.58* |
| Ferro-Garcia et al. (2012) | 1 | Experience of Self Scale | 6.00* |
| | | Acceptance & Action Questionnaire-Spanish | 1.98* |
| Lopez (2002) | 1 | Acceptance & Action Questionnaire-II | 5.13* |
| Manduchi & Schoendorff (2012) | 1 | Acceptance & Action Questionnaire-II | 3.84* |
| Manos et al. (2009) | 1 | Behavioral Activation for Depression Scale | -0.60 |
| | | Dyadic Adjustment Scale | -0.37 |
| McClafferty (2012) | 1 | Work and Social Adjustment Scale | 7.16* |
| | | CORE Outcome Measure | 9.01* |
| Singh & O’Brien (2016) | 1 | Functional Idiographic Assessment Template – Short Form | 2.79* |
| | | Acceptance & Action Questionnaire-II | 1.54 |
| | 2 | Functional Idiographic Assessment Template – Short Form | -2.79* |
| | | Acceptance & Action Questionnaire-II | -1.02 |
| | 3 | Functional Idiographic Assessment Template – Short Form | 5.09* |
| | | Acceptance & Action Questionnaire-II | 3.33* |
| McCluskey (Unpublished) | 1 | Acceptance & Action Questionnaire-II | 3.33* |
| Total Participants | | | 10 |
| Mean | | | 2.94* |
| Confidence Interval | | | (1.35 – 4.51) |
Note. * denotes statistically reliable RCI score.

Table 6
Reliability and Standard Deviations Used for Reliable Change Index Calculation
| Measure | Citation | Cronbach’s α | SD |
|---|---|---|---|
| BDI-II | Steer, Ball, Ranieri, & Beck (1999) | 0.93 | 11.46 |
| STAI-State | Virella, Arbona, & Novy (1994) | 0.91 | 9.46 |
| STAI-Trait | Virella, Arbona, & Novy (1994) | 0.86 | 8.88 |
| PSWQ | Meyer, Miller, Metzger, & Borkovec (1990) | 0.97 | 13.80 |
| Worry Domains Questionnaire | McCarthy-Larzelere et al. (2001) | 0.94 | 19.52 |
| PHQ-9 | Kroenke, Spitzer, & Williams (2001) | 0.89 | 6.10 |
| GAD-7 | Spitzer, Kroenke, Williams, & Löwe (2006) | 0.89 | 3.41 |
| Outcome Questionnaire – 45 | Lambert et al. (1996) | 0.93 | 24.14 |
| Experience of Self Scale | Kanter, Parker, & Kohlenberg (2001) | 0.91 | 1.33 |
| AAQ-Spanish | Barraca (2004) | 0.74 | 8.42 |
| AAQ-II | Bond et al. (2011) | 0.88 | 7.97 |
| BADS | Kanter, Rusch, Busch, & Sedivy (2009) | 0.92 | 20.15 |
| Dyadic Adjustment Scale | Spanier (1976) | 0.96 | 28.30 |
| Work and Social Adjustment Scale | Mundt, Marks, Shear, & Greist (2002) | 0.80 | 6.40 |
| CORE Outcome Measure | Barkham et al. (2001) | 0.94 | 0.75 |
| FIAT-Q-SF | Darrow, Callaghan, Bonow, & Follette (2014) | 0.85 | 18.31 |
Note. BDI-II: Beck Depression Inventory-II
STAI: State-Trait Anxiety Inventory
PSWQ: Penn State Worry Questionnaire
PHQ-9: Patient Health Questionnaire-9
GAD-7: Generalized Anxiety Disorder Assessment-7
AAQ: Acceptance & Action Questionnaire
FIAT-Q-SF: Functional Idiographic Assessment Template-Questionnaire-Short Form
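
To illustrate how the reliability and standard deviation values in Table 6 enter a reliable change calculation, the sketch below uses the widely cited Jacobson and Truax formulation: the standard error of measurement is SD × √(1 − r), the standard error of the difference is √2 times that value, and the RCI is the pre-to-post change divided by the standard error of the difference. This is an assumption about the general form of the index, not a reproduction of the authors' exact procedure. The function name, the sign convention (post minus pre), and the pre and post scores are hypothetical; only the BDI-II reliability (0.93) and SD (11.46) come from Table 6.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Reliable change index under the Jacobson and Truax formulation.

    sd is the normative standard deviation of the measure and
    reliability is its reliability coefficient (here Cronbach's alpha).
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1).")
    # Standard error of measurement for a single administration.
    se_measurement = sd * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two administrations.
    se_difference = math.sqrt(2.0) * se_measurement
    # Sign convention (an assumption): negative values mean scores decreased.
    return (post - pre) / se_difference


def is_reliable(rci, critical_value=1.96):
    """True when the change exceeds the conventional two-tailed .05 cutoff."""
    return abs(rci) > critical_value


# Hypothetical BDI-II scores; sd and reliability are the Table 6 values.
example_rci = reliable_change_index(pre=30, post=20, sd=11.46, reliability=0.93)
print(f"RCI = {example_rci:.2f}, reliable: {is_reliable(example_rci)}")
```

With these inputs the standard error of the difference is about 4.29 points, so a 10-point drop gives an RCI of about -2.33, which exceeds the 1.96 cutoff. Measures with lower reliability or larger standard deviations require a larger raw change to reach the same threshold.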

References retrieved from initial article search (n = 179)
    Ineligible: Conceptual reviews (n = 75); Theoretical articles (n = 56); Measurement articles (n = 7)
Assessed for eligibility (n = 41)
    Ineligible: Narrative reviews (n = 16); Therapist-focused outcomes (n = 3); Group-based outcomes (n = 2); Data unavailable (n = 2)
Published studies meeting criteria for review (n = 18)
Unpublished data located (n = 2)
Studies included in quantitative synthesis (n = 20)

Figure 1.
Flow Chart of Study Selection
