Chapman University
Summer 2021
Rating Scale Critique: Social Skills Improvement System
Description
The Social Skills Improvement System (SSIS) Rating Scales are intended to aid
professionals in screening and classifying students suspected of having significant social skill
deficits. It assists in the development of interventions for those students. Frank M. Gresham
(Ph.D.) and Stephen N. Elliott (Ph.D.) created the SSIS in 2008 to simplify the identification of
students in need of social skill support and their subsequent intervention planning. Both of the
individuals who authored this rating scale and manual identify as white American males.
There are four SSIS Rating Scales forms: a teacher form, a parent form, a student (ages
8-12) form, and a student (ages 13-18) form. Each form is available on paper or electronically,
and both the student and parent forms are also available in Spanish; no other languages are
offered. Generally, it takes the rater 15 to 20 minutes to complete a form.
Although the topics and questions differ slightly, each of the four types of the SSIS
Rating Forms follows a similar format. Questions ask students to rate the presence of social skills
and problem behaviors on a Likert scale corresponding to how true (N-not true, L-little true, A- a
lot true, V- very true) and how important (n- not important, i- important, c- critical) a statement
is to their perceived social development. Students in the 8-12 age range do not rank the
importance of each social skill because Gresham and Elliott (2008) did not see this as
developmentally appropriate. Parents and teachers completing the form rate the frequency of
both social skills and problem behaviors using a similar Likert scale format that reveals if they
occur never, seldom, often, or always. Only teachers rating a student using SSIS complete the
section on academic competence. This scale ranges from 1-5, with a 1 indicating the student falls
in the lowest 10% of the class, 2 the lowest 20%, 3 the middle 40%, 4 the highest 20%, and 5 the
highest 10% of their class for math, reading, and learning behaviors (Gresham & Elliott, 2008).
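To make the banding concrete, the percentile cut-offs described above can be sketched as a small function. This is an illustrative sketch only; the function name and interface are my own and are not part of the SSIS materials.

```python
def academic_competence_rating(class_percentile: float) -> int:
    """Map a student's class percentile rank (0-100) to the 1-5
    academic competence band described in the manual:
    1 = lowest 10%, 2 = next 20%, 3 = middle 40%,
    4 = next 20%, 5 = highest 10%."""
    if not 0 <= class_percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if class_percentile < 10:
        return 1
    if class_percentile < 30:
        return 2
    if class_percentile < 70:
        return 3
    if class_percentile < 90:
        return 4
    return 5
```

For example, a student at the 50th percentile of their class in reading would receive a rating of 3.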
The SSIS Rating Scale is a revised version of the original Social Skills Rating System,
developed by Gresham and Elliott in 1990. The 2008 edition was created to update the perceived
national norms utilized as baselines in the assessment. Additionally, the 2008 edition is
considered a better measure for children ages three to five because the authors included a norm
group of parents and teachers assessing children in this age range (Gresham & Elliott, 2008).
Also, the 2008 version contains four additional sub-scales that include communication,
engagement, bullying, and the Autism Spectrum. Furthermore, the SSIS has a Spanish version
and direct links to interventions. The 2008 edition also features improved psychometric properties.
In general, the manual recommends that the SSIS Rating Scales be used when evaluating
students whose ‘problem behaviors’ interfere with the student’s ability to acquire or perform
social skills (Gresham & Elliott, 2008). The manual advises raters to complete the form in a
single session. Therefore, it is essential to facilitate a safe and quiet space for the rater to
complete the SSIS. With regard to students, establishing rapport and monitoring the administration
are especially important.
Characteristics
The Social Skills Improvement System (SSIS) rating scale claims to assess three
domains: social skills, problem behaviors, and academic competence. Each domain contains
subdomains to help administrators understand a student’s baseline skills and deficits to make a
plan for improvement. Elliott and Gresham (2008) define social skills as "learned behaviors that
promote positive interactions applied in appropriate social situations" (p. 1). The seven
subdomains tested in the social skills subscale include communication, cooperation, assertion,
responsibility, empathy, engagement, and self-control. Problem behaviors, by contrast, are
behaviors that "interfere with the acquisition or performance of socially skilled behaviors"
(Elliott & Gresham, 2008, p. 2). The teacher and parent forms contain rating questions for the
problem behavior subscale within the subdomains of externalizing, bullying,
hyperactivity/inattention, internalizing, and autism spectrum behaviors. The teacher form
contains an academic competence section in
which they are asked to assess their primary or secondary student on reading and math
performance, motivation, parental support, and general cognitive functioning. Overall, the SSIS
takes a multi-rater approach; teachers, parents, and students themselves can document the
frequency and perceived importance of various positive behaviors. To reiterate, only teachers
can rate academic competence.
Test Administration
Gresham and Elliott (2008) estimate each rating scale form takes about fifteen to twenty
minutes to complete. Administering the SSIS does not require any special outside training. Still,
the authors do require administrators to possess at least a bachelor's degree and coursework in
the "administration and interpretation of tests" (Gresham & Elliott, 2008, p. 4). The test can be
scored by hand or electronically using the ASSIST software system; both methods take no more
than twenty minutes, and both the score sheet and manual contain specific details on how to
interpret results.
Gresham and Elliott recommend adhering to five steps when administering the rating
scale. First, establish rapport with the rater. By this, they mean keeping an open and honest
dialogue with the student, teacher, and/or parent about the intended use of results, reason for
evaluation, limits of confidentiality, and any relevant legal issues regarding privileged
communication. Second, the administrator should record identifying information about the
student and the rater. Third, they should explain to the raters how to mark responses to ensure
completeness and accuracy. Fourth, the rater should complete the form; it is recommended that
the rater (especially if they are a student) complete the form with the administrator to clear up
any confusion or questions. Lastly, the administrator should review the rating scale to check for
missing or multiple answers to one question so they have all the information needed to complete
an accurate interpretation.
The SSIS Rating Forms have many features and characteristics that make them accessible
and easy to use. Primarily, its availability in written or electronic versions ensures easy
administration and accessibility. The forms and scoring process are intended to be as brief as
possible while still providing ample information on important behaviors. Additionally, the
multi-rater approach was designed to incorporate more objective, reliable, and valid data that
provides a more holistic view of the student. Furthermore, the SSIS was revised in 2008 to
ensure that the norms utilized are considered to represent sex-specific and age-based norms. This
revision also came with improved psychometric properties that provide evidence for validity and
internal consistency. Finally, the 2008 revision of the SSIS Rating Forms offers direct links to
interventions.
The validity and reliability section of the manual provides a variety of measures
demonstrating high correlation coefficients among scales and subscales and between different
raters. Internal consistency reliability presents a .90 coefficient alpha, and interrater reliability
between teachers observing the same student came out to a mean correlation of .58.
Interrater reliability between parents and close relatives observing the same student resulted in a
median correlation of .59. Although relatively low compared to the ideal coefficient of 1.0,
Gresham and Elliott (2008) argue that the discrepancy occurred because only one teacher knew
the student (especially at the primary level) very well, and parents often know the child better
than any other close relative. However, the low alpha coefficients presented indicate that this
rating scale represents low reliability as a multi-rater scale, making it confusing and difficult to
synthesize information gained from different raters into one cohesive behavioral plan.
Overall, raters need to use the results of these scales as a general guideline, not a
diagnostic tool or criteria. Parents and teachers often do not see the same behaviors, making it
ideal for assessing a whole child but also challenging for interpreting the best developmental
plan to improve a child’s behavior. Additionally, the race (white) and gender (male) of the test
creators inevitably introduce some bias into the types of social skills deemed culturally and socially
appropriate. Therefore, test administrators should ask themselves if there are any cultural
differences between the student and the demographics of the norm group. Also, the rating scale
has not been updated since 2008, thirteen years ago. An updated version might better assess the
needs and skills of positive social behavior today, especially in a post-pandemic society. It should
also include more than two languages to better reflect the diverse backgrounds of families and
students.
Sample
Gresham and Elliott (2008) sought to create a representative norm sample in their efforts
to update the SSIS. In doing so, the samples were based on the Current Population Survey,
March 2006. The survey's demographics were applied to the three norm groups: preschool (ages
3 to 5) and two school-aged groups (ages 5 to 12 and 13 to 18). Each age group sample was
designed to have equal numbers of males and females and to match the U.S. population with
respect to socioeconomic status and geographical region. This nationwide sample spanned
35 states and contained a sample of 4,700 children. The states included were broken down by
region, including five Northeast states, nine North Central states, twelve South states (including
District of Columbia), and nine West states (Gresham & Elliott, 2008). They do not break down
the population of raters per state, but they do list counties, and it appears that states with higher
populations (New York, California, Texas) contain the most participants. Additionally,
their standardization plan called for the inclusion of at least 25 individuals from each
demographic category of sex, race/ethnicity, parent education level, and geographic region.
Sample Critique
Although Gresham and Elliott (2008) reference the representativeness of their norm
sample several times, there is no actual data reported regarding the demographic information. A
map of the United States is included with different points highlighting each geographic location
where data was collected. However, no further information is given regarding any other
demographic characteristics of the sample.
Reliability
The SSIS manual reports reliability data by including the internal consistency reliability,
test-retest reliability for parent, teacher, and student forms, interrater reliability for parent and
teacher forms, and standard error of measurement (SEM). Internal consistency aims to assess the
consistency of scores across items within one scale or subscale of the test (Gresham & Elliott,
2008). If the content is homogenous and well-written, it will best represent the behavioral trait or
dimension that the scale of the test aims to assess. Gresham and Elliott (2008) used Cronbach’s
alpha coefficient to evaluate internal consistency for all three domains (social skills, problem
behaviors, and academic competence) for all three test forms (student, parent, and teacher).
Alpha coefficients in the range of 0.8 are considered adequate for screeners, and 0.9 the best for
education and clinical use (Sheperis et al., 2020). Median coefficient alpha reliabilities for the
Social Skills, Problem Behaviors, and Academic Competence scales fall at 0.9 or above.
Subscale coefficients range from 0.7 to 0.85, but Sheperis et al. (2020) argue that an alpha
coefficient of 0.7 is adequate for subscale reliability. Gresham and Elliott (2008) do not include a
breakdown of median alpha coefficients for each domain, but they do report average coefficients
for each form on each reliability measure. It would be more transparent to report these averages
explicitly rather than present high correlations with little breakdown.
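As a point of reference, Cronbach's alpha can be computed directly from a matrix of item responses. The sketch below is illustrative only; the data and function are hypothetical and are not drawn from the SSIS norm sample.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of raters' item-score lists
    (rows = raters, columns = items):
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])                 # number of items
    items = list(zip(*responses))         # transpose to per-item columns
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Perfectly consistent items yield the maximum alpha of 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

With real rating-scale data, values at or above the 0.9 threshold noted above would support clinical and educational use.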
Test-retest reliability finds the consistency of scores across a brief gap in time for the
“same individual by the same rater using the same form” (Gresham & Elliott, 2008, p. 66). In the
teacher form sample, Gresham and Elliott (2008) included 144 individuals rated twice by the
same teacher for an average of 43 days apart. Subscale reliability coefficients ranged from
0.68-0.86 with an average of .81. Gresham and Elliott (2008) reported a relatively low shift of
average test scores between ratings; the mean standard score changed by less than one point or
one-fifteenth of a standard deviation. On the parent form, 115 individuals were rated by a parent
twice on an average of 61 days apart. The median subscale reliability alpha coefficient for the
parent form came out to 0.8; overall range extended from 0.7-0.87 (Gresham & Elliott, 2008).
For the student form, 127 individuals aged 8-18 rated themselves twice, an average of 66 days
apart (Gresham & Elliott, 2008). Subscale reliability for students presented much lower
coefficients, ranging from 0.59-0.81, with a median reliability coefficient of 0.71. Gresham and
Elliott (2008) attribute this lower test-retest reliability among students to their less stable
self-perceptions over time.
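The test-retest coefficients discussed above are ordinary Pearson correlations between time-1 and time-2 scores. A minimal sketch, using hypothetical scores rather than the SSIS samples:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two score lists, e.g. the same
    rater's subscale scores at time 1 and time 2."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Scores that shift uniformly between administrations still correlate
# perfectly, which is why mean drift and correlation are reported separately
print(pearson_r([90, 100, 110], [92, 102, 112]))  # 1.0
```

Note that a uniform shift in scores leaves r unchanged, which is why Gresham and Elliott report the change in mean standard scores separately from the correlation itself.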
Next, Gresham & Elliott (2008) evaluated interrater reliability, assessing the scores of
two raters rating the same individual using the same form around the same time. They highlight
that the difference between raters comes down to how differently they interpret behavioral
statements, how differently they interpret the intensity of the student’s behaviors, and how
differently the student behaves in front of each rater (Gresham & Elliott, 2008). The
level of interaction each of the two teachers had with the student varied significantly,
especially for primary students who had only one homeroom teacher. Subscale
reliability coefficients for the 108 teachers who rated 54 students ranged from 0.36-0.69, with a
median of 0.59 (Gresham & Elliott, 2008). For the parent form, Gresham and Elliott (2008) had
both parents (or one parent and one close relative) rate 110 individuals aged 3-18. Reliability
coefficients for subscales on the parent form fell between the 0.5-0.6 range, with standout low
reliability for the assertion subscale (0.37), bullying (0.38), and internalizing behaviors (0.43)
(Gresham & Elliott, 2008). Both parent and teacher samples included individuals from each
demographic category of sex, race/ethnicity, parent’s education, and region of the country
deemed representative of the population. However, no specifications are made about the actual
demographic composition of these interrater samples.
The test makers include a short section on the standard error of measurement (SEM) or
the estimated difference between the obtained test score and the true test score (the score
obtained without any error). The SEM for teachers assessing social skills of children grouped in
ages 3-18 is 2.6, for problem behaviors 3.0-3.7, and academic competence 2.6-3.6 (Gresham &
Elliott, 2008). On the parent form, SEM ranges from 3.0-3.3 for social skills and 3.2-3.6 for
problem behaviors. For the student form, the SEM came out to 3.7 and 3.3 for social skills and
4.5 and 3.3 for problem behaviors. Generally, lower SEMs indicate a higher reliability
coefficient; however, this is relative to the standard deviation (Sheperis et al., 2020). This scale
uses a standard deviation of 15, and according to the calculations in Sheperis et al. (2020), SEMs
of this size relative to that standard deviation indicate a highly reliable scale. Therefore, the
ranges of standard error of
measurement found by Gresham and Elliott (2008) indicate a relatively low error and high
reliability.
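The standard error of measurement follows from the reliability coefficient and the standard deviation as SEM = SD * sqrt(1 - r) (Sheperis et al., 2020). A quick sketch shows that the reported SEMs are consistent with high reliability on a scale with SD = 15; the reliability value plugged in below is illustrative, not taken from the manual.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r): the expected spread of obtained
    scores around the true score."""
    return sd * math.sqrt(1 - reliability)

# With the SSIS standard-score SD of 15, a reliability of .96
# yields an SEM of 3.0 -- inside the 2.6-3.7 range the manual reports.
print(round(standard_error_of_measurement(15, 0.96), 2))  # 3.0
```

Working backward, SEMs in the 2.6-3.7 range on an SD-15 scale imply reliability coefficients of roughly .94-.97, which is consistent with the high internal consistency figures reported above.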
The manual for the SSIS also contains a fourth method of reliability testing: correlating the SSIS
with similar measures (parallel stability) to gauge how closely the rating scale tracks other
behavioral assessment scales.
The authors discussed intercorrelation coefficients in the validity section rather than the
reliability section of the manual. Still, the study compared SSIS to the Social Skills Rating
System (SSRS), the Behavior Assessment System for Children, 2nd Edition (BASC-II), the
Vineland Adaptive Behavior Scales, Second Edition (Vineland-II), the SSCSA, and the Home &
Community Social Behavior Scales (HCSBS). Correlation coefficients with the SSRS range from
.46-.76, .44-.82 with the BASC-II, .48-.75 with the Vineland-II, .71-.76 with the SSCSA, and
.51-.77 with the HCSBS.
Reliability Critique
Gresham and Elliott measure temporal stability with the test-retest method, stability
across two tests with multiple parallel comparisons, stability within the measure by assessing
internal consistency, and consistency across raters with interrater reliability.
The test-retest reliability method did not include a t-test to ensure stability of the mean,
but they do state that the mean standard scores changed by less than one point (Gresham &
Elliott, 2008). The correlation coefficients for all three forms fell between 0.59-0.80, which is high
considering each test was administered between 1.5 and 2 months apart; the general guideline for
test-retest reliability dictates .60 as an appropriate coefficient for tests or scales administered
more than one month apart (Sheperis et al., 2020). When measuring the stability of scores across
two tests, Gresham and Elliott (2008) included various samples but found relatively moderate
correlations, with few exceeding the .70 range. For measuring internal consistency, the test
makers used Cronbach’s alpha, which is considered the best for evaluating a multiple-response
formatted test (Sheperis et al., 2020). Findings for the overall internal consistency of the entire
rating scale came to .90, a value considered reliable for clinical and educational purposes, and
.75-.80 for the internal subscale consistency, also deemed reliable for making educational and
clinical decisions (Sheperis et al., 2020). Lastly, Gresham and Elliott (2008) assessed interrater
reliability for parents and teachers. General guidelines for a robust scale indicate that the
correlation coefficient for interrater consistency among two parents should lie around .60, for
teachers and parents around .50, and for child and parent around .40 (Sheperis et al., 2020).
Gresham and Elliott (2008) did not conduct their inter-rater analysis this way and instead tested
two parents assessing one student and two teachers assessing one student. This led to relatively
low inter-rater consistency and failed to include comparisons that might better capture the
differences between behaviors at home and school and the differences in perspectives between
parents and teachers.
All in all, the measures done by Gresham and Elliott (2008) demonstrate mostly strong
indications of reliability. The comparisons across similar tests demonstrate only moderate
consistency, and the inter-rater reliability tests are not adequate, nor do they indicate strong
reliability. However, the internal consistency measures demonstrate educationally and clinically
significant reliability, and the test-retest method shows a high level
of stability over time. Therefore, this test should be considered in the moderate to strong range of
reliability and used among those wanting to rate the behavior of their students who fit the
demographic characteristics of the norm group. This does not mean that administrators of the
SSIS should use the results as a singular justification for any diagnosis or behavior plan. Rather,
it should be employed as a tool for helping understand the student’s current status and possible
areas for growth.
Validity
The manual includes mention of content, construct, and criterion validity. Gresham and
Elliott (2008) overlap content and construct validity with very little clear evidence as to how they
developed the content for their test. For construct validity, they state that “item development for
the SSIS rating scales was based on a broad survey of the empirical literature on social skills
deficits in special populations,” without actually defining the makeup of these “special
populations” or the literature where they derived their findings (Gresham & Elliott, 2008, p. 75).
They only specifically name the Diagnostic and Statistical Manual IV and the Social Skills
Rating System (SSRS) as two sources for informing the items on the SSIS. The items taken from
these related diagnostic criteria were subjected to differential item functioning (DIF) analysis,
factor analysis, item-total correlation, and other statistical evaluations.
Additionally, Gresham and Elliott (2008) tested the perceived importance of each test
item by allowing participating parents, students (age 13-18), and teachers to rate the importance
of each Social Skill and Problem Behavior item on a scale from 0-2. Across all forms and
subscales, the mean rating for importance of each item landed at or exceeded 1 (Gresham &
Elliott, 2008). However, this does not present strong construct validity evidence because the
opinion of raters does not reflect theoretical or empirical knowledge that should be used to
develop and assess a construct or domain. It also fails to present content validity because the test
receives subjective and surface-level feedback from respondents rather than from a panel of
experts.
Gresham and Elliott (2008) present the intercorrelation coefficients among scales and
subscales of the SSIS as further evidence of content validity. They categorize the
SSIS into three general domains: Social Skills, Problem Behaviors, and Academic Competence.
They intend the social skills domain to contain behaviors that promote positive social
interactions, which likely leaves little room for problem behaviors. Therefore, the moderately
negative correlation to the Problem Behaviors scale was anticipated and displayed. For each
form (parent, teacher, and student), the Social Skills and Problem Behaviors intercorrelation
coefficients ranged from -.41 to -.65, demonstrating the moderate negative
correlation that they predicted (Gresham & Elliott, 2008). Furthermore, the correlation between
the Social Skills scale and Academic Competency was moderate and positive, which also reflects
Gresham and Elliot’s (2008) prediction based on their assertion that learning requires social
interaction. The correlation coefficients for the academic and social domains were .50 and .53,
reflecting the moderate association between social skills development and academic
achievement.
Lastly, Gresham and Elliott (2008) address criterion validity by comparing the SSIS to
five other measures: the Social Skills Rating System (SSRS), the Behavior Assessment System
for Children, 2nd Edition (BASC-II), the Vineland Adaptive Behavior Scales, Second Edition
(Vineland-II), the SSCSA, and the Home & Community Social Behavior Scales (HCSBS).
Correlation coefficients between
the SSIS and each of these similar scales are listed in the reliability section above. Overall,
average coefficients ranged from .46 to .82, demonstrating
moderate to strong relationships and validity when comparing the SSIS to other behavioral rating
scales.
Validity Critique
For the most part, the data included on validity demonstrates that the SSIS measures
the intended domains of social skills and problem behaviors. Correlation coefficients
demonstrate an inverse relationship between variables tested in the Social Skills and Problem
Behaviors domains and moderate positive correlations between variables tested in the Social
Skills and Academic Competence domains. The Social Skills domain is intended to contain
behaviors that promote positive social interactions, which likely leaves little room for problem
behaviors, resulting in an inverse relationship. Furthermore, it is believed that much of learning
involves social interactions, hence the moderate correlation between Academic Competence and
Social Skills. Gresham and Elliott (2008) also assess correlation among items in the same
subscales (item-total correlation). They break down all of these correlations onto charts included
in the validity section in the manual but write that the average coefficient for Social Skills on the
teacher form landed at .60, .50 on the parent form, and .40 on the student form. The average
coefficient for correlation of items on the Problem Behaviors subscale across parent and teacher
forms came out to .40 or above (Gresham & Elliott, 2008). Overall, the correlational relationship
among variables appears only moderate, indicating moderate to low levels of validity when
assessing if the descriptions of items matched what appeared in the data on coefficients.
Although many different forms of validity were explored throughout Elliott and
Gresham’s (2008) manual, a couple of notable forms of validity evidence were left out, primarily
the consequences of testing and response processes. Each is important
independently as they reinforce the need for the SSIS and the internal processes of the students
(respectively). Gresham and Elliott (2008) did not discuss the implications of administering the
SSIS, and therefore, potential biases are left unexposed. Furthermore, without knowledge
of the internal processes that students, parents, or teachers engage in when completing SSIS
Rating Forms, the assessment reduces students to a simple score. Although the SSIS intends to
get a holistic view of the student by applying multiple perspectives, its inability to display raters’
thought processes undermines that goal.
Sheperis et al. (2020) argue that validity is a matter of degree, not an absolute. In this
case, SSIS falls in the low to moderate range of validity. Gresham and Elliott (2008) failed to
clearly state who, if any, experts reviewed their test content or what literature they used to derive
the items in each subscale. On the one hand, the intercorrelation coefficients among scales and
subscales demonstrate moderate correlations and reflect their predictive validity, indicating that
the content does measure the intended domains (problem behaviors, social skills, and academic
competence). On the other hand, they fail to specify the experts, evidence-based research, and
literature they reviewed to create the items for each subscale, indicating a major gap in content
validity evidence. Also, there is no discussion on the impact of the SSIS
on special populations outside of a short section on differences between male and female
students. This prohibits conversations about whether the test will accurately and appropriately
measure social and behavioral skills for students from diverse populations. It appears that the
test-makers did not take care to ensure that their test would be valid across populations, nor did
they investigate potential bias for diverse groups of students.
Conclusion
The Social Skills Improvement System Rating Scales (SSIS Rating Scales; Gresham &
Elliott, 2008) are a set of norm-referenced rating forms that enable targeted assessment of
individuals and small groups (Gamst-Klaussen et al., 2016). They aid in evaluating social skills,
problem behaviors, and academic competence. The SSIS Rating Scales speak of being
intentional about creating a well-rounded and holistic view of the student. This is accomplished
by involving the parent and teacher in addition to the student to ensure multiple perspectives are
accounted for (Rigney, 2019). The parent and student versions of the SSIS are available in
Spanish in an attempt to provide accessibility. However, no other languages are available, which
limits access for families who speak other languages.
When translated to Norwegian in a study by Gamst-Klaussen et al. (2016), the SSIS
yielded moderate to strong correlations with the English version and acceptable internal
consistency across subscales. This data provides the foundation for further translations of the
SSIS Rating Forms, which would ultimately create more accessibility for those for whom
English is not their primary language.
Through data analysis, the SSIS has demonstrated multiple forms of reliability and
validity. Crosby (2011) reviewed the SSIS’s psychometric properties and evaluated its reliability
and strength. The author was impressed by the clear identification of behavioral strengths and
weaknesses across raters and settings. They recommend using the SSIS’s specific results to
inform focused instruction and intervention. Furthermore, Crosby (2011) deemed the
psychometric properties of these scales satisfactory and highlighted the usefulness of the SSIS as
a screening and intervention-planning tool.
Several publications have suggested the SSIS would benefit from an update of the
norm references utilized in the assessment (Rigney, 2019; Gamst-Klaussen et al., 2016; Crosby,
2011). Since the SSIS is intended to be a norm-referenced measure, it is important to reiterate the
2016). Since the SSIS is intended to be a norm-referenced measure, it is important to reiterate the
significance of updating the norms to ensure the representation of the current population.
Although Gresham and Elliott’s (2008) revision prioritized updating the norms, they have yet to
be updated again since 2008. This leaves approximately a 13-year gap in which cultural and
behavioral norms have shifted significantly. Rigney (2019) sought to evaluate the efficacy and
reliability of the SSIS Rating Forms in a more up-to-date setting. Rigney concluded that the SSIS
would benefit from updated normative data. The author also suggested collecting additional
reliability data for the screening and progress monitoring scale and the parent, teacher, and
student forms. Incorporating these suggestions would provide more support for the use of the
SSIS in practice (Rigney, 2019).
Ultimately, the SSIS Rating Scales provide a solid starting point when thinking about a
student’s social skills, problem behaviors, and academic competence. However, it is imperative
to utilize other methods of measurement in order to better evaluate and understand the needs of
students.
References
Crosby, J. W. (2011). Test review of Social Skills Improvement System Rating Scales [Review of
the test Social Skills Improvement System Rating Scales, by F. M. Gresham & S. N.
Elliott]. Journal of Psychoeducational Assessment.
https://doi.org/10.1177/0734282910385806
Gamst-Klaussen, T., Rasmussen, L.-M. P., Svartdal, F., & Strømgren, B. (2016). Comparability
of the Social Skills Improvement System to the Social Skills Rating System: A
Norwegian study. Scandinavian Journal of Educational Research.
https://doi.org/10.1080/00313831.2014.971864
Gresham, F. M., & Elliott, S. N. (2008). Social Skills Improvement System Rating Scales
manual. Pearson.
Rigney, A. M. (2019). [Test review of the Social Skills Improvement System Rating Scales].
Journal of Psychoeducational Assessment. https://doi.org/10.1177/0734282918781194
Sheperis, C. J., Drummond, R. J., & Jones, K. D. (2020). Assessment procedures for counselors
and helping professionals. Pearson.