Monographs Society Res Child - 2017 - Verdine - III Results Considering The 2 D and 3 D Trials of The Tosa Separately and

III.
RESULTS—CONSIDERING THE 2-D AND 3-D TRIALS OF THE TOSA SEPARATELY AND TOGETHER
Brian N. Verdine, Roberta Michnick Golinkoff, Kathy Hirsh-Pasek,
and Nora S. Newcombe
This article is part of the issue “Links Between Spatial and
Mathematical Skills Across the Preschool Years,” Verdine, Golinkoff,
Hirsh-Pasek, and Newcombe (Issue Authors). For a full listing of articles
in this issue, see: http://onlinelibrary.wiley.com/doi/10.1111/mono.v82.1/issuetoc.
In this chapter we present data regarding the validity and reliability of the
TOSA, including how children responded differently to the test from age
3 to 4. We also discuss findings from the 2-D and 3-D trials that, combined,
generated the overall TOSA scores. The dimensional scoring schemes used
were intended to allow children to receive partial credit for displaying some
knowledge of how to reproduce the models for the respective trials. At the
beginning of this chapter we present some exploratory and descriptive
analyses of the test items and performance on the scoring dimensions. These
are intended to help us understand, at a high level, the specific errors
children make, which aspects of the tasks were easy or difficult, and which
properties of the models appear to have modulated difficulty. These
explorations are included to help generate new research questions and to
identify strengths and weaknesses in preschoolers’ spatial skills.
Corresponding author: Brian Verdine, University of Delaware, School of Education, 224
Willard Hall, Newark, DE 19716. Email: brian.verdine@gmail.com
DOI: 10.1111/mono.12282
© 2017 The Society for Research in Child Development, Inc.
PERFORMANCE ON THE INDIVIDUAL DIMENSIONS OF THE 2-D AND 3-D PORTIONS OF THE TOSA
2-D Performance
Figure 6 reports scores for the 2-D trials at age 3 as the proportion of the
maximum possible points for each design and scoring dimension (adjacent
pieces, horizontal and vertical direction, and relative position), arranged in
order of difficulty. Each child’s rotation error, averaged across the six
designs, had a mean of 34.10° (SD = 20.36; range = 1.7–103.2).
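The chapter reports rotation error as a per-child average in degrees but does not spell out the computation. A minimal sketch of one plausible version follows, assuming that each construction's orientation was measured as an angle relative to its model, that the error is the smaller of the two angular differences (so it never exceeds 180°), and that each child's score is the mean over the six designs. The function names are hypothetical.

```python
import statistics


def angular_error(model_deg: float, copy_deg: float) -> float:
    """Smallest absolute difference between two orientations, in degrees (0 to 180)."""
    diff = abs(model_deg - copy_deg) % 360.0
    return min(diff, 360.0 - diff)


def child_rotation_error(model_angles, copy_angles) -> float:
    """Mean angular error for one child across all designs (hypothetical helper)."""
    if len(model_angles) != len(copy_angles):
        raise ValueError("need one copy angle per model angle")
    errors = [angular_error(m, c) for m, c in zip(model_angles, copy_angles)]
    return statistics.mean(errors)


def summarize_rotation(children):
    """Group mean, SD, and range of per-child mean rotation errors.

    `children` is a list of (model_angles, copy_angles) pairs, one per child.
    """
    per_child = [child_rotation_error(m, c) for m, c in children]
    return {
        "mean": statistics.mean(per_child),
        "sd": statistics.stdev(per_child),
        "range": (min(per_child), max(per_child)),
    }
```

Wrapping the difference into the 0 to 180° interval means that a copy turned 350° is treated as a 10° error, which matches the intuition that a nearly upright copy is nearly correct.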
High performance on the easier items, together with children’s relative
success on the adjacent pieces dimension, indicates that children understood
that each construction was a cohesive unit and had the motor skills necessary
to do the task. Scores on the designs varied, but the number of pieces was not
the deciding factor for difficulty; for example, Design 2 had only two pieces
yet had the lowest mean proportion of points scored (see Figure 6). Designs in
which many-sided pieces (e.g., hexagons and pentagons) had multiple other
pieces attached to them tended to be harder. Children also commonly ignored
the lean of the tilted designs: constructions whose component shapes were
aligned along a typical horizontal or vertical axis (i.e., Designs 1 and 3; see
Figure 6) were easier to reproduce than those that were not (i.e., Designs 2
and 4–6). Children often put pieces on the opposite side relative to the
model, an error on the horizontal and vertical direction dimension, but when
they did so they often created full mirror images, preserving many of their
relative position points. Children first learning to write, from about age
3–7 years, frequently make similar mirror reversals of letters (Cornell, 1985),
and a similar mechanism may be at play in the 2-D trials. As with letter
reversals, most of the spatial relationships between elements are maintained
accurately despite these errors. Performance on the horizontal and vertical
direction dimension was similar to that reported by Clements, Wilson, and
Sarama (2004), suggesting that the symbolic designs (e.g., a man) used in that
study may not have introduced specific additional challenges (e.g., a
dual-representation problem).
FIGURE 6.—2-D TOSA trial scores at age 3 by dimension and design, organized in
order of descending performance. Note: AP = adjacent pieces; HVD = horizontal
and vertical direction; RP = relative position. Error bars are SEs. Significant
post-hoc comparisons with Bonferroni correction: adjacent pieces > horizontal
and vertical direction and relative position; Design 3 > 6, 5, 4, and 2;
Design 1 > 4 and 2; Designs 6 and 5 > 2.
3-D Performance
Performance on the 3-D trials of the TOSA in Year 1 of the study, and its
relations to mathematics performance and language in Year 1, were previously
reported in Verdine, Golinkoff, et al. (2014). All data from Year 3, all SEM
models, and most other analyses focused on the longitudinal nature of the data
are unique to this report. Scores for the 3-D designs and dimensions are
reported in Figure 7.
Unlike performance on the 2-D designs, performance on the 3-D designs
appeared to depend much more on the number of pieces, decreasing as the
number of pieces and the complexity of the spatial relations between them
increased. The complexity of the relations mattered independently of the
number of blocks: items 3 and 4 had the same number and type of blocks, but
performance on item 4 was lower. On item 4, participants often failed to
recognize that the blocks on top could share the 2-stud width of the base
block and therefore struggled to abut the top blocks at their short ends. A
common incorrect solution was to place the top blocks correctly perpendicular
to the blocks beneath but to put them side by side rather than aligning them
across half the width of the block below. As this common error suggests,
children typically preserved one of the spatial relations between blocks but
failed to reproduce all of the correct relations simultaneously. Thus,
children appeared to struggle with creating more than one extrinsic relation
between pieces, a finding in line with other studies of relational complexity
(e.g., Halford, Andrews, Dalton, Boag, & Zielinski, 2002). Despite the
increasing complexity as blocks were added, all of the designs could have been
built accurately by simply focusing on the spatial relation between pairs of
individual blocks. Almost all children were successful on items 1 and 2, which
were made of only two blocks. However, children may have been unable to ignore
the complexity of subsequent structures and segment the constructions into
dyads, or they may not have considered segmenting as a strategy without
training.
FIGURE 7.—Scores on each block design at age 3 by dimension (from Verdine,
Golinkoff, et al., 2014). Rotation was not scored for item 1 because the 2 × 2
component piece on the top is symmetrical.
Changes in TOSA Performance From Ages 3 to 4

A 2 (study year) × 3 (coding dimension) repeated-measures ANOVA was
conducted separately for the 2-D and 3-D trials to investigate relative
performance and growth on each dimension as children aged from 3 to 4.
Bonferroni correction was used for all post-hoc comparisons. In addition, to
understand how development on the TOSA changed over time, we extracted the
data for children who performed in the lowest quartile at age 3, identifying
this group separately for the 2-D and 3-D trials, to see whether there was
evidence that these children continued to lag behind or caught up to the rest
of the group. Because this is only a subset of the overall group, these
comparisons are mostly descriptive.
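The note to Figure 8 explains that the lowest-quartile groups contained 21 and 20 children, rather than exactly a quarter of the 79 children with data at both ages, because of tied scores. The sketch below illustrates how ties enlarge such a group. It assumes a cutoff rule (the score of the child ranked ⌈n/4⌉ from the bottom, with everyone tied at or below that score included) that the chapter does not state; the function and the example scores are hypothetical.

```python
import math


def lowest_quartile(scores: dict) -> list:
    """IDs of children at or below the lowest-quartile cutoff (assumed rule).

    The cutoff is the score of the child ranked ceil(n / 4) from the bottom;
    every child tied at or below that score is included, so ties can make the
    group larger than n / 4.
    """
    if not scores:
        return []
    ordered = sorted(scores.values())
    cutoff = ordered[math.ceil(len(ordered) / 4) - 1]
    return sorted(child for child, score in scores.items() if score <= cutoff)


# Hypothetical age 3 scores: three children tie at the cutoff score of 12.
example = {"a": 10, "b": 12, "c": 12, "d": 12, "e": 20, "f": 25, "g": 30, "h": 31}
print(lowest_quartile(example))  # ['a', 'b', 'c', 'd']
```

With 79 children and no ties, this rule selects 20 children; ties at the cutoff can only add children, which is one way a group of 21 can arise.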
The 2-D Trials
For the 2-D trials there was a main effect of age (F[1, 78] = 127.8, p < .001,
ηp² = .62), a main effect of coding dimension (F[2, 156] = 123.6, p < .001,
ηp² = .61), and a significant age by coding dimension interaction
(F[2, 156] = 29.8, p < .001, ηp² = .28). Not surprisingly, performance was
better overall at age 4 (M(age 3) = 67.2%, M(age 4) = 89.7%, p < .001), and
scores on the adjacent pieces dimension (M(AP) = 92.1%) were higher than scores
on the other two dimensions (p < .001), which did not differ significantly from
one another (M(HVD) = 71.2%, M(RP) = 72.1%). As the lines for the full sample in
Figure 8, Panel A, show, the significant interaction between age and dimension
reflected partial convergence of the three dimensions at age 4, with greater
improvement on the dimensions that started out lower, likely because some
children reached ceiling. The lowest quartile appeared to converge with the
remainder of the group on all of the coded dimensions, but the improvement was
global: it did not appear to be particularly strong for any one dimension
relative to the others.
The 3-D Trials

For the 3-D trials there was a main effect of age (F[1, 78] = 331.0;
p < .001; ηp² = .81), a main effect of coding dimension (F[2, 156] = 410.2;
p < .001; ηp² = .84), and a significant age by coding dimension interaction
(F[2, 156] = 43.3; p < .001; ηp² = .36). Performance was better overall at
age 4 (M(age 3) = 46.2%; M(age 4) = 77.5%; p < .001). Collapsing across years,
scores on the three coding dimensions all differed significantly from one
another (p < .001 after Bonferroni correction): rotation scores (M = 77.1%)
were higher than vertical location scores (M = 70.9%), and both were higher
than translation scores (M = 37.5%).
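Each partial eta squared reported for the 2-D and 3-D ANOVAs can be recovered from its F statistic and degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the six reported F tests as a consistency check; it is not the authors' analysis code.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (label, F, df_effect, df_error) as reported in the text
reported = [
    ("2-D age", 127.8, 1, 78),
    ("2-D dimension", 123.6, 2, 156),
    ("2-D age x dimension", 29.8, 2, 156),
    ("3-D age", 331.0, 1, 78),
    ("3-D dimension", 410.2, 2, 156),
    ("3-D age x dimension", 43.3, 2, 156),
]

for label, f, df1, df2 in reported:
    print(f"{label}: partial eta^2 = {partial_eta_squared(f, df1, df2):.2f}")
```

Each recovered value matches the reported ηp² to two decimals (.62, .61, .28, .81, .84, .36). For the Bonferroni-corrected post-hoc comparisons among the three coding dimensions there are three pairwise tests, so each is evaluated against .05 / 3 ≈ .017 (equivalently, each raw p-value is multiplied by 3).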
As the lines for the full sample in Figure 8, Panel B, show, the significant
interaction between study year and dimension was driven at least in part by a
dramatic improvement in rotation performance from age 3 to age 4.
Participants went from earning just over half of the rotation points at age 3
to every participant scoring 100% correct at age 4. This was a universal and
surprisingly large improvement, evident even in the lowest quartile, and
participants showed far more sensitivity to the orientation of the blocks at
age 4. The improvement is even more striking considering that chance
performance on the rotation dimension was 50%, because the blocks could only
be placed aligned with or perpendicular to one another; age 3 rotation scores
were therefore close to chance. The rotation errors seen at age 3 typically
involved children placing most blocks aligned with one another. By age 4,
children took the orientation of the blocks into consideration. Those in the
lowest quartile again appeared to gain the most ground on the dimensions that
were closest to ceiling, but translation was a difficult dimension at age 3 and
clearly remained difficult for most of the sample at age 4. Translation errors
can be less noticeable to children than either vertical location or rotation
errors because such an error can be as small as sliding a block by one set of
studs in any direction. Therefore, it is perhaps not surprising that these
errors remained common.
FIGURE 8.—Percentage of possible points scored on the 2-D (Panel A) and 3-D
(Panel B) scoring dimensions at age 3 and age 4 for the full sample (solid
lines) and participants in the lowest quartile (dashed lines). Error bars are
95% confidence intervals and are a lighter shade of gray for the lowest
quartile. Note: The lowest quartile data are a subset of the data from the full
sample used in the repeated-measures ANOVA reported in this section (i.e.,
those with TOSA data from both ages 3 and 4; N = 79). The lowest quartile
groups had 21 and 20 participants for the 2-D and 3-D trials, respectively;
the numbers differ because some participants had tied scores.
FIGURE 9.—Histograms for the distributions of raw scores from the 2-D and 3-D portions of
the TOSA given at ages 3 and 4.
DESCRIPTION OF SCORES ON THE 2-D AND 3-D PORTIONS OF THE TOSA AND CREATION OF A COMBINED MEASURE
See Figure 9 for histograms showing the distribution of scores on each
portion of the test. Dimensional scores on the 2-D portion of the TOSA given in
Year 1 (age 3) of the study ranged from 8 points to the maximum of 35 points,
with a mean of 23.33 points (SD = 6.65; 25th percentile = 18, 50th = 25, and
75th = 28). By Year 2 of the study, scores had improved substantially for most
children (M = 31.35; SD = 3.39; range = 21–35), so that the distribution of
scores was negatively skewed and showed a ceiling effect, with many children
scoring near the maximum (25th percentile = 29, 50th = 32, and 75th = 34).
Scores on the 3-D portion of the TOSA given in Year 1 (excluding items 1 and 2,
on which performance was at ceiling) had a mean of 17.82 points (SD = 5.96) and
ranged from 7 to 37 points (maximum possible = 41; 25th percentile = 14,
50th = 16, and 75th = 20). As on the 2-D portion, most children improved in
Year 2 (M = 27.96; SD = 8.30; range = 12–41), although the ceiling effect
observed in the 2-D trials was less evident (25th percentile = 22, 50th = 28,
and 75th = 35).
Children’s responses to the 2-D task were in many ways similar to their
responses to the 3-D task (Verdine, Golinkoff, et al., 2014). For example, most
children were capable of completing the easiest items, and performance on
both similarly predicted later mathematics ability. However, on the 2-D
trials, children lost a number of points on the horizontal and vertical
direction dimension for rotating their designs relative to the models. The 3-D
trials had no analogous penalty for rotating the whole design; children could
have presented their copy in any orientation without losing points. This
difference may be one reason children did not reach ceiling on the easiest 2-D
items but did on the easiest 3-D items.
When we began this work, we expected a priori that the 2-D and 3-D trials
might differentially predict other skill areas (e.g., mathematics) because of
differences in their task demands. For example, the 2-D designs had to be
nearly aligned with the models to receive full credit. This was not required
for the 3-D trials because nothing in them, unlike the borders of the
whiteboard in the 2-D trials, fixed the workspace or limited the orientation
of the overall design. We therefore reasoned that the 2-D trials might be more
likely to capture information about children’s mental rotation skills.
However, as data accumulated across the longitudinal timeframe, preliminary
analyses showed similar patterns of relations to other measures for both trial
types. Because the trials did not differentially predict performance on other
tests, and because combining them yielded a more stable measure of early
spatial skills, we combined the trials into a single overall TOSA score using
z-scores from each set of trials; our data analyses primarily used these
combined scores.
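Because the 2-D and 3-D portions have different maximum scores (35 and 41 points), standardizing each portion before combining them keeps either portion from dominating the overall score. The chapter states only that the overall score was built from z-scores of each set of trials; this sketch assumes standardization against the sample mean and sample SD and averages the two z-scores (summing them instead would rank children identically). The function names are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize raw scores against the sample mean and sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def combined_tosa(scores_2d, scores_3d):
    """Average each child's 2-D and 3-D z-scores into one overall score (assumed rule)."""
    if len(scores_2d) != len(scores_3d):
        raise ValueError("each child needs both a 2-D and a 3-D score")
    return [(a + b) / 2 for a, b in zip(z_scores(scores_2d), z_scores(scores_3d))]
```

For example, `combined_tosa([10, 20, 30], [1, 2, 3])` gives `[-1.0, 0.0, 1.0]`: the two portions differ tenfold in raw units, yet each contributes equally once standardized.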
These overall TOSA scores for Year 1 showed a wide range of analyzable
variability with no obvious floor or ceiling effects, had an approximately
normal distribution, and produced a scale-like score that was more continuous
than the scores of other spatial measures intended for this age. These
properties were important prerequisites for flexible statistical analysis of
the data and provided enough variability to facilitate the detection of
individual differences.
Reliability of the TOSA at Ages 3 and 4
Practical considerations prevented administering the TOSA twice at age 3 to
establish classic test–retest reliability. However, internal consistency
could be computed for each administration. Cronbach’s alpha was computed from
the total score on each trial, before the trials were summed and each portion
of the test was z-scored. Across all trials, alpha for the dimensional scoring
at age 3 was .747. Considering the young age of participants, and that this
alpha included both 2-D and 3-D trials that may tap slightly different aspects
of spatial ability, the internal consistency of the measure was quite high. At
age 4, alpha was considerably lower (.536), likely reflecting the ceiling
effects discussed earlier for that age group.
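Cronbach’s alpha compares the sum of the item (here, trial) variances with the variance of children's total scores: α = k/(k − 1) × (1 − Σ var(trial) / var(total)), where k is the number of trials. A minimal sketch of that formula follows, assuming a complete children-by-trials score matrix with no missing trials (the chapter does not describe how missing data were handled); the function name is hypothetical.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of per-trial scores per child."""
    k = len(scores[0])  # number of trials (items)
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least two trials")
    trials = list(zip(*scores))  # transpose: one tuple of scores per trial
    item_var = sum(statistics.variance(t) for t in trials)
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)
```

Because both variances are sample variances, their n − 1 denominators cancel in the ratio. Alpha approaches 1 when trials rank children consistently and falls toward 0 when trial scores are unrelated, which is why ceiling effects (little between-child variance per trial) tend to depress it.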
Validity
One major motivation for creating the TOSA was that standardized tests of
spatial skills appropriate for typical 3-year-old children are not
sufficiently sensitive, as our pilot testing confirmed. Therefore, concurrent
comparisons of performance on the TOSA at age 3 with scores on other
established measures of spatial skill at age 3 were not possible. However, as
seen in Table 6, TOSA scores from age 3 were correlated with performance on the
spatial measures across the full longitudinal timeframe (r = .44–.60),
providing evidence of the TOSA’s predictive validity when given at age 3.
Given the ceiling effect at age 4 for the 2-D trials and the larger number of
children near ceiling for the 3-D trials, we expected the TOSA given at age 4
to be less predictive of later abilities than the TOSA given to the same
children at age 3. Correlations for the combined z-score TOSA scores in Table 6
appear to bear this out. Age 3 and age 4 TOSA scores had similar correlations
with the other age 4 spatial measures, even though the age 3 TOSA was
administered a year earlier. Moreover, age 3 TOSA scores correlated more
highly with the age 5 spatial measures than did age 4 TOSA scores. Thus, the
reliability and validity of the TOSA given at age 3 appear to be quite good.
Increasing the difficulty of the items would likely improve the psychometric
properties of the test for 4-year-olds.
SEM Model Predicting Spatial Skills Across Time

Here we modeled the relations between spatial skills at ages 3, 4, and 5
(Model 1; Figure 10). Model 1 is essentially an extension of the correlations
reported above. It primarily allowed us to evaluate the TOSA as a predictor of
later spatial skills measured with multiple well-established or standardized
measures, without accounting for the influence of other variables or skills.
It also provided a basic estimate of how well earlier spatial skills predict
later spatial performance in preschool. The model had good fit (χ² = 4.66,
df = 7, p = .702) and indicated that the TOSA accounted for 48% of the
variability in age 4 spatial skills tested with the VMI, WPPSI, and WJ-III (see
Table 7 for model path weights). As expected, the model showed that age 4
spatial skills strongly mediated the relation between TOSA scores and age 5
spatial scores. Within the model, the TOSA had a total standardized effect of
.748 on spatial scores at age 5. To assess how much variability in age 5
spatial scores the TOSA accounted for on its own, a variant of Model 1 removed
the pathway between age 4 and age 5 spatial skills. The resulting model did not
have good fit (χ² = 25.71; df = 8; p = .001) and, as expected, would be
rejected in favor of Model 1, but its R² value was .56. Thus, the TOSA at age 3
alone accounted for 56% of the variability in the age 5 spatial latent
variable. See Table 8 for additional fit indices and a summary of all models in
the results. The relations between the TOSA and later spatial and mathematical
skills are explored further in Chapter 4 through models that incorporate other
variables capturing individual differences.

TABLE 6
PEARSON AND PARTIAL CORRELATIONS OF AGE 3 AND 4 TOSA SCORES WITH THE PPVT AND
SPATIAL MEASURES ACROSS ALL THREE TIME POINTS

                          Correlation with TOSA    Degrees of freedom
Measure                   Age 3      Age 4         Age 3      Age 4
Zero-order Pearson correlations
  Age 3
    PPVT                  .400ᵇ      .509ᵇ         96         78
    TOSA                  –          .524ᵇ         –          81
  Age 4
    TOSA                  .524ᵇ      –             81         –
    WPPSI                 .555ᵇ      .507ᵇ         81         81
    WJ-III                .437ᵇ      .479ᵇ         81         81
    VMI                   .597ᵇ      .634ᵇ         44         44
    VMI motor             .538ᵇ      .469ᵇ         44         44
  Age 5
    WJ-III                .530ᵇ      .420ᵇ         59         56
    CMTT                  .574ᵇ      .446ᵇ         59         56
Partial correlations controlling for PPVT scores at age 3
  Age 3
    TOSA                  –          .406ᵇ         –          75
  Age 4
    TOSA                  .406ᵇ      –             75         –
    WPPSI                 .453ᵇ      .352ᵇ         75         75
    WJ-III                .335ᵇ      .356ᵇ         75         75
    VMI                   .484ᵇ      .462ᵇ         39         39
    VMI motor             .517ᵇ      .445ᵇ         39         39
  Age 5
    WJ-III                .449ᵇ      .293ᵃ         53         53
    CMTT                  .469ᵇ      .252          53         53
ᵃ Correlation is significant at the .05 level (2-tailed).
ᵇ Correlation is significant at the .01 level (2-tailed).

FIGURE 10.—Model 1: SEM results predicting age 5 spatial skills using the TOSA
(age 3) with age 4 spatial skills as a mediator. See Table 10 for fit indices.
Note: Path significant at p < .05 level; Path significant at p < .001 level.

TABLE 7
MODEL 1 (FIGURE 10) PATH WEIGHTS, VARIANCE, AND STATISTICAL SIGNIFICANCE OF
INDIVIDUAL PREDICTORS

Model 1 pathway                        Std. β    b        SE       p
TOSA (age 3) → Spatial (age 4)         .693      5.947    1.164    <.001
TOSA (age 3) → Spatial (age 5)         .131      0.660    0.992    .506
Spatial (age 4) → Spatial (age 5)      .891      0.525    0.153    <.001
Spatial (age 4) → WPPSI (age 4)        .762      0.484    0.090    <.001
Spatial (age 4) → VMI (age 4)          .748      0.299    0.065    <.001
Spatial (age 4) → WJ-III (age 4)       .661      1.000    –        –
Spatial (age 5) → WJ-III (age 5)       .635      1.508    0.324    <.001
Spatial (age 5) → CMTT (age 5)         .777      1.000    –        –

TABLE 8
SEM MODEL FIT INDICES AND R² VALUES FOR THE VARIABLES PREDICTED BY THE MODEL

                                                                                   R² spatial  R² math
Model                               Figure  χ² (df)     p     RMSEA  CFI   TLI   NFI   (age 5)     (age 5)
Model 1: TOSA validity analysis     10      4.66 (7)    .702  <.001  1.00  1.05  0.97  .97         n/a
Model 2: predicting only age 5      11      1.23 (4)    .873  <.001  1.00  1.13  0.99  .74         n/a
  spatial skills
Model 3: predicting only age 5      12      8.05 (4)    .090  .100   0.98  0.87  0.97  n/a         .61
  math skills
Unified model variants: predicting age 5 spatial and mathematical skills
Model 4: Spatial → Mathᵃ            13      20.62 (14)  .112  .068   0.98  0.92  0.94  .63         .74
No spatial–math pathᵇ               n/a     38.47 (15)  .001  .124   0.92  0.75  0.88  .69         .56
Math → Spatialᶜ                     n/a     30.39 (14)  .007  .108   0.94  0.81  0.91  .78         .53

Note: RMSEA = root mean square error of approximation; CFI = comparative fit
index; TLI = Tucker–Lewis index; NFI = normed fit index. In SEM, a chi-square
value close to zero and a p-value greater than .05 indicate little difference
between the model-implied and observed covariance matrices; thus, small
chi-square values and large associated p-values indicate good model fit.
RMSEA values range from zero to one, with smaller values indicating better
model fit and values of .06 or less typically considered adequate. CFI and NFI
range from zero to one, and TLI, which is not normed, can exceed one (as for
Models 1 and 2); for all three, larger values indicate better model fit. CFI,
NFI, and TLI values of .90 or greater are typically considered acceptable
cut-offs for balancing Type I and Type II error rates for smaller sample sizes
(e.g., Hu & Bentler, 1999).
ᵃ This is the best-fitting variant of the unified model predicting both spatial
and mathematical skills at age 5. The other variants do not have good fit.
ᵇ This model was identical to Model 4 (Figure 13) except that the pathway
between spatial and mathematical ability was removed entirely.
ᶜ This model was identical to Model 4 (Figure 13) except that the pathway
between spatial and mathematical ability was reversed to go from mathematical
to spatial ability.
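Several Model 1 quantities follow directly from the standardized path weights in Table 7 and the χ² values in Table 8. In a path model, the total standardized effect of the TOSA on age 5 spatial skill is the direct path plus the indirect path through age 4 spatial skill (.131 + .693 × .891 ≈ .748), and with a single predictor the R² for age 4 spatial skill is the squared standardized path (.693² ≈ .48). The sketch below reproduces these values and computes each χ² goodness-of-fit p-value from the statistic and its degrees of freedom, using the closed-form chi-square survival function for integer df so that only the standard library is needed. It is a consistency check based on textbook identities, not the authors' SEM code, and the function names are hypothetical.

```python
import math


def total_effect(direct: float, a: float, b: float) -> float:
    """Total standardized effect: direct path plus indirect path (a * b)."""
    return direct + a * b


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        half = x / 2.0
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{j=1}^{(df-1)/2} x^((2j-1)/2) / (1*3*...*(2j-1))
    root = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal pdf at sqrt(x)
    term, total = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total


# Standardized paths from Table 7
print(round(total_effect(0.131, 0.693, 0.891), 3))  # 0.748
print(round(0.693 ** 2, 2))  # 0.48 (R² for age 4 spatial skill)

# Fit statistics from the text and Table 8
for chi2, df in [(4.66, 7), (25.71, 8), (1.23, 4), (8.05, 4), (20.62, 14)]:
    print(f"chi2({df}) = {chi2}: p = {chi2_sf(chi2, df):.3f}")
```

The computed p-values agree with the reported ones to within rounding (for example, about .70 for Model 1 and about .001 for the variant without the age 4 to age 5 path); small differences in the third decimal are expected because the published χ² values are themselves rounded.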
Discriminant Validity
The PPVT was included in the age 3 assessments (a) to capture children’s
language skills, allowing them to be factored out of the relations between
other variables, and (b) to serve as a measure of discriminant validity for
the TOSA. If the TOSA were a good measure of spatial skill, it should be more
highly correlated with other spatial measures than with the vocabulary
measure. The PPVT was significantly correlated with scores on the TOSA at
age 3, as seen in Table 6 (r = .40). This is not surprising considering prior
research showing relations of block building with reading performance
(Hanline et al., 2010), of spatial skills with PPVT scores (e.g., Casey et al.,
2015), and of PPVT scores with general intelligence measures (e.g., Hodapp &
Gerken, 1999). However, this correlation was not larger than the correlations
of the TOSA with the spatial measures given at ages 4 and 5 (rs = .44–.60).
Further, as seen in the partial correlations presented in Table 6,
correlations between age 3 TOSA scores and scores on the spatial tests given
at ages 4 and 5 remained significant after controlling for PPVT scores. Thus,
although PPVT scores were related to scores on the TOSA, the variability in the
TOSA explained by vocabulary skills (which are highly related to general
intelligence in young children; e.g., Childers & Durham, 1994) did not explain
the majority of the observed relations between age 3 TOSA scores and the other
spatial tests given at ages 4 and 5. That is, the evidence indicates that the
TOSA is not simply another language or intelligence test.
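The partial correlations in Table 6 remove the variance that each measure shares with age 3 PPVT scores. For a single covariate z, the first-order partial correlation between x and y can be computed from the three zero-order correlations: r(xy·z) = (r(xy) − r(xz) r(yz)) / √((1 − r(xz)²)(1 − r(yz)²)). The sketch below applies this textbook formula (the function name is hypothetical). Plugging in the zero-order correlations between the age 3 and age 4 TOSA (.524) and between the PPVT and each TOSA administration (.400 and .509) yields the tabled partial of .406, although the zero-order values come from slightly different subsamples (see the degrees of freedom in Table 6).

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with covariate z partialled out of both."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("covariate is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# x = TOSA age 3, y = TOSA age 4, z = PPVT age 3 (zero-order values from Table 6)
print(round(partial_correlation(0.524, 0.400, 0.509), 3))  # 0.406
```

When the covariate is uncorrelated with both variables, the partial correlation equals the zero-order correlation; the drop from .524 to .406 therefore reflects the vocabulary variance shared by the two TOSA administrations.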
Alternative Scoring Strategies

Publications generated from the beginning of the longitudinal study (e.g.,
Verdine, Golinkoff, et al., 2014) coded what were referred to as “match
scores” for each item. This scoring system coded whether each item was 100%
correct (receiving 1 point) or not (receiving 0 points). Match scores are
correlated with the TOSA dimensional scores described and reported above and
are predictive of other measures within Year 1 of the study. However,
correlations between other variables and the match scores were weaker by
comparison. For example, correlations between the TOSA at age 3 and the
Woodcock–Johnson Spatial Relations Subtest at age 4 were .437 for the
dimensional scoring and .398 for the match scoring, and correlations with the
WPPSI Block Design Subtest at age 4 were .555 and .426, respectively. These
differences appear to be due, at least in part, to match scoring sharing many
of the properties that make the few existing standardized measures less
sensitive for this age. First, match scores had a limited range (0–12, as
opposed to 0–76 for dimensional scoring), which resulted in distributions that
were less normal and had less variability. Second, children received credit
only when they were 100% correct, even if they displayed some skill in
completing the trials. This likely obscured relatively small but important
differences in spatial competencies at this age, competencies that children may
use to bootstrap future spatial learning. Because dimensional scoring appears
to be an overall more appropriate method of assessing spatial skills in
3-year-olds, we focused on that scoring scheme throughout this report. Although
dimensional coding makes the test somewhat more complex to score, the partial
credit it awards appears to have been integral to using this assessment with
such young children and supports many of our concerns about the sensitivity of
existing standardized measures for children this age. Researchers have
certainly grappled with a variety of ways to capture block building skill
(e.g., Brosnan, 1998; Caldera et al., 1999; Casey, Andrews, et al., 2008;
Reifel & Greenfield, 1982; Wolfgang et al., 2001). Our results raise the
question of whether prior work on block building that coded performance on
trials in an “all or none” fashion similar to the standardized tests (e.g.,
Nath & Szűcs, 2014) might have yielded even more compelling results with a
more nuanced coding of the data. Partial credit scoring may be especially
preferable for younger children, who are less likely to tolerate many trials
and may need more opportunities to display competency than a less granular
coding scheme affords. Whatever the case, our data suggest that the granular
coding scheme for both 2-D and 3-D trials is likely a source of the TOSA’s
strength as a spatial assessment for very young children.
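The contrast between the two scoring schemes can be illustrated with a small sketch. It assumes, hypothetically, that each item is recorded as the points earned on each scoring dimension alongside the maximum for each dimension; the dimensional total sums all earned points, while the match score awards 1 point only when every dimension is at its maximum. The item structure, point values, and function names are illustrative and are not the TOSA's actual rubric.

```python
def dimensional_total(items):
    """Sum of partial-credit points earned across all items and dimensions."""
    return sum(sum(earned) for earned, _maximum in items)


def match_total(items):
    """All-or-none scoring: 1 point per item only if every dimension is perfect."""
    return sum(1 for earned, maximum in items if list(earned) == list(maximum))


# Each item: (points earned per dimension, maximum points per dimension).
# Hypothetical child A is close on both items; child B earns nothing on either.
child_a = [([2, 1, 1], [2, 2, 2]), ([1, 1], [2, 2])]
child_b = [([0, 0, 0], [2, 2, 2]), ([0, 0], [2, 2])]
```

Here `dimensional_total` gives 6 for child A and 0 for child B, while `match_total` gives both children 0, so the partial skill child A displays is invisible to all-or-none scoring.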
SUMMARY AND CONCLUSIONS ABOUT THE VALIDITY AND RELIABILITY OF THE TOSA AT AGES 3 AND 4
Our first research question was whether we can reliably measure spatial
skills in 3-year-olds. The short answer to this question is yes: the TOSA was
a reliable and valid measure of spatial skill in children this young. The raw
scores for each portion of the TOSA administered at age 3 followed relatively
normal distributions with no floor or ceiling effects, consistent with the
measure having captured the full range of variability in a mixed-SES sample.
The combined overall TOSA score was also less discrete than scores on existing
measures and provided more variability for analysis. These properties give the
overall TOSA clear advantages over existing measures for 3-year-olds. Although
strong concurrent spatial measures were unavailable for comparison at age 3
(one motivation for creating the test in the first place), the TOSA showed
predictive validity for other tests of spatial skill given one and two years
later, with medium-sized, significant correlations with established spatial
measures. Controlling for language scores in partial correlations only
modestly reduced the relations between age 3 TOSA scores and the later spatial
tests, which remained significant; this indicates that those relations are not
merely explained by intelligence factors associated with language. Models
reported in the next chapter further strengthen the case for the validity of
this test for use with 3-year-olds.
Additionally, the TOSA administered at age 3 had good internal reliability
(α = .747). Especially considering the difficulty of testing children so
young, the TOSA appears to be an accessible and appropriate spatial measure for
3-year-olds with acceptable psychometric properties. Therefore, the measure is
appropriate for fulfilling the remaining goals of this longitudinal project: to
explore the relation between early and later spatial skills and the relation
between spatial and mathematics performance in preschoolers.
The psychometric properties of the TOSA as a spatial measure for
children at age 4 were not as good as those at age 3, likely because many
children performed very well on the 2-D trials. To make the test more effective
for 4-year-olds, future development could focus on the creation of more
difficult trials, particularly for the 2-D portion. Among the strategies for
increasing the difficulty of the TOSA are to (a) include items with a larger
number of shapes or blocks in each design; (b) include distractor shapes and
blocks, not used in the model, among those children must choose from; and
(c) give the test with a time limit. Another possibility is to adapt the test
for 2-year-olds.
Recently collected pilot data indicate that the 2-D trials of the TOSA can be
successfully administered to children at least as young as 28–32 months.