
International Journal of Educational Research 62 (2013) 21–35


The role of reading comprehension in maths achievement growth: Investigating the magnitude and mechanism of the mediating effect on maths achievement in Australian classrooms
Alvin Vista *
Melbourne Graduate School of Education, The University of Melbourne, Level 8, 100 Leicester St., Parkville, VIC 3010, Australia

Article history: Received 13 May 2013; Received in revised form 7 June 2013; Accepted 21 June 2013; Available online 27 July 2013

Keywords: Mediation; Reading comprehension; Group differences; Mathematics achievement; Structural equations modelling; Language background

Abstract

This study examined the role of reading comprehension (RC) skill in the relationship between problem solving (PS) ability and growth in maths achievement. Within this analysis framework, group differences based on language background were also examined. The participants for this study are government school students (N = 5886) in grades 3–8 in Victoria, Australia.
Results showed no evidence of moderation by language background, implying that language background does not have an effect on how RC skill mediates the relationship between PS ability and growth in maths achievement.
Partial mediation is confirmed in more focussed tests of mediation using regression analysis. Two datasets using independent measures provide corroborating evidence that RC skill may be partially mediating the relationship between PS ability and growth in maths. In addition, findings show that these mechanisms hold uniformly regardless of language background, and that with no evidence of group differences with respect to the partial mediation model, the results have better generalisability to the student population. These findings have important implications for future system-wide implementation of large scale interventions in Australia's linguistically diverse classrooms.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Language and mathematics

The main aim of this paper is to investigate the mechanism of the mediating effect of reading comprehension (RC) on the
relationship between reasoning ability and growth in maths achievement, and develop a model to assess the magnitude of
this effect, all within the context of linguistically diverse Australian classrooms. The effect of English RC skill on the
relationship between maths ability and other cognitive abilities (such as fluid intelligence) has not been studied using large
Australian datasets. There have been international studies that focused on how phonological decoding, processing speed,
concept formation, and other language skills influence maths ability (Fuchs et al., 2006; Seethaler & Fuchs, 2006). Hart,
Petrill, Thompson, and Plomin (2009) found that certain maths skills that are dependent on school learning have greater
shared environmental influences (environmental factors that account for the similarities), while greater environmental
overlap (i.e., variability is accounted for by environmental rather than innate factors) between reading and maths ability was

* Tel.: +61 0413561957. E-mail address: vistaa@unimelb.edu.au.


found for skills that involve reading, such as problem solving (PS) ability. Because their study involved twins, shared
environmental influences can be interpreted as variability between families and across schools. Thus, it appears that
performance on maths items is not uniformly dependent on general cognitive abilities but rather influenced by either
genetic or environmental factors depending on specific maths skills involved (Hart et al., 2009). This implies that some maths
items are more influenced by the variability in skills due to environment (e.g., differences in school and background) than
general cognitive ability, suggesting that the interrelationship between PS and reading skills is significant whether
accounted for by genetic or environmental factors.
This interrelationship between PS and reading skills is supported by results from Vilenius-Tuohimaa, Aunola, and Nurmi
(2008). Their study looked at how gender (genetic) and parental education (environmental) influence RC and mathematical
PS ability; with results showing that even if both factors are controlled, reading and PS skills are still significantly
interrelated. Vilenius-Tuohimaa et al. (2008) suggest that the covariance between RC and mathematical PS could imply that
a reasoning component common to both may be at play. Unfortunately, it is not clear from their path models what role RC skills play in predicting PS performance – whether RC moderates or mediates the relationship between mathematical PS and either genetic or environmental factors.
Given that important language skills (e.g., decoding and comprehension, word efficiency, phonological processing) either
directly predict maths performance (Fuchs et al., 2006; Seethaler & Fuchs, 2006) or at least covary with it (Hart et al., 2009;
Vilenius-Tuohimaa et al., 2008), the language used to measure maths performance has direct implications for non-native
speakers of this language. This effect of test language on maths performance of non-native speakers may not be simple (or
uniform) due to the differences in language loadings depending on types of maths skills being measured (for example,
arithmetic calculation may have less language loads compared with more complex word problems) (see for example
findings in Fuchs et al., 2005, 2006). But regardless of differential effects that depend on skills being measured and type of
test items, it can be argued that skills in the test language would have some influence on the performance in these tests, just
as reading skills overlap with general cognitive ability in its loading on maths ability (Hart et al., 2009).
The effect of language background as a moderator of language proficiency is apparent in studies that focus on
performance of two linguistic groups in tests that are predominantly in the language of only one of the groups. A study by
Alderman (1981) examines the effect of language proficiency in the performance of students whose first language is either
English or Spanish on a test of academic aptitude and findings showed that language proficiency in a second language
moderates the relationship between performance on an aptitude test given in a second language (English, SAT) and
performance on a similar aptitude test given in a first language (Spanish, PAA). This shows that the language used in a test is
an additional and significant hurdle that students must clear in order to demonstrate the abilities that the test aims to
measure, regardless of whether language ability itself is relevant to the abilities being measured. For example, in Alderman’s
(1981) study, the moderating effect of English language proficiency is significant whether the outcome variable being
measured is aptitude scores in verbal ability (SAT-verbal) or maths ability (SAT-maths) and thus ‘‘proficiency in a second
language has theoretical and empirical relevance for interpreting the results of tests given in that language’’ (Alderman,
1981, p. 17). This is in line with valid concerns that, when applied to non-native speakers, certain tests invariably become a
test of language proficiency (Solano-Flores & Trumbull, 2003). In this context, the test language becomes a factor that
contributes to measurement error while being in itself irrelevant to the construct being measured, hence detracting from the
valid interpretation of test results (Solano-Flores & Trumbull, 2003). Group difference between NESB and ESB students
cannot be solely attributed to language factors because other sociocultural factors may come into play (Carstairs, Myors,
Shores, & Fogarty, 2006). Nevertheless, there is evidence that from a testing perspective, verbal load is a major determinant
of group differences – even when the groups may not be based strictly on language backgrounds or language skills.
It is not difficult to hypothesise a link between language proficiency and performance in maths. Yet research on this area,
particularly on the relationship between language proficiency and trends in maths achievement, is rare (Tate, 1997). Tate’s
review, spanning some 15 years, focused on trends in maths achievement in the US and found only one study that looked at
language proficiency as predictor. In Australia, there is similar scarcity in studies on the predictors of maths achievement and
even more so on the predictors of maths achievement growth (Hemmings, Grootenboer, & Kay, 2011). The comparative lack
of research activity on the effects of language proficiency on maths growth is worsened by confusion on the conceptual
definition of language proficiency in its use as a demographic variable for research on group differences (Tate, 1997), where
some terms are used more interchangeably,¹ while others less so. Beyond the labelling issues, the construct attributed to a
grouping variable based on language proficiency needs to be defined clearly to minimise the impact of construct-irrelevant
variance in the analyses later on (Hauger & Sireci, 2008). It is therefore important to examine the effects of RC skill separately
for NESB and ESB students so that language proficiency is not confounded with language background.

1.2. General reasoning ability and mathematics achievement

Although research on the interrelation of reading and PS skills using Australian data is relatively rare, Carstairs et al.
(2006) found differences between NESB and ESB in tests of fluid intelligence (MUNNS, based on a battery of intelligence tests,

¹ In this study, the term LBOTE is maintained as a demographic status variable as defined by the Australian Bureau of Statistics. ESB and NESB are the terms used as the group names resulting from the dichotomy based on LBOTE status.

demographically scaled, and using Australian norms). Carstairs et al. (2006) interpreted their findings as evidence that both
language and cultural components impact one’s performance on verbal and non-verbal aspects of intelligence tests, rather
than as differences in cognitive ability between ESB and NESB groups. Looking at how fluid intelligence predicts maths
achievement (Kyttälä & Lehto, 2008; Primi, Ferrao, & Almeida, 2010), this implies that language components may have a
mediating role in the link between fluid intelligence or its analogue, PS ability, and maths achievement.
General reasoning ability takes into account both inductive and deductive types of reasoning, as well as divergent and
convergent thinking skills. While these skills are essential in maths, tests of general reasoning ability can be, and in fact are,
usually constructed to be independent of specific maths knowledge. PS tests measure general reasoning ability and creative
thinking skills – domains that do not necessarily need any specialised knowledge in maths. However, it is undeniable that
one’s performance on a general PS test will be correlated to a high degree with performance on a maths ability test.
A number of factors are involved in the relationship between general reasoning ability and maths performance. For example,
performance on measures that test visuospatial working memory does correlate with maths achievement, but the effect is
mediated by fluid intelligence (Kyttälä & Lehto, 2008). There is significant neurobiological support for the link between fluid
intelligence and performance in high-level cognitive tasks (Gray, Chabris, & Braver, 2003). Fluid intelligence and working
memory predict multi-tasking performance (König, Bühner, & Mürling, 2005), which is crucial to maths achievement.
Benchmarks for measuring maths achievement in the levels of schooling that are within the context of this study are
based on standardised and national measures of maths in Australia where national testing is comparatively recent. The main
national measure of maths ability is the numeracy test within a much larger system of national testing called the National
Assessment Programme – Literacy and Numeracy (NAPLAN). Beginning in 2008, NAPLAN has been administered annually to students in years 3, 5, 7, and 9 as Australia's assessment programme that evaluates educational outcomes at the national
level (Ministerial Council for Education, Early Childhood Development and Youth Affairs, 2009). It has been designed to focus
less on content in an attempt to discourage inappropriate test preparation. The developers intended the NAPLAN to assess a
more general level of student learning outcomes (ACARA, 2011, n.p.).
The measurement of RC skill is part of literacy assessment, which involves both reading and writing as the two main
components of literacy. From an assessment perspective, literacy is defined in the curriculum as consisting of two main
processes: comprehension and composition of texts (ACARA, 2012). The comprehension process is further categorised into
listening, reading, and viewing. RC is the component and more delimited construct that is used in this study. As a construct,
RC can be defined as a ‘‘conceptualisation of skills and knowledge that comprise the ability to make meaning of text’’ (Morsy,
Kieffer, & Snow, 2010, p. 3).
Similar to the issue of measuring maths achievement and RC skill, measuring general reasoning ability has its own
challenges. However, because reasoning ability is a more abstract skill, at least in the curricular and school-level contexts, the
challenges are more complex than those encountered in measuring maths achievement. To delineate the measurement of
general reasoning ability in the curricular and school-level contexts, we need to frame it as a measure of PS ability. By doing
this, we limit the scope of the construct that we need to measure and are able to take advantage of an existing framework
that specifies the curriculum standards.
With the trend of Australian classrooms becoming more diverse, now is an opportune time to look at large datasets in order to describe this dynamic of how a general ability affects a specific academic achievement. This study capitalises on the rare opportunity to have large and representative assessment data from government schools that include measures
of PS ability. Nationwide tests have numeracy and literacy components, but the inclusion of a reasoning ability component is
relatively uncommon. This is important because the inclusion of this reasoning component in this study allows
measurement of a construct that affects both numeracy and literacy components of academic performance while being
relatively independent of schooling.
This is an important area of group analysis because Australia has an increasing student population from non-English language backgrounds. Since 2000, net overseas migration gain has accounted for the majority of population growth in Victoria, and contributes 59% of total population growth in Australia. There are also important changes in the countries of
origin of the new overseas migrants. Migrants from non-English speaking backgrounds are increasing, with almost half
coming from Asia (China and India, in particular, are the top two countries of origin). This shift in the country of origin has
important implications for the use and level of proficiency in English among these new migrants. Recent statistical data have
shown that 49% of first-generation Australians now speak a language other than English at home, and up to 20% for second-
generation Australians (Australian Bureau of Statistics, 2006).
In this context, the differential effect of language background is one of the main focuses of this research: if group differences are found to exist, we can investigate the underlying reasons and address their implications for teaching and policy.

2. Methods

2.1. Participants

Participants in this study are government school students who participate in the Assessment and Learning Partnerships (ALP) testing every October and March. The ALP tests are designed to be administered to students in grades 3–10 across the participating Department of Education and Early Childhood Development (DEECD) regions in Victoria. Testing commenced in October 2009 and was administered twice a year (March and October) every year thereafter. However, this study is focused only on grade levels 3–8, and only for

Table 1
Sample size by year level and language background.

Language background   Year level in 2010                          Total
                      3     4     5     6     7     8
ESB                   650   719   664   1108  613   610    4364
NESB                  285   305   265   381   191   95     1522
Total                 935   1024  929   1489  804   705    5886

Table 2
School-level demographic data.

Demographic variable   Label              Frequency   Percent
School type            Pri/sec combined   112         1.90
                       Primary only       4437        75.38
                       Secondary only     1136        19.30
                       Missing            201         3.41
                       Total              5886        100.00

participants from the 2010 to 2011 test administrations. The participants for the ALP project within the scope of this study come from 61 government schools in 6 DEECD regions in Victoria, involving 5886 students of diverse linguistic backgrounds and a wide range of English language proficiency. Table 1 reports the overall sample broken down by year level (in 2010) and language background.

2.1.1. Demographics
The demographic data are summarised in Table 2 for schools and Table 3 for participants. Only language background
among the demographic variables presented here is used in the analyses. The other demographic variables are reported here
to provide an overview of the characteristics of the sample. Demographic data for Victoria and the country were also
provided in some of the person-level variables to allow for comparison between sample and population characteristics (see
Table 3). In particular, sample characteristics are reasonably close to the state and national demographics on parental
education level and occupation. This has important implications for the generalisability of the results since demographics,
parental education level in particular (Vista & Grantham, 2010), can have significant influence over participant ability level,
especially reasoning ability. Gender and language background distributions in the sample are also reasonably close to state
and national distributions (Department of Immigration and Citizenship, 2008).

2.1.2. ESB/NESB designation/classification


The main grouping variable in this study is language background. Only a dichotomous variable was used, resulting in two groups (ESB and NESB). In terminology, the term NESB is taken in this study to be equivalent to the term preferred by the
ABS, language background other than English (LBOTE). The definition of group membership used here is adopted from the
conventional definition endorsed by the Ministerial Council on Education, Employment, Training and Youth Affairs
(MCEETYA) and used by DEECD:
Language background other than English (LBOTE): These persons were either born in a non-English speaking country,
or in Australia with one or both parents born in a non-English speaking country, or are Indigenous students for whom
English is a second or other language (Ministerial Council for Education, Early Childhood Development and Youth
Affairs, 1997, p. 78).
This definition is broader than the one used by the ABS to identify those with LBOTE status. ABS defines LBOTE status in
more specific terms as defined in its Standards for Statistics on Cultural and Language Diversity and uses 4 indicators to
determine LBOTE status: country of birth of student; country of birth of father; country of birth of mother; and main
language other than English spoken at home. Nevertheless, the broader definition endorsed by MCEETYA still remains
conceptually compatible with the indicators of language diversity that ABS uses to collect data on language background and
is sufficient for educational research (Australian Council for Educational Research, 2000).
The data for this grouping variable were provided by the Data, Outcomes and Evaluation Division of the DEECD as part of
the data sharing agreements with project partners of the ALP. The data from DEECD indicated LBOTE status dichotomously
(yes/no) and were recoded for this study into a similar dichotomous variable (ESB/NESB).

2.2. Instrument

The main instruments for this study are the ALP tests, composed of three strands measuring respective constructs – Numeracy (Num), Literacy (RC), and Problem Solving (PS). They are administered twice a year (March and October) and are reported on a logit scale. Each test is multiple choice and is administered online using a platform based on Adobe Flash™.

Table 3
Participant-level demographic data.

Demographic variable      Label                                               Frequency   Percent   Victoria (%)ᵃ   Australia (%)ᵃ

Mother school education   Not stated/no data available                        1482        25.18     26.16           26.77
                          Year 9 or equivalent or below                       392         6.66      5.62            5.54
                          Year 10 or equivalent                               676         11.48     12.26           18.35
                          Year 11 or equivalent                               746         12.67     10.99           7.92
                          Year 12 or equivalent                               2590        44.00     36.73           34.67
                          Total                                               5886        100.00

Father school education   Not stated/no data available                        2210        37.55     27.99           28.77
                          Year 9 or equivalent or below                       379         6.44      6.07            5.65
                          Year 10 or equivalent                               627         10.65     13.35           18.29
                          Year 11 or equivalent                               603         10.24     11.26           8.09
                          Year 12 or equivalent                               2067        35.12     34.28           33.03
                          Total                                               5886        100.00

Father occupation         Senior management in large business organisation,
                          government administration and defence, and
                          qualified professionals                             700         11.89     8.44ᵃ           8.05ᵃ
                          Other business managers, arts/media/sportspersons
                          and associate professionals                         797         13.54     9.18ᵃ           8.68ᵃ
                          Tradesmen/women, clerks and skilled office,
                          sales and service staff                             789         13.40     18.31ᵃ          18.13ᵃ
                          Machine operators, hospitality staff, assistants,
                          labourers and related workers                       822         13.97     13.62ᵃ          14.25ᵃ
                          Not in paid work in last 12 months                  710         12.06     –               –
                          Not stated or unknown                               817         13.88
                          Missing                                             1251        21.25
                          Total                                               5886        100.00

Mother occupation         Senior management in large business organisation,
                          government administration and defence, and
                          qualified professionals                             632         10.74     4.06ᵃ           4.11ᵃ
                          Other business managers, arts/media/sportspersons
                          and associate professionals                         674         11.45     9.95ᵃ           9.51ᵃ
                          Tradesmen/women, clerks and skilled office,
                          sales and service staff                             706         11.99     17.67ᵃ          17.85ᵃ
                          Machine operators, hospitality staff, assistants,
                          labourers and related workers                       791         13.44     9.34ᵃ           9.57ᵃ
                          Not in paid work in last 12 months                  1739        29.54     –               –
                          Not stated or unknown                               93          1.58
                          Missing                                             1251        21.25
                          Total                                               5886        100.00

Sex                       Male                                                3578        60.79     53.96ᵇ          53.60ᵇ
                          Female                                              2308        39.21     46.04ᵇ          46.40ᵇ
                          Total                                               5886        100.00

Language background       ESB                                                 4364        74.14     79.60ᶜ          84.20ᶜ
                          NESB                                                1522        25.86     20.40ᶜ          15.80ᶜ
                          Total                                               5886        100.00

ᵃ Census of population and housing (Australian Bureau of Statistics, 2006).
ᵇ Based on full-time students in year 8 and below.
ᶜ Source: Department of Immigration and Citizenship (2008).

The online system is designed to score and process the responses automatically using the psychometric information and item parameters contained in the item bank. The items for these tests are mapped on a uniform latent variable scale that fits a single-parameter IRT (Rasch) model, thus providing a way to estimate item difficulty on a scale that is uniform and has an arbitrary point of origin across test takers (Kelderman, 1988). Because the origin of the scale is arbitrary, it can be set such that item difficulties are comparable across samples and test subgroups (Kelderman, 1988). This is done through vertical scale calibration and test equating (see Appendix for details).
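To make this scale property concrete, the following minimal sketch (Python, with made-up values; not part of the ALP system) evaluates the Rasch item response function and shows that shifting abilities and difficulties by a common constant leaves response probabilities unchanged, which is what allows the origin to be set so that difficulties are comparable across samples.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch (1PL) probability of a correct response:
    P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

theta = np.array([-1.0, 0.0, 1.5])   # hypothetical person abilities (logits)
b = 0.5                              # hypothetical item difficulty (logits)

print(rasch_prob(theta, b))
# Shifting abilities and difficulty by the same constant gives identical
# probabilities: the origin of the logit scale is arbitrary, which is what
# vertical scale calibration and equating exploit.
print(rasch_prob(theta + 2.0, b + 2.0))
```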
The PS subtest was developed through an extensive process spanning 10 months (from January to October 2009). The initial item pool was taken from the established item databases of the General Ability, English and Mathematics (GEM) and
Creative Problem Solving (CPS) programmes (Assessment Research Centre, 2009a, 2009b). These items have been rotated out
of the GEM and CPS tests after having been administered in large-scale settings in Victoria, thus providing robust
psychometric data regarding the item difficulty and discrimination values.
Although these items contain verbal loads, they require only basic English proficiency. As the main predictor in the study design, the measure of PS ability (and thus the scope-limited measure of reasoning skills) must not be systematically biased towards either of the groups in the study. In other words, the measure must be designed so that it does not present undue disadvantage to either the ESB or NESB groups.

Table 4
Bivariate and partial correlations for ALP and NAPLAN numeracy tests.

                                                             NAPLAN Scaled Score Num
Control variables                ALP test                    2009    2010    2011

None (zero order correlations)   2009_1 Num   Correlation    .571    .483    .594
                                              df             625     389     348
                                 2010_1 Num   Correlation    .647    .723    .736
                                              df             3492    3818    2482
                                 2011_1 Num   Correlation    .569    .641    .744
                                              df             2608    5597    6264
                                 2010_1 PS    Correlation    .700    .703    .733
                                              df             1434    1306    734

2010_1 PS                        2009_1 Num   Correlation    .396    .256    .429
                                              df             343     343     343
                                 2010_1 Num   Correlation    .347    .487    .496
                                              df             1433    1305    733
                                 2011_1 Num   Correlation    .318    .436    .608
                                              df             934     934     733

All ps < .01.

To address the issue of developing a PS measure that is unbiased across the study groups, the study design incorporates statistical methods to check whether the instrument exhibits systematic bias against any of the groups. This was done through a DIF analysis on the PS instrument at the midpoint of the study, in order to investigate the test response function of the instrument with respect to the groups as well as to improve the instrument for the subsequent test administrations. In the DIF analysis conducted prior to the analytical process in this paper, no evidence of substantial DIF was found (see Vista, 2012).
The ALP programme initially started with a focus on measuring just the numeracy and RC skills of students in the Catholic education sector, later expanding to government schools. This cluster of projects is an extension of the Literacy Assessment Project, which has investigated the use of student assessment data by professional learning teams in schools since 2004. The projects examine teachers' collaborative use of assessment data to inform teaching and investigate the implications of shifting from a deficit or remedial model of teaching to a developmental approach to improving student outcomes (Griffin, Murray, Care, Thomas, & Perri, 2010).
Tests from the NAPLAN programme were used to validate the ALP tests (described in more detail below) and also used as
parallel instruments to validate mediation models.

2.2.1. Test validation


The ALP tests were validated by correlating them with the NAPLAN tests. Scores for the NAPLAN tests are reported along a 10-band scale that is uniform across the range of year levels (3–9) and thus provides a way to compare performance between students and for each student longitudinally. The role of NAPLAN in this study is to provide a parallel measure that can be used to (1) validate the ALP Num and RC subtests; and (2) validate the results based on the ALP tests, through parallel analyses using a different measure.
Concurrent validity was assessed for same-year administration of the ALP and NAPLAN tests (i.e., 2009 ALP vs 2009 NAPLAN). Predictive validity was initially assessed by looking at bivariate correlations between 2009 ALP data and the 2010–2011 NAPLAN administrations.
Tables 4 and 5 present the combined bivariate and partial correlations between the ALP and NAPLAN tests for numeracy and reading comprehension respectively. The diagonals of each table show the correlation coefficients for same-year administrations
(thus relating to concurrent validity analysis). The ALP 2009 row by the NAPLAN 2011 column shows the correlations
between the ALP 2009 administration and the NAPLAN administrations for the same cohort 2 years thereafter (relating to
predictive validity analysis). It is also possible to interpret the results as showing predictive validity from the NAPLAN
perspective by looking at the coefficients of the 2009 NAPLAN by 2011 ALP administrations, although this is not the main
focus of the validity analyses.
Results show significant correlations for all same-year comparisons. More importantly, all correlation coefficients (r = .57–.74 and r = .62–.75 for the Num and RC tests respectively) are well within a range that is conventionally considered a strong relationship (r > .5 = strong; Cohen, 1988).
First order partial correlations for both were also analysed, controlling for the effects of reasoning ability and thus looking
only at the correlation which remains between ALP and NAPLAN tests after the effect of the control variable is removed.
Results are again all statistically significant, with correlation coefficients (r = .40–.61 and r = .45–.61 for Num and RC tests
respectively) slightly lower than the magnitude for zero order correlations but still showing medium to strong relationship.
This suggests that while reasoning ability explains the bivariate correlation between the two different measures of the


Table 5
Bivariate and partial correlations for ALP and NAPLAN reading comprehension tests.

                                                             NAPLAN Scaled Score Reading
Control variables                ALP test                    2009    2010    2011

None (zero order correlations)   2009_1 RC    Correlation    .620    .512    .639
                                              df             770     333     444
                                 2010_1 RC    Correlation    .700    .752    .738
                                              df             3569    3977    2668
                                 2011_1 RC    Correlation    .658    .683    .743
                                              df             2856    5891    6925
                                 2010_1 PS    Correlation    .656    .662    .630
                                              df             1431    1320    743

2010_1 PS                        2009_1 RC    Correlation    .452    .285    .487
                                              df             352     332     352
                                 2010_1 RC    Correlation    .452    .544    .539
                                              df             1430    1319    742
                                 2011_1 RC    Correlation    .464    .501    .606
                                              df             903     903     742

All ps < .01.

constructs of interest (i.e., maths achievement and RC skill), there is still a medium to strong correlation between the ALP and NAPLAN measures even when this variable is partialled out; the common variance can thus be attributed to something other than reasoning ability.
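As an illustration of the first-order partial correlations reported above, the sketch below computes r_xy·z from the three bivariate correlations. The data are simulated and the variable roles (PS as the control, ALP and NAPLAN numeracy scores as the pair of measures) are placeholders rather than the study data.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: the correlation between x and y
    after the linear effect of the control variable z is removed."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(0)
ps = rng.normal(size=500)                 # simulated PS (reasoning) scores
alp = 0.7 * ps + rng.normal(size=500)     # simulated ALP Num scores
naplan = 0.7 * ps + rng.normal(size=500)  # simulated NAPLAN Num scores

print(np.corrcoef(alp, naplan)[0, 1])     # zero-order correlation
print(partial_corr(alp, naplan, ps))      # correlation with PS partialled out
```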
The predictive validity of the 2010 ALP Num and RC tests was investigated using regression analysis to determine if 2010 ALP scores predict 2011 NAPLAN scores on corresponding tests. A two-step entry procedure was used, with 2010_1 PS entered first to control for reasoning ability, before entering the predictor of interest (2010_1 Num and 2010_1 RC for 2011 NAPLAN Num and Lit-RC respectively).
The results for the Numeracy tests were significant, with the final model showing that 2010_1 Num scores in ALP predict 2011 NAPLAN Num scores even when reasoning ability (measured by 2010_1 PS) is controlled, ΔR² = .14, p < .01. This final model accounts for a substantial amount of variance in the outcome variable (R² = .66), and results show a significant relationship between the 2010 ALP score and the 2011 NAPLAN score, β = .49, t(652) = 16.18, p < .01. Similar results were obtained for the RC tests. After controlling for reasoning ability, the 2010 ALP RC score remains a significant predictor of 2011 NAPLAN RC scores, ΔR² = .17, p < .01, with the final model accounting for more than half of the variance in the NAPLAN RC scores (R² = .57). The relationship between the 2010 ALP and 2011 NAPLAN RC scores is even stronger than that found for the Num scores, β = .58, t(713) = 16.81, p < .01.
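A minimal sketch of this two-step entry procedure on simulated data, using the statsmodels package (an assumption; the paper does not name its regression software): step 1 enters the PS control, step 2 adds the ALP predictor of interest, and ΔR² is the increment in explained variance, testable with a nested-model F-test.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 800
ps_2010 = rng.normal(size=n)                       # simulated 2010_1 PS scores
alp_num_2010 = 0.6 * ps_2010 + rng.normal(size=n)  # simulated 2010_1 ALP Num
naplan_2011 = 0.4 * ps_2010 + 0.5 * alp_num_2010 + rng.normal(size=n)

# Step 1: control for reasoning ability only.
m1 = sm.OLS(naplan_2011, sm.add_constant(ps_2010)).fit()

# Step 2: add the predictor of interest (2010 ALP Num).
X2 = sm.add_constant(np.column_stack([ps_2010, alp_num_2010]))
m2 = sm.OLS(naplan_2011, X2).fit()

delta_r2 = m2.rsquared - m1.rsquared      # increment attributable to ALP Num
f_val, p_val, df_diff = m2.compare_f_test(m1)
print(delta_r2, f_val, p_val)
```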

2.3. Design

Preliminary analysis showed that there is no evidence that the data structure violates the main assumptions of regression
analysis. Checks on the reliability of the measures used in the analyses in this and subsequent analysis also showed adequate
reliability values. Additional checks on the possibility of ceiling effects in the measure used for the main outcome variable,
maths achievement, provided no evidence that non-trivial ceiling effects exist.
The use of difference or gain scores (Lord, 1956, 1958) has been contentious for decades, with various camps either demonstrating its methodological problems (Cronbach & Furby, 1970; Edwards, 1993, 1994) or defending its use (Rogosa & Willett, 1983; Tisak & Smith, 1994; Williams, 1996). When groups are not expected to have intrinsic or fundamental
differences, such as comparing student achievement in classes taught by male and female teachers, these comparisons make
sense and results from them are in fact valid. The issues arise when the groups have fundamental differences that directly affect
their initial or starting levels. The reason this becomes a problem is that the function of change between two groups with such
differences may not be uniform. Thus, if group A starts at level A and group B starts at level B, it might not be the case that after a
year, the increase for group A (e.g., A + x) is the same as the increase for group B (e.g., B + x). In other words, xA might not be equal
to xB. In studies that involve time effects, this could yield misleading results. To address these issues, a preliminary analysis was
conducted to determine if the starting levels in Num scores are the same for both language groups.
This preliminary analysis looks at group differences in Num 2010 mean scores across year levels. Language background is
a statistically significant predictor of Num scores, F(1, 5874) = 3.92–6.06, ps = .01–.05, but when the mean differences are
examined in each year level, pairwise comparisons show nonsignificant differences in Num 2010 means between the two
groups in all year levels (Table 6).
This study hypothesised that the path of PS influencing the growth in maths achievement is not a direct effect, but rather is mediated by RC skill. This hypothesis implies, therefore, that general cognitive ability has a significant influence on the growth of maths performance, but that reading skills somehow mediate this influence, and this mediating effect could be moderated by language background. As such, the magnitude of the mediating effect of reading skills could depend on whether or not the student is a native speaker of the test language.

Table 6
Mean differences in Num 2010 between ESB and NESB students.

Student year   Mean difference   SE       p*
in 2010        (ESB − NESB)
               Pooled            Pooled   Imputed dataset
                                          1      2      3      4      5
3              .02               .06      .64    .80    .76    .67    .64
4              .09               .06      .10    .11    .12    .10    .09
5              .07               .06      .25    .23    .26    .25    .22
6              .06               .05      .25    .23    .26    .23    .26
7              .06               .07      .28    .59    .26    .42    .35
8              .05               .09      .41    .63    .61    .59    .65

* Multiply imputed datasets were used for this analysis (see Section 5). Multiple imputation analysis does not provide pooled p values.

2.3.1. Testing a moderated mediation model using an SEM approach


Under a structural equation modelling approach, the hypothesis of this study was tested through a moderated mediation
model. The main difference between an SEM approach and the traditional regression approach is the main estimation
methods used by each – maximum likelihood (ML) estimation is used as the default estimation method for SEM while
traditional regression uses ordinary least squares (OLS). One advantage of using SEM to test the moderated mediation model
is therefore to provide comparative values, estimated using a different statistical approach.
Using PS 2010, RC 2010 and gains in maths, a three-variable mediation path model was set up following Baron and Kenny (1986). This is a simplified model in which growth in maths is measured as gain scores rather than being controlled by a covariate. However, as a preliminary analysis, this model does allow us to test a moderated mediation hypothesis in a straightforward manner that we can confirm in a more focussed analysis later on. Language background was then included
as a two-level moderator variable to complete the moderated mediation model (Baron & Kenny, 1986; MacKinnon, Fairchild, &
Fritz, 2007). The analysis was conducted using AMOS 18 (Arbuckle, 2009). The data for this analysis come from only one of the
imputed datasets (dataset 3). AMOS can handle missing data without imputation via a maximum likelihood procedure
(Arbuckle, 2009). However, this analysis includes a computed variable (gains in maths) which is based on two variables (Num
2010 and Num 2011) that are not in the model. Utilising a complete imputed dataset is therefore computationally more efficient
and has the advantage of further streamlining the reporting of results given that this is only one of the preliminary analyses.
Fig. 1 shows the path diagram of the model. RC skill is the mediator variable in this model, with the mediating effect moderated
by language background, which is designated by the paths a1, b1, c1 for ESB and a2, b2, c2 for NESB.

Fig. 1. Path diagram for the moderated mediation model.
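The path model in Fig. 1 was fitted in AMOS 18. Purely as an illustration, the sketch below specifies the same three-variable mediation model in Python with the semopy package (an assumption, not the software used in the study) and fits it separately to each language-background group, mimicking the unconstrained (default) multi-group model; the input file and column names are hypothetical.

```python
import pandas as pd
from semopy import Model

# Three-variable mediation: PS -> RC (path a), RC -> maths gains (path b),
# plus the direct path PS -> gains (path c').
desc = """
rc_2010 ~ ps_2010
maths_gain ~ rc_2010 + ps_2010
"""

df = pd.read_csv("alp_imputed3.csv")     # hypothetical imputed dataset

# Fitting the same model to each group separately corresponds to the
# unconstrained model with paths a1, b1, c1 (ESB) and a2, b2, c2 (NESB).
for group, sub in df.groupby("lang_background"):   # assumed 'ESB'/'NESB' column
    model = Model(desc)
    model.fit(sub)
    print(group)
    print(model.inspect())               # unstandardised path estimates and SEs
```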



[Figure shows two panels: in the top panel, PS ability predicts growth in maths achievement (Num 2011 controlling for Num 2010) directly via path c; in the bottom panel, PS ability predicts RC skill (the hypothesised mediator) via path a, RC skill predicts growth via path b, and the direct path is c′.]

Fig. 2. Mediation model where RC skill mediates the relationship between PS ability and growth in maths achievement.

There are two main focuses of this analysis. First, to test the mediational hypothesis across language background, we
conduct the main statistical test of equality for b1 versus b2. Second, in order to test whether the mediational model itself is
valid regardless of language background, we need to determine whether or not the indirect effect of PS to maths gains is
statistically significant (Fox, 1980). That is, whether the product of the direct effects of PS to RC and RC to maths gains (i.e.,
a*b) is statistically significant. The procedures for testing both hypotheses follow the methodological notes by Denis (2010b)
and Garson (2009).

2.3.2. Testing mediation with regression analysis


The SEM analysis above, which fitted a moderated mediational model to the data, suggested an analysis approach to evaluate the role that RC skill plays as a variable in this study. Although the main purpose of the SEM analysis was to test whether language background is a moderator variable within a simplified model, the results provided some evidence that RC skill may be partially mediating, rather than moderating, the relationship between PS ability and growth in maths achievement. There are substantive differences between a mediator and a moderator variable – mediators explain why or how an independent variable X causes the outcome Y, while a moderator variable affects the magnitude and direction of the relationship between X and Y (Saunders, 1956). There is certainly logic in supposing that RC skill mediates the relationship
between PS ability and maths gains. To view RC skill as a partial mediator means that RC skill is an underlying cause (among
others) for the relationship. Given that the mediational model fits, there is reason to test whether RC skill is mediating the
relationship between PS ability and gains in maths rather than functioning as a primary predictor.
A test of mediation using regression analysis was conducted following the approach proposed by Baron and Kenny
(1986). The steps, as the approach was adapted and implemented in this study, are outlined below:

1. Conduct a regression analysis with PS predicting Num 2011, controlling for Num 2010. This is a test for path c in Fig. 2.
2. Conduct a regression analysis with PS ability predicting the mediator (RC skill), controlling for Num 2010. This is a test for
path a.
3. Conduct a regression analysis with the hypothesised mediator (RC skill) predicting Num 2011, controlling for Num 2010.
This is a test for path b.
4. Conduct a multiple regression analysis with both PS ability and RC skill predicting Num 2011, controlling for Num 2010.
This is a test for path c′.

In each of these steps, the paths need to be significant in order to proceed to testing the mediation effect. The amount of mediation is the indirect effect, defined in terms of the paths described above via c′ = c − ab (Baron & Kenny, 1986; MacKinnon et al., 2007). The Sobel test (Sobel, 1982) was used as a statistical test of significance for the indirect effect

$$z = \frac{ab}{\sqrt{b^{2}s_{a}^{2} + a^{2}s_{b}^{2}}} \tag{6.1}$$

where a and s_a are the non-standardised regression coefficient and standard error of PS, respectively, in step 2; and b and s_b are the non-standardised regression coefficient and standard error of RC, respectively, in step 4 (Baron & Kenny, 1986; Denis, 2010a; Sobel, 1982).
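The sketch below runs steps 2 and 4 on simulated data and applies Eq. (6.1). The statsmodels package is an assumed stand-in for the software actually used (the Sobel statistic was computed with the Preacher and Leonardelli calculator, as noted in Section 3.2); coefficient indices follow the column order of the design matrices.

```python
import numpy as np
import statsmodels.api as sm

def sobel_z(a, se_a, b, se_b):
    """Sobel (1982) test statistic for the indirect effect ab (Eq. 6.1)."""
    return (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)

rng = np.random.default_rng(2)
n = 1000
num_2010 = rng.normal(size=n)                                # covariate
ps = 0.5 * num_2010 + rng.normal(size=n)                     # predictor
rc = 0.5 * ps + 0.3 * num_2010 + rng.normal(size=n)          # mediator
num_2011 = 0.15 * rc + 0.1 * ps + 0.3 * num_2010 + rng.normal(size=n)

# Step 2: PS predicting the mediator (RC), controlling for Num 2010 (path a).
step2 = sm.OLS(rc, sm.add_constant(np.column_stack([ps, num_2010]))).fit()
a, se_a = step2.params[1], step2.bse[1]

# Step 4: PS and RC jointly predicting Num 2011, controlling for Num 2010
# (paths c' and b).
step4 = sm.OLS(num_2011,
               sm.add_constant(np.column_stack([ps, rc, num_2010]))).fit()
b, se_b = step4.params[2], step4.bse[2]

print(sobel_z(a, se_a, b, se_b))   # compare against the standard normal
```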
In addition, this regression analysis approach was also validated using NAPLAN data.

Table 7
Critical ratios for the differences among parameters of the moderated mediation model.

Parameters   a1       b1      c1      a2      b2      c2
a1           0        37.45   41.77
b1                    0
c1                    2.24    0
a2           0.10ᵃ    30.75   34.03   0       22.23   27.36
b2           24.61    0.75ᵃ           2.87    0
c2           30.66    3.21    1.01ᵃ           2.48    0

ᵃ CR < |1.96|.

Fig. 3. Model 1 with standardised regression weights by language background.


A. Vista / International Journal of Educational Research 62 (2013) 21–35 31

Table 8
Indirect effect of PS 2010 on maths gains.

                                                         PS 2010
Outcome variable             Moderator level (group)    Standardised indirect effect   SEᵃ    pᵃ

Gains in maths (2011–2010)   ESB – Model 1              −.031                          .01    .01
                             NESB – Model 1             −.034                          .02    .01

ᵃ Based on bootstrap standard errors and 95% bias-corrected confidence level.

Table 9
Pooled parameter estimates for regression models specified in steps 2 and 4 of the mediation test process. ALP data.

Regression model   Criterion variable   Independent variables   B      SE
Step 2             RC skill             PS ability              0.47   0.02
                                        Num 2010ᵃ               0.33   0.02
Step 4             Num 2011             PS ability              0.13   0.04
                                        RC skill                0.15   0.02
                                        Num 2010ᵃ               0.32   0.02

ᵃ Covariate.

3. Results

3.1. SEM approach

The first test was done by running the default model to determine the critical ratios (CR) for differences between parameters of the model. These ratios are two-tailed z-tests between any two parameters of interest, and thus the critical value for α = .05 is CR = |1.96|. Looking at the CRs for the default model (ᵃCR < |1.96|, Table 7), the parameters for b1 and b2 are not statistically significantly different from each other (z = 0.75, p = .45).
Therefore, there is no evidence to reject the hypothesis that path b is equal across the groups. In a more general cross-
group comparison, all of the critical ratios for the paths (a, b and c) are nonsignificant, suggesting that these paths are equal
across both language backgrounds. Based on this result, the test was then extended by constraining the default model such
that all paths are equal across the moderating variable (i.e., a1 = a2, b1 = b2, c1 = c2) and testing whether this Model 1 is statistically different from the default model. Results showed an exact fit for Model 1, χ²(3) = 1.04, p = .79. These results show that, even taking into consideration that an absolute fit index such as the chi-square tends to increase, and thus become more conservative, as sample size increases (Bollen & Long, 1993), there is no evidence that the constrained model fits the data differently
from the default model. This implies that there is no evidence of significant differences in the path coefficients, not just for
the mediating path but for all paths in the model, between the two language background groups. Fig. 3 shows the
standardised path coefficients for Model 1 by language background.
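For reference, a critical ratio of the kind reported in Table 7 is a two-tailed z-test on the difference between two parameter estimates. The sketch below computes one from hypothetical path estimates and standard errors (the per-path standard errors are not reported here).

```python
import numpy as np
from scipy import stats

def critical_ratio(est1, se1, est2, se2):
    """Two-tailed z-test of the difference between two estimates;
    |CR| > 1.96 indicates a difference at alpha = .05."""
    return (est1 - est2) / np.sqrt(se1**2 + se2**2)

# Hypothetical estimates of path b for the ESB and NESB groups.
cr = critical_ratio(0.15, 0.02, 0.18, 0.04)
p = 2 * (1 - stats.norm.cdf(abs(cr)))
print(cr, p)
```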
To conduct a second test, for evidence of mediation, we implemented a bootstrap procedure to obtain the standard errors
and bias-corrected confidence intervals to test whether the indirect effect of PS ability on maths gains is statistically
significant (Denis, 2010b). AMOS was set up to produce 1000 bootstrap samples and the analysis was run again using the
more constrained Model 1, given it has exact fit. From the results, the standardised indirect effects are presented in Table 8
along with standard errors and statistical significance based on the bootstrap data. There is sufficient evidence to reject the null hypothesis that RC skill does not mediate the effect of PS ability on maths gains. This supports an interpretation of partial mediation (not complete mediation, because the direct path c′ is non-zero) (Baron & Kenny, 1986). Further, the results apply equally to both language backgrounds, implying the mediating effect has equal strength in both groups.
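A minimal sketch of the bootstrap logic on simulated data; AMOS performs this internally, and the simple percentile interval below stands in for the bias-corrected interval used in the study.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
ps = rng.normal(size=n)
rc = 0.5 * ps + rng.normal(size=n)
gain = -0.15 * ps + 0.1 * rc + rng.normal(size=n)   # simulated maths gains

def indirect_effect(ps, rc, gain):
    """Indirect effect a*b from two least-squares fits."""
    a = np.polyfit(ps, rc, 1)[0]                    # path a: PS -> RC
    X = np.column_stack([np.ones(len(ps)), ps, rc])
    coefs, *_ = np.linalg.lstsq(X, gain, rcond=None)
    return a * coefs[2]                             # path b: RC -> gains

boot = []
for _ in range(1000):                               # 1000 bootstrap samples
    idx = rng.integers(0, n, n)                     # resample cases
    boot.append(indirect_effect(ps[idx], rc[idx], gain[idx]))
boot = np.asarray(boot)

lo, hi = np.percentile(boot, [2.5, 97.5])           # percentile CI
print(boot.mean(), boot.std(ddof=1), (lo, hi))
```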

3.2. Regression analysis approach

The model testing path a (step 2) was significant, R² = .71–.73, F(2, 5883) = 3033.03–3436.56, all ps < .01. For steps 1, 3, and 4, a regression model (similar to the second model in Fig. 2 but without path a) that includes PS, RC, and Num 2010 as a controlling variable was found to be significant, R² = .42–.47, F(3, 5882) = 3955.73–4654.11, all ps < .01. Table 9 reports the pooled estimates for the regression models specified in steps 2 and 4. The Sobel test statistic was computed using software by Preacher and Leonardelli (2012). Testing whether the indirect path, c′, is different from zero, the result is significant, z = 6.96, SE = .01, p < .01. This confirms the results from the SEM approach and supports the interpretation that RC skill is a partial mediator of the relationship between PS ability and growth in maths achievement. The mediating effect is only partial since PS ability remains a significant predictor of maths growth even when RC skill is added to the regression model (Baron & Kenny, 1986; Denis, 2010a; MacKinnon et al., 2007).


Table 10
Pooled parameter estimates for regression models specified in steps 2 and 4 of the mediation test process. NAPLAN data.

Regression model   Criterion variable   Independent variables   B       SE
Step 2             RC skill             PS ability              22.15   2.99
                                        Num 2010ᵃ               0.55    0.03
Step 4             Num 2011             PS ability              35.77   11.23
                                        RC skill                0.15    0.05
                                        Num 2010ᵃ               0.11    0.05

ᵃ Covariate.

3.2.1. Validation of the partial mediation model for RC skill


As a parallel analysis, it is also of interest to validate whether RC skill remains a partial mediator when independent alternative measures are used. The same test of mediation described above was conducted, but with ALP measures replaced by NAPLAN measures, except for the PS scores. Repeating the analysis procedures used for the ALP data, Table 10 reports the pooled estimates for the regression models specified in steps 2 and 4. Testing whether the indirect path, c′, is different from zero, the result is significant, z = 2.81, SE = 1.20, p < .01. This suggests that the partial mediation model is tenable for both ALP and NAPLAN data and validates the interpretation that RC skill is a partial mediator of the relationship between PS ability and growth in maths achievement.

4. Discussion

A moderated mediation model fitted using the SEM approach showed no evidence of moderation by language
background, implying that language background does not have an effect on how RC skill mediates the relationship between
PS ability and growth in maths achievement.
Partial mediation is confirmed in more focussed tests of mediation using regression analysis. Analysis using both ALP and
NAPLAN data provides corroborating evidence that RC skill may be partially mediating the relationship between PS ability
and growth in maths, suggesting that RC skill mediates the path of PS ability predicting maths gains.
The statistical power afforded by two large datasets based on independent measures strengthens the finding that RC skill partially mediates the influence of PS ability on maths achievement growth. This partial mediation is not strong, only −.031, or about a fifth of the direct effect of PS ability on maths growth (see Fig. 3), but it shows that the effect is attenuating.
That is, the indirect effect is decreasing as RC skill increases. Interestingly, the direct effect of PS skill on maths growth is also
negative, implying that those with higher PS ability have lower rates of growth in maths achievement. This does not mean
that those with higher PS ability have lower levels of maths achievement. In fact, PS ability is a positive predictor of Num
2011 even when Num 2010 is controlled (see Tables 9 and 10). These attenuating direct and indirect effects suggest that the
score difference in maths between 2010 and 2011 appears to slow down as years increase. The weakening of growth over
time cannot be attributed to measurement error, as an anomaly specific to the instruments used, because both ALP and
NAPLAN data show the same narrowing of score differences over time.
This weakening of growth that can be observed in both ALP and NAPLAN data may be explained by a time-related factor.
Age-related factors could play an important role here, such as biological maturation effects that function across the entire
lifespan (McGrew & Hessler, 1995). Changes in academic instruction, school-related developmental effects and
environmental effects may also exert varying degrees of influence on the rate of growth in maths, and can be reflected
as a function of grade level (Floyd, Evans, & McGrew, 2003; Mullis et al., 2001). Time can further have an indirect effect on
maths achievement among immigrant students as a factor of acculturation in the classroom (Betts, Bolt, Decker, Muyskens, &
Marston, 2009).
The factors behind this effect of weakening growth over time can be numerous and complex. Discussing these factors in
detail is beyond the scope of this study. Nevertheless, it is important to establish that the path loading of PS ability on growth
in maths is negative and this is further attenuated by an indirect effect of RC skill. This means that while higher PS ability
results in slower growth in maths achievement, the reduction in growth is weakened proportionally by RC skill.
In addition, it is also an important finding that these mechanisms hold uniformly regardless of language background, and
that with no evidence of group differences with respect to the partial mediation model, the results have better
generalisability to the student population. Because most, if not all, government schools classify language background dichotomously, by extension it can also be implied that the findings are generalisable to all students who share the same characteristics as the study participants. More definite conclusions may be premature, but the results show that, within limitations, language background is not a significant factor in learning outcomes in maths. This finding in turn implies that
whatever intervention is implemented, it should affect students equally regardless of language background. This has
important consequences for future system-wide implementation of large scale interventions in Australia's linguistically diverse classrooms.
In a more general framework, and extending the implications beyond Australian classrooms, the results suggest that RC skill in the language used to measure maths achievement partially mediates the relationship between PS ability and growth in maths. In countries where the test language for maths achievement may not be the native language of

significant groups within the student population, the mechanism of the mediating effect may be similar (although the
magnitude may differ). The lack of evidence that a broad demographic variable of language background moderates the
mediation implies that the mediating effect of the test-language RC skill is more individual-specific rather than a majority-
minority language background effect. It would be of future interest if this study is replicated in other countries, especially in
the Asia-Pacific region.

5. Limitations of the study

The first main limitation of this study is the scope of its sample. This study only looked at students from government
schools in Victoria. Although this is a large sample that approaches the target population within the respective
administrative regions in the state, the exclusion of Catholic and independent schools limits the generalisability of the
findings. Another limitation related to scope is the range of academic year levels included in the study. No preschool or senior
high school students were included in the sample. The developmental differences within these two ends of the age-range
could be significant and are not captured in this study.
Related to this limitation on scope is how language background is categorised in the data. Defining language background
as a dichotomous variable and binning cases into either an English-speaking or non-English speaking background limits the
information that can be inferred from this grouping. More importantly, by ignoring the considerable differences within the
broad definition of non-English speaking group, we make an assumption that all NESB students learn English at a uniform
rate or that the type of native language relative to the target language does not moderate the growth in reading outcomes for
NESB students (as implied by the findings of Betts et al., 2009). This assumption, however, is not universally supported. Some
studies show that similarity of native language to the target language does affect the rate of growth in language learning,
especially in reading (e.g., Ellis & Beaton, 1993; Wang, Park, & Lee, 2006).
Future studies should attempt a more comprehensive differentiation within the NESB (or LBOTE) category. This can be
done by researchers using smaller datasets, where actual verification of language background based on country of origin may
be practical. For large-scale studies that rely on data collected by schools or districts, the recommendation is for school
administrators and policy makers to standardise the collection of demographic data and include a more comprehensive
grouping of language background.
The main methodological limitations of this study are missing data and unequal sample size. Each of these issues is briefly
discussed below.

5.1. Unequal sample sizes for moderator subgroups and effect on power

As with much social science research based on population subgroups (e.g., ethnicity, SES, language background), it is not uncommon to find these groups unequal in size, both in their natural proportion and in sampling. The statistical effect of unequal subgroup sample sizes is that the effective total sample size is smaller, and thereby the statistical power is comparatively lower, than in balanced grouping situations. Hsu (1993) defined the effective total sample size as the reciprocal of the average of the reciprocals of the subgroup sample sizes, 2/(1/n1 + 1/n2), where n1 and n2 are the subgroup sample sizes.
This reduction of effective total sample size is compounded by the attenuation of effect sizes in moderated multiple
regression (MMR) analysis (Evans, 1985), the practical effect of which is the reduced statistical power of any given sample
size. This study, however, is not substantially affected by the consequence of a lowered effective total sample size, given the large subgroup sample sizes. With ESB = 4364 and NESB = 1522, the effective total sample size is 2256 – lower than the actual total sample size but still adequately large.
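A one-line check of Hsu's formula with the subgroup sizes above (a worked computation, not a new analysis):

```python
def effective_n(n1, n2):
    """Hsu's (1993) effective total sample size: the reciprocal of the
    average of the reciprocals of the subgroup sizes, 2 / (1/n1 + 1/n2)."""
    return 2 / (1 / n1 + 1 / n2)

print(effective_n(4364, 1522))   # ~2257, matching the 2256 reported above
```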
MMRPWR (Aguinis & Pierce, 1998) was used to compute empirically based statistical power to detect an effect size as small as the median value (f² = .002) in the MMR literature (Aguinis, Beaty, Boik, & Pierce, 2005). The software can estimate statistical power for MMR analysis with dichotomous moderator variables, but only with one predictor at a time (see Aguinis & Pierce, 1998 for a more thorough description of the software). Two power analyses were conducted for PS and RC scores as predictors, with language background as the moderator variable. Results showed more than adequate power, (1 − β) = .97, 95% CI = .89–.99 for PS score as predictor, and (1 − β) = .92, 95% CI = .80–.99 for RC score as predictor.

5.2. Missing data issues

The second main methodological issue reflects a challenge inherent in longitudinal studies: the management of missing data. Non-simultaneous administration of all instruments used in the study, combined with independently administered tests such as the NAPLAN, resulted in scale-level missing data. This is due to the increased probability that a student may miss one or more tests at one or more time points as the numbers of students, tests, and timepoints increase. Also, because of the large number of participating schools, most of which have independent timetables and testing schedules, the effect of student absences during test administrations is magnified. Fortunately, the scale-level type and MAR mechanism of
missingness still provide statistical information that can be used to recover some parameters of the missing data (Newman,
2003; Schafer & Graham, 2002; Schafer & Olsen, 1998).
Multiple imputation (Rubin, 1976, 1987) was the main method for dealing with missing data in the moderated regression portion of this study. All variables with missing data across the three administrations were processed

through an implementation of multiple imputation in SPSS (2008). The details of multiple imputation are discussed more comprehensively elsewhere (see Allison, 2000; Zhang, 2003); in SPSS it is implemented through an iterative Markov chain Monte Carlo (MCMC) method using a univariate model (SPSS, 2008). For the SEM approach, full information maximum likelihood (FIML) (Anderson, 1957), which is analogous to multiple imputation under a maximum likelihood framework, was used to deal with missing data.
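For readers without access to SPSS, the sketch below shows a freely available analogue using multiple imputation by chained equations (MICE) in Python's statsmodels. MICE is a related but different algorithm from the univariate MCMC method described above, and the variable names (ps_score, rc_score, maths_growth) are hypothetical placeholders rather than the study's actual variables.

```python
# Hedged sketch: multiple imputation by chained equations (MICE) with
# pooled regression estimates, via statsmodels. This is a freely available
# analogue of, not a replica of, the SPSS MCMC procedure described above.
# Variable names (ps_score, rc_score, maths_growth) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(0)
n = 500
ps = rng.standard_normal(n)
rc = 0.6 * ps + 0.8 * rng.standard_normal(n)
growth = 0.4 * ps + 0.3 * rc + rng.standard_normal(n)
df = pd.DataFrame({"ps_score": ps, "rc_score": rc, "maths_growth": growth})

# Impose MAR missingness: rc_score is more likely to be missing when the
# (fully observed) ps_score is low, so missingness depends on observed data.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(2.0 + ps))
df.loc[missing, "rc_score"] = np.nan

imp = mice.MICEData(df)                             # chained-equations imputer
model = mice.MICE("maths_growth ~ ps_score + rc_score", sm.OLS, imp)
results = model.fit(n_burnin=10, n_imputations=20)  # pooled via Rubin's rules
print(results.summary())
```

Under a MAR mechanism like the one simulated here, the pooled estimates should recover the generating coefficients more faithfully than case-wise deletion would.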
The missing data analysis techniques used in this study mitigated the detrimental effects of missing data and avoided the unnecessary loss of cases that ad hoc techniques such as case-wise deletion entail. This study therefore recommends that future researchers facing the same challenges in attrition-prone studies consider sophisticated missing data techniques such as multiple imputation and FIML over case-wise deletion (provided, of course, that the characteristics of the missingness allow the use of such techniques). Nevertheless, the overall effect of missing data remains a limitation relative to a complete sample. A perfectly complete sample may not be attainable in practice, but this limitation highlights the need to reduce missing data as much as possible.

Acknowledgments

The data for this study were collected as part of a broader research project, the Assessment and Learning Partnerships (ALP), currently being conducted by the Assessment Research Centre at the University of Melbourne. Data collection and the development of the research instruments included contributions from M. Pavlovic, N. Awwal, and P. Robertson as the research team, under the supervision of Prof. Patrick Griffin and Assoc. Prof. Esther Care as principal investigators.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/
j.ijer.2013.06.009.

References

Aguinis, H., & Pierce, C. A. (1998). Statistical power computations for detecting dichotomous moderator variables with moderated multiple regression. Educational
and Psychological Measurement, 58, 668–676.
Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect size and power in assessing moderating effects of categorical variables using multiple regression: A
30-year review. Journal of Applied Psychology, 90, 94–107.
Alderman, D. L. (1981). Language proficiency as a moderator variable in testing academic aptitude (TOEFL Research Reports). Princeton, NJ: Educational Testing Service.
Allison, P. D. (2000). Multiple imputation for missing data: A cautionary tale. Sociological Methods & Research, 28(3), 301.
Anderson, T. W. (1957). Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. Journal of the American
Statistical Association, 52(278), 200–203.
Arbuckle, J. (2009). Amos (Version 18.0). Crawfordville, FL: Amos Development Corporation Retrieved from www.amosdevelopment.com.
Assessment Research Centre. (2009a). Creative Problem Solving Retrieved from http://www.edfac.unimelb.edu.au/arc/projects/measurement/cps.html.
Assessment Research Centre. (2009b). General Ability, English and Mathematics (GEM) tests Retrieved from http://www.edfac.unimelb.edu.au/arc/projects/
measurement/gem.html.
Australian Bureau of Statistics. (2006). A Picture of the Nation: The Statistician’s Report on the 2006 Census. (2070.0) Retrieved from http://www.abs.gov.au/ausstats/
subscriber.nsf/LookupAttach/2070.0Publication29.01.091/$File/20700_A_Picture_of_the_Nation.pdf.
Australian Council for Educational Research. (2000). The measurement of language background, culture and ethnicity for the reporting of nationally comparable
outcomes of schooling. Melbourne: National Education Performance Monitoring Taskforce.
Australian Curriculum, Assessment and Reporting Authority. (2011). National Assessment Program – Literacy and Numeracy Retrieved from http://www.nap.edu.au/NAPLAN/index.html.
Australian Curriculum, Assessment and Reporting Authority. (2012). General capabilities in the Australian Curriculum Retrieved from http://www.australiancurriculum.edu.au/GeneralCapabilities/Overview/General-capabilities-in-the-Australian-Curriculum.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic and statistical
considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Betts, J., Bolt, S., Decker, D., Muyskens, P., & Marston, D. (2009). Examining the role of time and language type in reading development for English Language
Learners. Journal of School Psychology, 47(3), 143–166 http://dx.doi.org/10.1016/j.jsp.2008.12.002
Bollen, K. A., & Long, J. S. (Eds.). (1993). Testing structural equation models (Vol. 154). Thousand Oaks, CA: Sage Publications, Inc.
Carstairs, J. R., Myors, B., Shores, E. A., & Fogarty, G. (2006). Influence of language background on tests of cognitive abilities: Australian data. Australian Psychologist,
41(1), 48–54.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). San Diego, CA: Academic Press.
Cronbach, L. J., & Furby, L. (1970). How we should measure ‘‘change’’: Or should we? Psychological Bulletin, 74(1), 68–80.
Denis, D. J. (2010a). How to test for mediation & Sobel test. Data & Decision Lab, Department of Psychology, University of Montana Retrieved from http://psychweb.psy.umt.edu/denis/datadecision/multigroup/amos_group.html.
Denis, D. J. (2010b). Multi-group analysis in AMOS. Data & Decision Lab, Department of Psychology, University of Montana Retrieved from http://psychweb.psy.umt.edu/denis/datadecision/multigroup/amos_group.html.
Department of Immigration and Citizenship. (2008). The People of Victoria: Statistics from the 2006 Census: Commonwealth of Australia.
Edwards, J. R. (1993). On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of Management Journal, 36, 1577 (Special Research Forum: Methodological Issues in Management Research).
Edwards, J. R. (1994). Regression analysis as an alternative to difference scores. Journal of Management, 20(3), 683.
Ellis, N. C., & Beaton, A. (1993). Psycholinguistic determinants of foreign language vocabulary learning. Language Learning, 43, 559–617.
Evans, M. G. (1985). A Monte Carlo study of the effects of correlated method variance in moderated multiple regression analysis. Organizational Behavior & Human
Decision Processes, 36, 305.
Floyd, R. G., Evans, J. J., & McGrew, K. S. (2003). Relations between measures of Cattell-Horn-Carroll (CHC) cognitive abilities and mathematics achievement across
the school-age years. Psychology in the Schools, 40(2), 155.

Fox, J. (1980). Effects analysis in structural equation models. Sociological Methods and Research, 9, 3–28.
Fuchs, L. S., Compton, D. L., Fuchs, D., Paulsen, K., Bryant, J. D., & Hamlett, C. L. (2005). The prevention, identification, and cognitive determinants of math difficulty.
Journal of Educational Psychology, 97, 493–513.
Fuchs, L. S., Fuchs, D., Compton, D. L., Powell, S. R., Seethaler, P. M., Capizzi, A. M., et al. (2006). The cognitive correlates of third-grade skill in arithmetic,
algorithmic computation, and arithmetic word problems. Journal of Educational Psychology, 98(1), 29–43.
Garson, G. D. (2009). Structural equation modeling. Statnotes: Topics in Multivariate Analysis Retrieved from http://faculty.chass.ncsu.edu/garson/PA765/
structur.htm.
Gray, J. R., Chabris, C. F., & Braver, T. S. (2003). Neural mechanisms of general fluid intelligence. Nature Neuroscience, 6(3), 316.
Griffin, P., Murray, L., Care, E., Thomas, A., & Perri, P. (2010). Developmental assessment: Lifting literacy through professional learning teams. Assessment in
Education: Principles, Policy & Practice, 17(4), 383–397 http://dx.doi.org/10.1080/0969594x.2010.516628
Hart, S. A., Petrill, S. A., Thompson, L. A., & Plomin, R. (2009). The ABCs of math: A genetic analysis of mathematics and its links with reading ability and general
cognitive ability. Journal of Educational Psychology, 101(2), 388–402.
Hauger, J. B., & Sireci, S. G. (2008). Detecting differential item functioning across examinees tested in their dominant language and examinees tested in a second
language. International Journal of Testing, 8, 237–250.
Hemmings, B., Grootenboer, P., & Kay, R. (2011). Predicting mathematics achievement: The influence of prior achievement and attitudes. International Journal of
Science and Mathematics Education, 9(3), 691–705 http://dx.doi.org/10.1007/s10763-010-9224-5
Hsu, L. M. (1993). Using Cohen’s tables to determine power attainable in two-sample tests when one sample is limited in size. Journal of Applied Psychology, 78,
303–305.
Kelderman, H. (1988). Common item equating using the loglinear Rasch model. Journal of Educational Statistics, 13(4), 319–336.
König, C. J., Bühner, M., & Mürling, G. (2005). Working memory, fluid intelligence, and attention are predictors of multitasking performance, but polychronicity
and extraversion are not. Human Performance, 18(3), 243–266.
Kyttälä, M., & Lehto, J. E. (2008). Some factors underlying mathematical performance: The role of visuospatial working memory and non-verbal intelligence.
European Journal of Psychology of Education, 23(1), 77–94.
Lord, F. M. (1956). The measurement of growth. Educational and Psychological Measurement, 16(4), 421–437.
Lord, F. M. (1958). Further problems in the measurement of growth. Educational and Psychological Measurement, 18(3), 437–451.
MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation analysis. Annual Review of Psychology, 58, 593–614.
McGrew, K. S., & Hessler, G. L. (1995). The relationship between the WJR Gf-Gc cognitive clusters and mathematics achievement across the life-span. Journal of
Psychoeducational Assessment, 13(1), 21–38.
Ministerial Council for Education, Early Childhood Development and Youth Affairs. (1997). National Report on Schooling in Australia 1997. Melbourne: Curriculum
Corporation.
Ministerial Council for Education, Early Childhood Development and Youth Affairs. (2009). National Assessment Program – Literacy and Numeracy Retrieved from
http://www.naplan.edu.au/about/national_assessment_program-literacy_and_numeracy.html.
Morsy, L., Kieffer, M., & Snow, C. (2010). Measure for measure: A critical consumers’ guide to reading comprehension assessments for adolescents. New York: Carnegie
Corporation of New York.
Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., O’Connor, K. M., Chrostowski, S. J., Gregory, K. D., et al. (2001). Mathematics benchmarking report: The Third International
Math and Science Study—Eighth Grade. Boston, MA: Boston College International Study Center.
Newman, D. A. (2003). Longitudinal modeling with randomly and systematically missing data: A simulation of ad hoc, maximum likelihood, and multiple
imputation techniques. Organizational Research Methods, 6(3), 328–362.
Preacher, K. J., & Leonardelli, G. J. (2012). Calculation for the Sobel test: An interactive calculation tool for mediation tests Retrieved from http://quantpsy.org/sobel/
sobel.htm.
Primi, R., Ferrao, M. E., & Almeida, L. S. (2010). Fluid intelligence as a predictor of learning: A longitudinal multilevel approach applied to math. Learning &
Individual Differences, 20(5), 446–451.
Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20(4),
335–343.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Saunders, D. R. (1956). Moderator variables in prediction. Educational and Psychological Measurement, 16(2), 209–222 http://dx.doi.org/10.1177/
001316445601600205
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research,
33(4), 545–571.
Seethaler, P. M., & Fuchs, L. S. (2006). The cognitive correlates of computational estimation skill among third-grade students. Learning Disabilities Research &
Practice, 21(4), 233–243.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equations models. In S. Leinhart (Ed.), Sociological methodology (pp. 290–312).
Washington, DC: American Sociological Association.
Solano-Flores, G., & Trumbull, E. (2003). Examining language in context: The need for new research and practice paradigms in the testing of English-language
learners. Educational Researcher, 32(2), 3–13.
SPSS. (2008). SPSS Missing Values™ 17.0. Chicago, IL: SPSS Inc.
Tate, W. (1997). Race-ethnicity, SES, gender, and language proficiency trends in mathematics achievement: An update. Journal for Research in Mathematics
Education, 28(6), 652–679.
Tisak, J., & Smith, C. (1994). Defending and extending difference scores methods. Journal of Management, 20(3), 675–682.
Vilenius-Tuohimaa, P. M., Aunola, K., & Nurmi, J.-E. (2008). The association between mathematical word problems and reading comprehension. Educational
Psychology: An International Journal of Experimental Educational Psychology, 28(4), 409–426.
Vista, A. (2012). The role of problem solving ability and reading comprehension skill in predicting growth trajectories of mathematics achievement between ESB and NESB
students. Melbourne: University of Melbourne (PhD Dissertation).
Vista, A., & Grantham, T. (2010). Effects of parental education level on fluid intelligence of Philippine public school students. Journal of Psychoeducational
Assessment, 28(3), 236–248.
Wang, M., Park, Y., & Lee, K. R. (2006). Korean-English biliteracy acquisition: Cross language and orthography transfer. Journal of Educational Psychology, 98,
148–158.
Williams, R. H. (1996). Are simple gain scores obsolete? Applied Psychological Measurement, 20(1), 59–69.
Zhang, P. (2003). Multiple imputation: Theory and method. International Statistical Review, 71(3), 581–592.
