Abstract
This paper analyses and presents the relationship between the Massey University ESOL
Placement Test (MUPT), used at the Centre for Professional and Continuing Education
(PaCE), Massey University, and the International English Language Testing System
(IELTS) bands awarded to the same students. It is a continuation of the investigation and
development of the PaCE ESOL Placement Test. The scores for the five sections of the
MUPT – reading, writing, speaking, listening and the C-Test – have been statistically
correlated against the IELTS skill test bands. The results point to construct and criterion/
concurrent validity of the PaCE Placement Test in evaluating the overall proficiency level
of applicants for streaming in the ESOL Programme, and to the real possibility of developing
placement tests for specific programmes which reflect the international norms of the IELTS.
Introduction
Context
The ESOL Programme at Massey University is one of five divisions within PaCE. The
ESOL programme/division has four levels of language instruction, which are sometimes
streamed into higher and lower cohorts within the level, based on a student's Placement Test
(MUPT) score/profile. This placement test was developed and piloted to confirm both
reliability and construct validity, and provides a profile of English language ability in the
four skills and general proficiency. These results have been reported elsewhere (Hiser, 2002,
2005, 2010). Placement may also be influenced by officially reported IELTS scores.
Class size for each level varies from six to fifteen students, including both genders and a
variety of educational achievements. Each cohort of students has a mixture of goals, needs,
strengths and weaknesses. They may also include a mixture of motivation levels, both
intrinsic and extrinsic.
The variety of student goals includes improving general English for residency in New
Zealand. Formal assessments are given in Weeks Four and Eight. Additionally, there is
usually a weekly on-going assessment attached to coursework.
Purpose of Study
Most of the students, whether continuing studies in English or not, are concerned with
IELTS evaluation, it being the high-stakes test of English language proficiency in
English-speaking countries, particularly those in the Commonwealth. This paper explores the
possibility of establishing a predictive relationship between the MUPT and the IELTS. Such
an instrument would provide a generally predictive evaluation of IELTS achievement when
students have not yet taken the test. The research proposed would determine if there is a
range of scores in parallel English skills components of each test, or overall scores, that can
be used in general evaluation of students' language preparedness for tertiary study. A serious
statistical study would also further validate the C-Test section of the PaCE Placement Test in
reporting general proficiency and the skills profile for entrants into the programme.
Literature Review
The search for work similar to this project yielded surprisingly few results in the literature, a
scarcity also noted by Wall, Clapham and Alderson (1994). This lack of informed studies
continues today. Their study was weakened by the lack of language proficiency data from
specific, valid, or relevant other tests to which they could compare all students:

    Relatively few students presented test evidence for their language proficiency in
    advance of their arrival at Lancaster. What test data were available related to a …
    (references, students' own assertions) were both too varied, and of too uncertain …

This study is better able to provide this information for the sample, as there were official
IELTS report documents for each student – that being the basis for the major variables used
in the analysis.
The C-Test
The cloze procedure was developed as a measure of readability with native speakers (Taylor,
1953). It was later adapted for use as a measure of ESL/EFL proficiency (Darnell, 1970;
Brown, 1983, 1988, 1993; Irvine, Atai, & Oller, 1974). In light of criticism it was modified
by Klein-Braley and Raatz (1984). Their new form became known as a C-Test. It involves
deleting the second half of every second word in a passage, which test takers are then asked
to restore, thereby demonstrating their degree of language proficiency. The letters remaining
after the latter half of each word is cut serve as hints or clues to the full word required to
complete the passage. The C-Test procedure was developed to answer the psychometric
problems of cloze testing, and has been purported to be an empirically and theoretically
valid measure of language proficiency (Raatz & Klein-Braley, 1984; Klein-Braley, 1985;
Klein-Braley & Raatz, 1984, 1985; Raatz, 1985). As a major factor in the following study,
the C-Tests given here confirm their history of reliability and validity, adding to the overall
discussion of the relationship between the local placement test and the IELTS.
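The deletion rule described above can be sketched in a few lines of code. This is an illustrative reading of the Klein-Braley and Raatz rule, not the construction procedure used for the MUPT's C-Test; the function name and the blank style are invented for this example:

```python
def make_c_test(passage: str) -> str:
    """Illustrative C-Test rule: delete the second half of every second
    word, leaving a blank for the test taker to restore. In an actual
    C-Test the first sentence is normally left intact as context."""
    words = passage.split()
    result = []
    for i, word in enumerate(words):
        if i % 2 == 1 and len(word) > 1:
            keep = (len(word) + 1) // 2   # keep the larger half for odd-length words
            result.append(word[:keep] + "____")
        else:
            result.append(word)
    return " ".join(result)

print(make_c_test("The cloze procedure was developed as a measure of readability"))
# The clo____ procedure wa____ developed a____ a meas____ of readab____
```

The remaining word stems act as the "hints or clues" described above, which is what distinguishes the C-Test from classic cloze deletion of whole words.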
Methodology
Approach
Since a statistical approach was decided upon, the variables compared were the section and
total scores of the MUPT and the IELTS band scores, together with the range of possible
scores for each skill.
These scores show a range of categories in evaluation of 80 for the MUPT (four skills
categories of 20 possible scores each), and 72 for the IELTS (four skills categories with
half-band evaluations totalling 18 possible scores each) (Cambridge ESOL, 2010). This
indicates a slightly finer-grained skills evaluation on the MUPT. The C-Test is assumed to be
a general English language proficiency test in its own right and is not included in the
comparison of skills. It will be shown, as expected, to have a relationship with overall
proficiency scores on the two instruments. In addition to the individual skills tests, the
Overall Band Score (OBS) is considered the general proficiency score for the IELTS. The
issue of general proficiency versus individual skill scores is obviously a different, but
related, question, assuming acceptance of the idea that individual skills make up, or are
components of, general proficiency.
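The category counts above follow from simple arithmetic, which can be made explicit (the per-skill values are taken from the text):

```python
# Number of distinct score categories on each instrument, per the text.
mupt_categories = 4 * 20    # four skills, 20 possible scores each (C-Test excluded)
ielts_categories = 4 * 18   # four skills, 18 half-band values each

print(mupt_categories, ielts_categories)  # 80 72
```

The difference (80 versus 72 categories) is the basis for the claim that the MUPT offers slightly finer-grained discrimination than the IELTS skill bands.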
The Sample
The sample chosen for the study included (N=) 45 (+3) students at PaCE. The sample size
was limited by the number of students at PaCE for whom both IELTS and MUPT scores
were available. The Placement Tests for three students with IELTS scores could not be
located. The ethnic groups among the sample were Arabic, Chinese, and Japanese. IELTS
scores are confidential, and this sample is also limited to scores that were voluntarily
offered, not solicited, with verbal permission given for their use in the study. The majority
were young adults working toward undergraduate entry to Massey University by IELTS
requirements – a 6.0 overall band score. A small percentage were working toward IELTS 6.5
and higher, for postgraduate work. The gender breakdown was not considered an important
aspect of the study, but overall it reflected the gender balance at PaCE, which is
approximately one to one. The analysis of the variables for comparison employed Pearson's
two-tailed correlations, a conservative calculation appropriate for interval variables.
Results
Variable Descriptives
The two main variables to be considered were the overall scores for each test – the i.total
and the p.total. The IELTS band scores, as shown in Figure 1, clearly indicate the level of
the students in the programme and the goal of the vast majority: the mean band score was
4.8, with students working to achieve 6.0, entry into the university at the undergraduate
level. The few higher-level students in the study appear, as expected, at the top end of both
distributions.
The Placement Test (MUPT) score totals (see Figure 2) also represent the range of entry
scores for the programme, the mean score being approximately 40 out of 100 for this
sample. The categories between 50 and 70 in the histogram again probably represent the few
higher-level students (4) in the programme who are looking for entry into postgraduate work
with a requirement of 6.5 or better on IELTS. A midpoint score, or mean, of 50 might have
been expected had students achieving that level not gone directly into undergraduate
university studies and programmes. The mode of 50~55 probably shows the cohort in
general is made up of students just below university entry (IELTS 6.0) who are making a
push to reach that level. Note that in Figure 1 there appears only one student already scoring
6.0.
Figure 1: IELTS Overall Band Scores (Mean = 4.8, Stdev = 0.67, N = 48)
Figure 2: MUPT Total Scores (Mean = 40.4, Stdev = 12.4, N = 45)
A new variable for the MUPT scores was created, MUPT.80, which excludes C-Test scores
and represents the total placement test score for the four skills; this more clearly parallels
the structure of the IELTS OBS. Hereafter the C-Test score will be considered parallel to the
overall proficiency test scores, and the new variable's performance is examined in the
analyses that follow. The MUPT scores are a total of five components of twenty points each,
four of which parallel the IELTS skills scores, while the fifth section measures general
proficiency by use of a C-Test. Including the C-Test, the total possible score is 100 points.
These five 0~20 scores allow the creation of a profile for each student showing strong and
weak skills.
Since the C-Test scores also measure general proficiency, their histogram (Figure 3)
provides a second comparison to the chart of IELTS OBS in Figure 1. This chart indicates
the central tendency of the sample to be slightly lower (the mean is 7.33 out of a possible
20) than the range of possible scores suggests would be appropriate – a midpoint score of
10.0. This directly reflects both the MUPT total scores and the IELTS OBS for the sample.
The standard deviation for the C-Test is 3.2 for a sample of 48. The MUPT.80 scores
(Figure 4) likewise fall slightly below an expected score of 40 (N=48), a distribution of the
four skills parallel to the other measures.
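Descriptive statistics of the kind reported here (mean, standard deviation, N) can be reproduced with a short sketch. The scores below are invented placeholders, since the study's actual data are confidential:

```python
import statistics as st

# Hypothetical MUPT total scores (0-100); invented for illustration only.
scores = [28, 35, 41, 39, 52, 44, 30, 47, 38, 50]

mean = st.mean(scores)
sd = st.stdev(scores)   # sample standard deviation, as statistics packages report
n = len(scores)
print(f"Mean = {mean:.1f}, Stdev = {sd:.1f}, N = {n}")
```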
Analyses
To look initially at the relationship between the MUPT and IELTS scores, a pictorial cross
tabulation (Figure 5) was developed between the two total scores (IELTS OBS and the
MUPT total). There appears to be good clustering of scores for both test results between
IELTS Bands 4.5/5.0 and MUPT scores 20 through 50, as illustrated below. Additionally,
for the reasons stated above concerning general proficiency scores, scatterplots of the C-Test
scores on the MUPT against the IELTS OBS were also created (see Figures 6 and 7). The
actual correlation value upon which the IELTS scatterplot is developed was a highly
significant 0.465 for a sample of 45 (see Figure 6). The correlation value for the MUPT
C-Test and MUPT.80 is 0.597, also highly significant for a sample of 45. The relationship
between the C-Test and the total score for the MUPT (.768**) is not surprising, since the
C-Test is itself a component of that total.
To look more closely at the internal component skill scores within each test, two sets of
Pearson's correlations were calculated, one for each examination. These calculations again
point to construct validity for each instrument, since the internal correlations between the
components of each test show good or strong relationships, all with highly significant (p ≤
0.01) results. See the first set of correlations, for the IELTS skills tests, in Figure 8; Figure 9
shows those for the MUPT.
The relationships between each IELTS skill and the OBS are all moderately strong, with a
range between 0.691 and 0.875. This would indicate that, collectively for this sample,
reading skill was contributing the least to the OBS total, while listening skill was
contributing the most (see the last row of Figure 8). This is also a general indication that,
overall, reading skill is the poorest contributor to the OBS of the four skills measured by the
test.
This table of correlations, Figure 8, also reflects probable construct validity of the IELTS.
Looking at the correlations for the internal components – again all highly significant – there
is a range of values between 0.361 and 0.640. These are weak to moderate relationships,
with the poorest value between reading skill and speaking ability. The strongest relationship
is between writing and listening skills. Neither of these relationships corresponds to an
intuitive pairing of the skills.
MUPT
Figure 9, showing correlations for the MUPT components, is parallel to the IELTS table in
Figure 8, but possibly indicates stronger construct validity, as the range of values for the
correlations between skills and the total score is higher (0.688 to 0.833) (see the first row of
Figure 9). This may be because the MUPT, as mentioned above, has a greater number of
scoring categories for students (80) and is therefore a somewhat more fine-grained measure
of proficiency and skill than the IELTS, which has fewer scores or categories (72) with
which to designate a student's ability. The range of correlations for the internal components/
skills of the MUPT is also slightly larger, with only its lowest value, between writing and
listening (0.357), falling below the lowest IELTS value, between speaking and reading
(0.361). Internal bi-variate correlations for the MUPT range from 0.357 (writing and
listening) to 0.680 (reading and writing). These scores are more intuitively correct than the
values for the IELTS: reading and writing skills would seem to be more naturally related
than the writing and listening pairing found strongest on the IELTS, the logical match
between skills being R/W and L/S (Hinkel, 2006).
A set of correlations between the general proficiency scores, or total skills scores, indicates
probable criterion validity among tests which are directed independently at evaluation of
general proficiency. The relationship between the MUPT (p.total) and the MUPT.80 score is
naturally higher than the others (0.768, calculated but not shown), as the C-Test values are
included within the variable p.total. The IELTS OBS in comparison with the other three
scores is of main concern, providing low to moderate coefficients of 0.375 for the C-Test,
0.473 for the MUPT.80, and 0.482 for the p.total, all highly significant two-tailed values, as
shown below.
Figure 10: Correlations: IELTS OBS with MUPT General Proficiency Scores

                                         MUPT
                             C.TEST    MUPT.80    p.total
IELTS i.total (OBS)
      Pearson Correlation    .375**    .473**     .482**
      Sig. (2-tailed)        .009      .001       .001
      N                      48        48         48
Finally, detailed analyses of the correlations between the individual skill scores across tests
were calculated. Listening skills for the two instruments compare favourably – a highly
significant, moderate coefficient of 0.506.

Figure 11: Correlations: Listening Scores

                                 MUPT listening    IELTS listening
p.listen   Pearson Correlation   1                 .506**
           Sig. (2-tailed)                         .001
           N                     45                42
i.listen   Pearson Correlation   .506**            1
           Sig. (2-tailed)       .001
           N                     42                48
The speaking tasks on the two tests show a weak relationship (0.325), which is still
significant (p = 0.036). Writing tasks for the two examinations are again moderate in
strength (0.402) and remain highly significant. The final skill comparison, between reading
ability as evaluated on the two instruments, is startling: there is a near-zero coefficient
(-.005) between the two reading tests and an amazingly low significance (p = .974).
Figure 12: Correlations: Speaking Task Scores

                                 MUPT speaking     IELTS speaking
p.speak    Pearson Correlation   1                 .325*
           Sig. (2-tailed)                         .036
           N                     45                42
i.speak    Pearson Correlation   .325*             1
           Sig. (2-tailed)       .036
           N                     42                48
Figure 13: Correlations: Writing Task Scores

                                 MUPT writing      IELTS writing
p.write    Pearson Correlation   1                 .402**
           Sig. (2-tailed)                         .008
           N                     45                42
i.write    Pearson Correlation   .402**            1
           Sig. (2-tailed)       .008
           N                     42                48
Figure 14: Correlations: Reading Scores

                                 MUPT reading      IELTS reading
p.read     Pearson Correlation   1                 -.005
           Sig. (2-tailed)                         .974
           N                     45                42
i.read     Pearson Correlation   -.005             1
           Sig. (2-tailed)       .974
           N                     42                48
This surprising result implies that the reading tasks are evaluating different skills, and may
be due either to the complex cognitive nature of reading, or to the type of tasks used on each
test instrument to evaluate reading skills. IELTS task types are selected from a list of
possibilities that includes: multiple choice, identifying information from the text, matching
information from the text to distractors of various sorts, matching headings (i.e. grouping or
classifying details with specific categories), matching features/statements with a list of
options, matching sentence endings, sentence completion with short phrases, summary of
text, completion of an outline of the text, flow chart completion, or, finally, diagram label
completion (IELTS Academic Test, 2013; Cambridge ESOL, 2010). This almost
overwhelming list of skills involved in reading, only a small selection of which may be on
any particular IELTS test, can hardly be expected to compare to the simple choices of the
MUPT to evaluate comprehension: multiple choice distractors, matching information with
multiple choice distractors, completing the text itself with possible phrases, and completing
sentences with short phrases created by rewording the text. These different approaches to
evaluating reading are obviously measuring sub-skills of a different nature, and hence the
disparate results.
This lack of correspondence on one component also reduces the overall criterion validity
between the two tests. The relationship remains acceptable and useful, but obviously, if this
component were modified, the two tests would demonstrate even greater concurrent
validity, as every other aspect of the tests correlates well.
Conclusion
A number of observations and interpretations can be offered from the data presented above.
The C-Test used within the PaCE ESOL Placement Test (MUPT) 1) again confirms its
history of reliability and validity, and 2) relates significantly to the IELTS examination.
All skills sections and total scores on both exams show significant relationships with
each other, except the reading components, which do contribute to internal cohesion
independently on each individual test, but not to each other.
The internal construct validity of both the MUPT and the IELTS appears satisfactory, with
the stronger cohesion of components produced by the MUPT.
Among the internal relationships of the two tests, the reading skill evaluation on the
IELTS is the weakest among the four skills, yet the reading skill evaluation on the MUPT
performs well within its own set of components.
Internal relationships with total scores (construct validity) are stronger on the MUPT
for all skills components except the listening sections, showing the MUPT to be the more
internally cohesive instrument.
The MUPT seems better at profiling student skills, as it has slightly finer-grained
discrimination within both component scores and overall scores, and is therefore more
informative for placement decisions.
The MUPT, particularly the C-Test component, is more than adequate as a general predictor
of IELTS achievement; establishing a specific coefficient for this prediction is possible but
beyond the scope of this paper.
Overall, the two examinations appear to evaluate proficiency accurately and in a robust
manner. The only reservation would be with regard to reading, which, while it did not
perform well in relation to the IELTS reading component, does perform well within the set
of MUPT variables. Profiles provided by both tests are informative in relation to the overall
scores, but seem slightly more accurate on the MUPT than the IELTS. Most institutions
could benefit from developing their own placement test, particularly if the students are
concerned about IELTS results rather than just learning English. It would provide not only a
placement profile but also a general indication of likely IELTS achievement.
References
Brown, J. D. (1983). A closer look at cloze: Validity and reliability. In J. W. Oller (Ed.),
Issues in language testing research. Rowley, MA: Newbury House.
Brown, J. D. (1988). Tailored cloze: Improved with classical item analysis techniques.
Language Testing, 5(1), 19-31.
Brown, J. D. (1993). What are the characteristics of natural cloze tests? Language Testing,
10(2), 93-116.
Cambridge ESOL. (2010). IELTS information for candidates. Cambridge, UK: CUP.
Hinkel, E. (2006). Current perspectives on teaching the four skills. TESOL Quarterly, 40(1),
109-131.
Hiser, E. A. (2002). Validity of C-Test cloze for tertiary EFL students in Japan. Proceedings.
Hiser, E. A. (2005). Second language assessment, placement, TOEIC and home grown
tests. Paper presented in Dunedin, NZ, on the construct validity of C-Tests in the role of
placement.
Irvine, P., Atai, P., & Oller, J. W. (1974). Cloze, dictation, and the Test of English as a
Foreign Language. Language Learning, 24, 245-252.
Klein-Braley, C., & Raatz, U. (1984). A survey on the C-test. Language Testing, 1(2),
134-146.
Klein-Braley, C. (1985). A close-up on the C-test: A study in the construct validation of
authentic tests. Language Testing, 2(1), 76-104.
Raatz, U., & Klein-Braley, C. (1984). A survey of research on the C-test. Language Testing,
1(2), 134-146.
Raatz, U. (1985). Better theory for better tests? Language Testing, 2(1), 60-75.
Taylor, W. L. (1953). Cloze procedure: A new tool for measuring readability. Journalism
Quarterly, 30, 415-433.
Wall, D., Clapham, C., & Alderson, J. C. (1994). Evaluating a placement test. Language
Testing, 11(3), 321-344. Retrieved from http://ltj.sagepub.com/content/11/3/321