
Your Programme, Your Placement, and IELTS:

Emerging Opportunities in English Language Evaluation

Dr Elizabeth Hiser, SELT


Professional and Continuing Education (PaCE)
Massey University
Email: e.hiser@massey.ac.nz

Abstract

This paper presents and analyses the relationship between the Massey University ESOL Placement Test (MUPT), used at the centre for Professional and Continuing Education (PaCE), Massey University, and the International English Language Testing System (IELTS) bands awarded to the same students. It continues the investigation and development of the PaCE ESOL Placement Test. Scores on the five sections of the MUPT – reading, writing, speaking, listening and the C-Test – have been statistically correlated against the IELTS skill test bands. The results point to construct and criterion/concurrent validity of the PaCE Placement Test in evaluating the overall proficiency level of applicants for streaming in the ESOL Programme, and to the real possibility of developing placement tests for specific programmes which reflect the international norms of professionally standardised English language tests such as TOEFL, TOEIC, or IELTS.

Introduction

Context

The ESOL Programme at Massey University is one of five divisions within PaCE. The ESOL programme/division has four levels of language instruction, which are sometimes streamed into higher and lower cohorts within a level based on a student's Placement Test (MUPT) score/profile. This placement test was developed and piloted to confirm both reliability and construct validity, and it provides a profile of English language ability in the four skills and general proficiency. These results have been reported elsewhere (Hiser, 2010, 2005, 2002). Placement may also be influenced by officially reported IELTS scores, although a formal link has yet to be established.

Class size for each level varies from six to fifteen students of both genders, with a variety of national origins, cultural backgrounds and levels of first-language educational achievement. Each cohort of students has a mixture of goals, needs, strengths and weaknesses, and may also include a mixture of motivation levels, both intrinsic and extrinsic.

Student goals include improving general English for residency in New Zealand, meeting immigration visa requirements, and entering tertiary academic study at Massey University at both undergraduate and postgraduate levels. The programme is delivered in modules of eight weeks with summative assessments administered during Weeks Four and Eight. Additionally, there is usually a weekly on-going assessment attached to each module which focuses on a specific skill.

Purpose of Study

Most of the students, whether continuing their studies in English or not, are concerned with IELTS evaluation, since it is the high-stakes test of English language proficiency in the majority of European countries and in English-medium programmes in many Asian countries, particularly those in the Commonwealth. This paper explores the possibility of establishing a statistical relationship or correspondence between the programme-centred MUPT and IELTS. Such an instrument would provide a generally predictive evaluation of IELTS achievement for students who have not yet taken the test. The research proposed would determine whether there is a range of scores, in the parallel English skills components of each test or in the overall scores, that can be used in the general evaluation of students' language preparedness for tertiary study. A serious statistical study would also further validate the C-Test section of the PaCE Placement Test in reporting general proficiency and the skills profile for entrants into the ESOL programme.

Literature Review

The search for work similar to this project yielded surprisingly limited results in the literature, a gap also noted by Wall, Clapham and Alderson (1994). This lack of informed studies continues today. Their study was weakened by the lack of language proficiency data from other specific, valid, or relevant tests against which they could compare all students.

Relatively few students presented test evidence for their language proficiency in advance of their arrival at Lancaster. What test data were available related to a number of different tests, some of which were of unknown validity or relevance to studying in an academic setting. Other statements of language proficiency (references, students' own assertions) were both too varied, and of too uncertain validity to be usable (p. 341).

The present study is better able to provide this information, as official IELTS report documents were available for each student in the sample – these being the basis for the major variables used in the calculations and analyses below – in addition to their MUPT scores.

The C-Test

The cloze procedure was developed as a measure of readability with native speakers (Taylor, 1953). It was later adapted for use as a measure of ESL/EFL proficiency (Darnell, 1970; Brown, 1983, 1988, 1993; Irvine, Atai, & Oller, 1974). In light of criticism it was modified by Klein-Braley and Raatz (1984), whose new form became known as the C-Test. It involves deleting the second half of every second word in a passage, which test takers are then asked to restore, thereby demonstrating their degree of language proficiency. The letters remaining after the latter half of each word is cut are left on the test as hints or clues to the full word required to complete the passage. The C-Test procedure was developed to answer the psychometric problems of cloze testing, and has been purported to be an empirically and theoretically valid measure of language proficiency (Raatz & Klein-Braley, 1984; Klein-Braley, 1985; Klein-Braley & Raatz, 1984, 1985; Raatz, 1985). As a major factor in the following study, the C-Tests given here confirm this history of reliability and validity, adding to the overall discussion of the relationship between the local placement test and the IELTS.
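To make the procedure concrete, the following is a minimal sketch in Python of C-Test item construction under the deletion rule described above (the second half of every second word removed, rounding so that at least half of each word remains as a clue). It is an illustration only, not the instrument used in this study; published C-Tests vary in details such as where mutilation begins and how odd-length words are split.

```python
import re

def make_c_test(passage: str) -> str:
    """Mutilate every second word: delete its second half and mark
    the missing letters with underscores, keeping the first half
    (rounded up) as a clue for the test taker to restore."""
    words = passage.split()
    out = []
    for i, word in enumerate(words):
        if i % 2 == 1:  # every second word, leaving the first intact
            # Separate trailing punctuation so it is preserved.
            m = re.match(r"^(\w+)(\W*)$", word)
            if m and len(m.group(1)) > 1:
                stem, punct = m.groups()
                keep = (len(stem) + 1) // 2  # first half, rounded up
                word = stem[:keep] + "_" * (len(stem) - keep) + punct
        out.append(word)
    return " ".join(out)

print(make_c_test("Language testing is a field with many competing methods."))
# -> Language test___ is a field wi__ many compe____ methods.
```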

Methodology

Approach

Since a statistical approach was decided upon, the following sets of variables and data were selected and collected for analysis:

PaCE Placement Test Scores (MUPT)

• P-Total: overall score, 0~100 possible scores
• P-Reading component: 20 possible scores
• P-Speaking component: 20 possible scores
• P-Writing task: 20 possible scores
• P-Listening component: 20 possible scores (total of 80 for the four skills)
• C-Test: 20 possible scores, measuring general proficiency

IELTS Scores

• I-Total: Overall Band Score, 0~9 including half bands (18 possible scores)
• I-Speaking component: 0~9 (18 possible scores)
• I-Reading component: 0~9 (18 possible scores)
• I-Writing task: 0~9 (18 possible scores)
• I-Listening component: 0~9 (18 possible scores; total of 72 for the four skills)

These scores yield 80 evaluation categories for the MUPT (four skills categories of 20 possible scores each) and 72 for the IELTS (four skills categories of 18 possible scores each, counting half bands) (Cambridge ESOL, 2010). This indicates a slightly finer-grained skills evaluation on the MUPT. The C-Test is treated as a general English language proficiency test in its own right and is not included in the comparison of skills; it will be shown, as expected, to have a relationship with the overall proficiency scores on the two instruments. In addition to the individual skills tests, the Overall Band Score (OBS) is considered the general proficiency score for the IELTS. The issue of general proficiency versus individual skill scores is obviously a different, but related, question, assuming acceptance of the idea that individual skills make up, or are components of, general proficiency.

The Sample

The sample chosen for the study comprised N = 45 (+3) students at PaCE. The sample size was limited by the number of students at PaCE for whom both IELTS scores and MUPT scores were available; the Placement Tests for three students with IELTS scores could not be located. The ethnic groups in the sample were Arabic, Chinese, and Japanese. IELTS scores are confidential, so the sample was further limited to scores that were voluntarily offered rather than solicited, with verbal permission given for their use in the study. The majority were young adults working toward undergraduate entry to Massey University via the IELTS requirement of a 6.0 overall band score; a small percentage were working toward IELTS 6.5 and higher for postgraduate study. The gender breakdown was not considered an important aspect of the study, but overall it reflected the gender balance at PaCE, which is approximately one to one. The analysis of the variables for comparison employed Pearson's two-tailed correlations, a more conservative calculation for interval variables than Spearman's analysis.
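As a minimal sketch of the analysis described here (assuming the paired scores sit in simple Python lists; the values below are invented placeholders, not the study data), scipy.stats.pearsonr returns both the coefficient and the two-tailed significance reported throughout the results:

```python
from scipy.stats import pearsonr

# Invented placeholder scores for the same eight students (not the study data):
mupt_total = [28, 35, 41, 44, 52, 57, 63, 70]           # MUPT P-Total, 0~100
ielts_obs  = [4.0, 4.5, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5]   # IELTS OBS, 0~9

r, p = pearsonr(mupt_total, ielts_obs)  # two-tailed by default
print(f"Pearson r = {r:.3f}, two-tailed p = {p:.3f}")
```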

Results

Variable Descriptives

The two main variables to be considered were the overall scores for each test – the I-total and the P-total. The IELTS band scores shown in Figure 1 clearly indicate the level of the students in the programme and the goal of the vast majority: the mean band score was 4.8, with most students working to achieve 6.0, the requirement for undergraduate entry into the university. The few higher-level students in the study appear, as expected, at the top end of the curves for both totals (the I-total and the P-total).

The Placement Test (MUPT) score totals (see Figure 2) also represent the range of entry scores for the programme, the mean score being approximately 40 out of 100 for this sample. The categories between 50 and 70 in the histogram again probably represent the few higher-level students (4) in the programme who are seeking entry into postgraduate work, with a requirement of 6.5 or better on IELTS. A midpoint score, or mean, of 50 might have been expected had students achieving that level not gone directly into undergraduate university studies and programmes. The mode of 50~55 probably shows that the cohort in general is made up of students just below university entry (IELTS 6.0) who are making a push to reach that level. Note that in Figure 1 only one student in the sample already scores 6.0 on IELTS.

Figure 1: IELTS Overall Band Scores (Mean = 4.8, SD = 0.67, N = 48)
Figure 2: MUPT Total Scores (Mean = 40.4, SD = 12.4, N = 45)

A new variable, MUPT.80, was created from the MUPT scores; it excludes the C-Test score and represents the total placement test score for the four skills, which more closely parallels the structure of the IELTS OBS. Hereafter the C-Test score will be treated as parallel to the overall proficiency test scores, and the new variable's performance in relation to the other general proficiency variables will be examined.

The MUPT score is the total of five components of twenty points each, four of which parallel the IELTS skills scores; the fifth section measures general proficiency by means of a C-Test. Including the C-Test, the total possible score is 100 points. These five 0~20 scores allow the creation of a profile for each student showing strong and weak areas/skills in proficiency that can be addressed directly in instruction.
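A minimal sketch of how MUPT.80 and the per-student profile might be derived, assuming the component scores are held in a pandas DataFrame; the column names and values here are hypothetical, as the study does not specify its data handling:

```python
import pandas as pd

# Hypothetical component scores, 0~20 each (column names are assumptions).
df = pd.DataFrame(
    {"p_read": [12, 8, 15], "p_speak": [10, 9, 14],
     "p_write": [9, 7, 13], "p_listen": [11, 10, 16],
     "c_test": [8, 5, 12]},
    index=["student_a", "student_b", "student_c"],
)

skills = ["p_read", "p_speak", "p_write", "p_listen"]
df["mupt_80"] = df[skills].sum(axis=1)        # four-skill total, max 80
df["p_total"] = df["mupt_80"] + df["c_test"]  # full MUPT total, max 100

# Each row now reads as a profile: skill strengths/weaknesses out of 20,
# plus general proficiency (c_test) and the two totals.
print(df)
```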

Since the C-Test scores also measure general proficiency, a histogram of them (Figure 3) provides a second comparison with the chart of IELTS OBS in Figure 1. This chart indicates that the central tendency of the sample (a mean of 7.33 out of a possible 20) is slightly lower than the midpoint of 10.0 that the range of possible scores would suggest. This directly mirrors both the MUPT total scores and the IELTS OBS for the sample. The standard deviation for the C-Test is 3.2 for a sample of 48.

Figure 3: Distribution of C-Test Scores
Figure 4: Histogram of MUPT.80 Scores


Figure 4 shows the distribution of the new variable, with a mean score (33) again slightly below the expected midpoint of 40 (N = 48). The parallel distributions of the four variables are not accidental, and corroborate the results of each.

Analyses

To take an initial look at the relationship between the MUPT and IELTS scores, a pictorial cross-tabulation (Figure 5) was developed between the two total scores (IELTS OBS and the MUPT total). There appears to be good clustering of scores for both test results between IELTS Bands 4.5/5.0 and MUPT scores of 20 through 50, as illustrated below. Additionally, for the reasons stated above concerning general proficiency scores, scatterplots of the C-Test scores on the MUPT against the IELTS OBS and the MUPT.80 were also created (see Figures 6 and 7).

Figure 5: Crosstabulations of Test Results
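A minimal sketch of how scatterplots such as Figures 6 and 7 can be produced with matplotlib (the paired values below are invented placeholders, not the study data):

```python
import matplotlib.pyplot as plt

# Invented placeholder scores (not the study data).
c_test    = [4, 5, 6, 7, 8, 9, 10, 12]                  # MUPT C-Test, 0~20
ielts_obs = [4.0, 4.5, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0]    # IELTS OBS, 0~9

plt.scatter(c_test, ielts_obs)
plt.xlabel("MUPT C-Test score (0~20)")
plt.ylabel("IELTS Overall Band Score")
plt.title("C-Test vs IELTS OBS")
plt.show()
```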


These results are indicative of the C-Test's general ability to predict IELTS bands. The correlation value on which the IELTS scatterplot is based was a highly significant 0.465 for a sample of 45 (see Figure 6). The correlation value (ρ) for the MUPT C-Test and MUPT.80 is 0.597, also highly significant for a sample of 45. The relationship between the C-Test and the total score for the MUPT (.768**) is not unexpected, as the C-Test is a component of the total MUPT variable p.total.

Figure 6: Scatterplot of C-Test and IELTS OBS (ρ = .375**)
Figure 7: Scatterplot of C-Test and MUPT.80 Total (ρ = .597**)

To look more closely at the internal component skill scores within each test, two sets of Pearson's correlations were calculated, one for each examination. These calculations again point to construct validity for each instrument, since the internal correlations between the components of each test show good or strong relationships, all with highly significant (P ≤ 0.01) results. See the first set of correlations, for the IELTS skills tests, in Figure 8; Figure 9 has the values for the internal components of the MUPT.

The relationships between each IELTS skill and the OBS are all moderately strong, ranging between 0.691 and 0.875. This would indicate that, collectively for this sample, reading skill contributed the least to the OBS total while listening skill contributed the most (see the last row of Figure 8). This is also a general indication that, overall, reading is the poorest contributor of the four skills measured by the test for these students of English.

Figure 8: Internal Relationships of IELTS Components

This table of correlations, Figure 8, also reflects probable construct validity of the IELTS. Looking at the correlations among the internal components – again all highly significant – there is a range of values between 0.361 and 0.640. These are weak to moderate relationships, with the poorest value between reading skill and speaking ability. The strongest relationship is between writing and listening skills. Neither of these relationships corresponds intuitively (Hinkel, 2006).

Figure 9: Internal Relationships of MUPT Components

Figure 9, showing correlations for the MUPT components, is parallel to the IELTS table in Figure 8, but possibly indicates stronger construct validity, as the range of values for the correlations between skills and the total score is higher (0.688 to 0.833) (see the first row of Figure 9). This may be because the MUPT, as mentioned above, has a greater number of scoring categories (80) and is therefore a somewhat more fine-grained measure of proficiency and skill than the IELTS, which has fewer scores or categories (72) with which to designate a student's ability. The range of correlations for the internal components/skills of the MUPT is also slightly larger, with only the lowest value, between writing and listening (0.357), being less than the lowest value for the IELTS, between speaking and reading (0.361). Internal bi-variate correlations for the MUPT range from 0.357 (writing and listening) to 0.680 (reading and writing). These values are more intuitively correct than those for the IELTS: reading and writing skills would seem more naturally related than the writing and listening scores on the IELTS, the logical match between skills being R/W and L/S (Hinkel, 2006).

A set of correlations between the general proficiency scores, or total skills scores, indicates probable criterion validity among tests that are independently directed at the evaluation of general proficiency. The relationship between the MUPT (p.total) and the MUPT.80 score is naturally higher than the others (0.768, calculated but not shown), as the C-Test values are included within the variable p.total. The IELTS OBS in comparison with the other three scores is of main concern, and provides low to moderate coefficients of 0.375 for the C-Test, 0.473 for the MUPT.80, and 0.482 for the p.total, all highly significant in two-tailed bi-variate analyses (Figure 10).

Figure 10: Correlations on General Proficiency Scores

                                       C.TEST    MUPT.80   p.total
IELTS i.total   Pearson Correlation    .375**    .473**    .482**
(OBS)           Sig. (2-tailed)        .009      .001      .001
                N                      48        48        48

Finally, detailed analyses of the correlations between the individual skill scores across tests were calculated. Listening skills for the two instruments compare favourably – highly significant (P ≤ 0.01) and moderately strong (0.506) (see Figure 11).

Figure 11: Listening Score Correlations

                                 MUPT listening   IELTS listening
p.listen   Pearson Correlation   1                .506**
           Sig. (2-tailed)                        .001
           N                     45               42
i.listen   Pearson Correlation   .506**           1
           Sig. (2-tailed)       .001
           N                     42               48
The speaking tasks on the two tests show a weak relationship (0.325), which is nevertheless significant (P ≤ .05) at the 0.036 level (Figure 12). The writing tasks for the two examinations are again moderately correlated and remain highly significant, with P = 0.008 (see Figure 13).

The final skill comparison, between reading ability as evaluated on the two instruments, shows an unexpected reversal of relationships (Figure 14). The coefficient between the two reading tests is essentially zero and slightly negative (-0.005), with a strikingly non-significant p-value of .974 – a correlation of this size is entirely consistent with chance.

Figure 12: Speaking Task Correlation of Scores

                                MUPT speaking   IELTS speaking
p.speak   Pearson Correlation   1               .325*
          Sig. (2-tailed)                       .036
          N                     45              42
i.speak   Pearson Correlation   .325*           1
          Sig. (2-tailed)       .036
          N                     42              48
Figure 13: Correlations: Writing Task Scores

                                MUPT writing   IELTS writing
p.write   Pearson Correlation   1              .402**
          Sig. (2-tailed)                      .008
          N                     45             42
i.write   Pearson Correlation   .402**         1
          Sig. (2-tailed)       .008
          N                     42             48

Figure 14: Correlations: Reading Test Scores

                                MUPT reading   IELTS reading
p.read    Pearson Correlation   1              -.005
          Sig. (2-tailed)                      .974
          N                     45             42
i.read    Pearson Correlation   -.005          1
          Sig. (2-tailed)       .974
          N                     42             48

This surprising result implies that the reading tasks are evaluating different skills, which may be due either to the complex cognitive nature of reading or to the types of tasks each instrument uses to evaluate reading skills. IELTS task types are selected from a list of possibilities that includes: multiple choice; identifying information from the text (true/false/not given); identifying writers' views (yes/no/not given); matching information from the text to distractors of various sorts; matching headings, i.e. grouping or classifying details with specific categories; matching features/statements with a list of options; matching sentence endings; sentence completion with short phrases; summary of text; completion of an outline of the text; flow chart completion; and, finally, diagram label completion (IELTS Academic Test, 2013; Cambridge ESOL, 2010). This almost overwhelming list of skills involved in reading, only a small selection of which may appear on any particular IELTS test, can hardly be expected to compare with the simpler choices of the MUPT, which evaluates comprehension with multiple-choice distractors, matches information with multiple-choice distractors, completes the text itself with possible phrases, and completes sentences with short phrases created by rewording the text. These different approaches to evaluating reading are evidently measuring sub-skills of a different nature; hence the disparate scores on the reading sections.

This lack of correspondence on one component also reduces the overall criterion validity of the otherwise reasonably parallel examinations. The overall correspondence is presently acceptable and useful, but if this component were modified, the two tests would clearly demonstrate even greater concurrent validity, as every other aspect of the tests correlates reasonably and seems parallel in reporting results.

Conclusion

A number of observations and interpretations can be offered from the data presented and the analyses calculated above.

• The C-Test used within the PaCE ESOL Placement Test (MUPT) 1) again points to the construct validity of the examination, and 2) shows a highly significant relationship with general English language proficiency as measured by the IELTS examination.

• All skills sections and total scores on both exams show significant relationships with each other, except the reading components, which contribute to internal cohesion within each individual test but do not correlate with each other.

• The internal construct validity of both the MUPT and the IELTS appears satisfactory, with the stronger cohesion of components produced by the MUPT.

• Among the internal relationships of the two tests, the reading skill evaluation on the IELTS is the weakest among the four skills, yet the reading skill evaluation on the MUPT is the strongest contributor to the total score there.

• Internal relationships with total scores (construct validity) are stronger on the MUPT for all skills components except the listening sections, showing the MUPT to be internally more cohesive.

• The MUPT seems better at profiling student skills, as it has slightly finer-grained discrimination within both component scores and overall scores, and therefore somewhat greater accuracy of measurement.

• The MUPT, particularly the C-Test component, is more than adequate as a general predictor of IELTS scores. A regression formula could be created to provide a more specific coefficient for this prediction, but that is beyond the scope of this paper; a minimal illustrative sketch follows below.
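By way of illustration only, a regression of the kind envisaged in the final point might be sketched as follows, using scipy.stats.linregress on invented placeholder scores (not the study data) to map a C-Test score onto a predicted IELTS band:

```python
from scipy.stats import linregress

# Invented placeholder scores (not the study data).
c_test    = [4, 5, 6, 7, 8, 9, 10, 12]                  # MUPT C-Test, 0~20
ielts_obs = [4.0, 4.5, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0]    # IELTS OBS, 0~9

fit = linregress(c_test, ielts_obs)
print(f"predicted OBS = {fit.slope:.3f} * C-Test + {fit.intercept:.3f}")

def predict_band(c_score: float) -> float:
    """Predict an IELTS OBS from a C-Test score, rounded to a half band."""
    raw = fit.slope * c_score + fit.intercept
    return round(raw * 2) / 2

print(predict_band(11))  # hypothetical C-Test score of 11
```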

Overall, the two examinations appear to evaluate proficiency accurately and in a robust manner. The only reservation concerns reading, which, while it did not perform well in relation to the IELTS reading component, does perform well within the set of MUPT variables. The profiles provided by both tests are informative in relation to the overall scores, but seem slightly more accurate on the MUPT than on the IELTS. Most institutions could benefit from developing their own placement test, particularly if their students are concerned with IELTS results rather than just learning English. Such a test would provide not only a guide for streaming students, but also an instructional guide for identifying the strengths and weaknesses of individual language learners in needs analysis.


References

Brown, J. D. (1983). A closer look at cloze: Validity and reliability. In J. W. Oller (Ed.), Issues in language testing (pp. 237-250). Rowley, MA: Newbury House.

Brown, J. D. (1988). Tailored cloze: Improved with classical item analysis techniques. Language Testing, 5(1), 19-31.

Brown, J. D. (1993). What are the characteristics of natural cloze tests? Language Testing, 10(2), 93-116.

Cambridge ESOL. (2010). IELTS information for candidates. Cambridge, UK: CUP. Retrieved from www.ielts.org

Darnell, D. K. (1970). Clozentropy: A procedure for testing English language proficiency of foreign students. Speech Monographs, 37, 36-46.

Hinkel, E. (2006). Current perspectives on teaching the four skills. TESOL Quarterly, 40(1), 109-131.

Hiser, E. A. (2002). Validity of C-Test cloze for tertiary EFL students in Japan. Proceedings of the JACET Annual Conference, Shizuoka, Japan.

Hiser, E. A. (2005). Second language assessment, placement, TOEIC and home grown vegetables. Paper presented at the ALANZ Symposium 2005: Second Language Assessment and Second Language Learning, Victoria University, Wellington, NZ.

Hiser, E. A. (2010). Mediating placement: Using C-tests. Presentation given at CLESOL, Dunedin, NZ, on the construct validity of C-Tests in the role of placement for English language proficiency.

IELTS Academic Test. (2013). Retrieved from www.ieltshelpnow.com/ielts_academic_test.html

Irvine, P., Atai, P., & Oller, J. W. (1974). Cloze, dictation, and the Test of English as a Foreign Language. Language Learning, 24, 245-252.

Klein-Braley, C., & Raatz, U. (1984). A survey of research on the C-test. Language Testing, 1(2), 134-146.

Klein-Braley, C. (1985). A cloze-up on the C-test: A study in the construct validation of authentic tests. Language Testing, 2(1), 76-104.

Raatz, U., & Klein-Braley, C. (1984). A survey of research on the C-test. Language Testing, 1(2), 134-146. Retrieved from http://ltj.sagepub.com/content/1/2/134.short

Raatz, U. (1985). Better theory for better tests? Language Testing, 2(1), 60-75.

Taylor, W. L. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415-433.

Wall, D., Clapham, C., & Alderson, J. C. (1994). Evaluating a placement test. Language Testing, 11(3), 321-344. DOI: 10.1177/026553229401100305. Retrieved from http://ltj.sagepub.com/content/11/3/321
