
Your Programme, Your Placement, and IELTS:

Emerging Opportunities in English Language Evaluation

Dr Elizabeth Hiser, SELT


Professional and Continuing Education (PaCE)
Massey University
Email: e.hiser@massey.ac.nz

Abstract

This paper presents and analyses the relationship between the Massey University ESOL Placement Test (MUPT), used at the centre for Professional and Continuing Education (PaCE), Massey University, and the International English Language Testing System (IELTS) bands awarded to the same students. It continues the investigation and development of the PaCE ESOL Placement Test. Scores on the five sections of the MUPT – reading, writing, speaking, listening and the C-Test – have been statistically correlated against the IELTS skill test bands. The results point to construct and criterion/concurrent validity of the PaCE Placement Test in evaluating the overall proficiency level of applicants for streaming in the ESOL Programme, and to the real possibility of developing placement tests for specific programmes which reflect the international norms of professionally standardised English language tests such as TOEFL, TOEIC, or IELTS.

Introduction

Context

The ESOL Programme at Massey University is one of five divisions within PaCE. The ESOL programme/division has four levels of language instruction, which are sometimes streamed into higher and lower cohorts within a level based on a student's Placement Test (MUPT) score/profile. This placement test was developed and piloted to confirm both reliability and construct validity, and it provides a profile of English language ability in the four skills and general proficiency. These results have been reported elsewhere (Hiser, 2010, 2005, 2002). Placement may also be influenced by officially reported IELTS scores, although a formal link has yet to be established.

Class size for each level varies from six to fifteen students of both genders, with a variety of national origins, cultural backgrounds and levels of first-language educational achievement. Each cohort of students has a mixture of goals, needs, strengths and weaknesses, and may also include a mixture of motivation levels, both intrinsic and extrinsic.

Student goals include improving general English for residency in New Zealand, meeting immigration visa requirements, and entering tertiary academic study at Massey University at both undergraduate and postgraduate levels. The programme is delivered in modules of eight weeks with summative assessments administered during Weeks Four and Eight. Additionally, there is usually a weekly on-going assessment attached to each module which focuses on a specific skill.

Purpose of Study

Most of the students, whether continuing their studies in English or not, are concerned with IELTS evaluation, since it is the high-stakes test of English language proficiency in the majority of European countries and in English-medium programmes in many Asian countries, particularly those in the Commonwealth. This paper explores the possibility of establishing a statistical relationship or correspondence between the programme-centred MUPT and IELTS. Such an instrument would provide a generally predictive evaluation of IELTS achievement for students who have not yet taken the test. The research proposed would determine whether there is a range of scores, in the parallel English skills components of each test or in the overall scores, that can be used in the general evaluation of students' language preparedness for tertiary study. A serious statistical study would also further validate the C-Test section of the PaCE Placement Test in reporting general proficiency and the skills profile for entrants into the ESOL programme.

Literature Review

The search for work similar to this project yielded surprisingly limited results in the literature, a gap also noted by Wall, Clapham and Alderson (1994). This lack of informed studies continues today. Their study was weakened by the lack of language proficiency data from other specific, valid, or relevant tests against which they could compare all students.

Relatively few students presented test evidence for their language proficiency in advance of their arrival at Lancaster. What test data were available related to a number of different tests, some of which were of unknown validity or relevance to studying in an academic setting. Other statements of language proficiency (references, students' own assertions) were both too varied, and of too uncertain validity to be usable (p. 341).

The present study is better able to provide this information, as official IELTS report documents were available for each student in the sample – these being the basis for the major variables used in the calculations and analyses below – in addition to their MUPT scores.

The C-Test

The cloze procedure was developed as a measure of readability with native speakers (Taylor, 1953). It was later adapted for use as a measure of ESL/EFL proficiency (Darnell, 1970; Brown, 1983, 1988, 1993; Irvine, Atai, & Oller, 1974). In light of criticism it was modified by Klein-Braley and Raatz (1984), whose new form became known as the C-Test. It involves deleting the second half of every second word in a passage, which test takers are then asked to restore, thereby demonstrating their degree of language proficiency. The letters remaining after the latter half of each word is cut are left on the test as hints or clues to the full word required to complete the passage. The C-Test procedure was developed to answer the psychometric problems of cloze testing, and has been purported to be an empirically and theoretically valid measure of language proficiency (Raatz & Klein-Braley, 1984; Klein-Braley, 1985; Klein-Braley & Raatz, 1984, 1985; Raatz, 1985). As a major factor in the following study, the C-Tests given here confirm this history of reliability and validity, adding to the overall discussion of the relationship between the local placement test and the IELTS.
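To make the procedure concrete, the following is a minimal sketch in Python of C-Test item construction under the deletion rule described above (the second half of every second word removed, rounding so that at least half of each word remains as a clue). It is an illustration only, not the instrument used in this study; published C-Tests vary in details such as where mutilation begins and how odd-length words are split.

```python
import re

def make_c_test(passage: str) -> str:
    """Mutilate every second word: delete its second half and mark
    the missing letters with underscores, keeping the first half
    (rounded up) as a clue for the test taker to restore."""
    words = passage.split()
    out = []
    for i, word in enumerate(words):
        if i % 2 == 1:  # every second word, leaving the first intact
            # Separate trailing punctuation so it is preserved.
            m = re.match(r"^(\w+)(\W*)$", word)
            if m and len(m.group(1)) > 1:
                stem, punct = m.groups()
                keep = (len(stem) + 1) // 2  # first half, rounded up
                word = stem[:keep] + "_" * (len(stem) - keep) + punct
        out.append(word)
    return " ".join(out)

print(make_c_test("Language testing is a field with many competing methods."))
# -> Language test___ is a field wi__ many compe____ methods.
```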

Methodology

Approach

Since a statistical approach was decided upon, the following sets of variables and data were selected and collected for analysis:

PaCE Placement Test Scores (MUPT)

• P-Total: overall score, 0~100 possible scores
• P-Reading component: 20 possible scores
• P-Speaking component: 20 possible scores
• P-Writing task: 20 possible scores
• P-Listening component: 20 possible scores (total of 80 for the four skills)
• C-Test: 20 possible scores, measuring general proficiency

IELTS Scores

• I-Total: Overall Band Score, 0~9 including half bands (18 possible scores)
• I-Speaking component: 0~9 (18 possible scores)
• I-Reading component: 0~9 (18 possible scores)
• I-Writing task: 0~9 (18 possible scores)
• I-Listening component: 0~9 (18 possible scores; total of 72 for the four skills)

These scores yield 80 evaluation categories for the MUPT (four skills categories of 20 possible scores each) and 72 for the IELTS (four skills categories of 18 possible scores each, counting half bands) (Cambridge ESOL, 2010). This indicates a slightly finer-grained skills evaluation on the MUPT. The C-Test is treated as a general English language proficiency test in its own right and is not included in the comparison of skills; it will be shown, as expected, to have a relationship with the overall proficiency scores on the two instruments. In addition to the individual skills tests, the Overall Band Score (OBS) is considered the general proficiency score for the IELTS. The issue of general proficiency versus individual skill scores is obviously a different, but related, question, assuming acceptance of the idea that individual skills make up, or are components of, general proficiency.

The Sample

The sample chosen for the study comprised N = 45 (+3) students at PaCE. The sample size was limited by the number of students at PaCE for whom both IELTS scores and MUPT scores were available; the Placement Tests for three students with IELTS scores could not be located. The ethnic groups in the sample were Arabic, Chinese, and Japanese. IELTS scores are confidential, so the sample was further limited to scores that were voluntarily offered rather than solicited, with verbal permission given for their use in the study. The majority were young adults working toward undergraduate entry to Massey University via the IELTS requirement of a 6.0 overall band score; a small percentage were working toward IELTS 6.5 and higher for postgraduate study. The gender breakdown was not considered an important aspect of the study, but overall it reflected the gender balance at PaCE, which is approximately one to one. The analysis of the variables for comparison employed Pearson's two-tailed correlations, a more conservative calculation for interval variables than Spearman's analysis.
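As a minimal sketch of the analysis described here (assuming the paired scores sit in simple Python lists; the values below are invented placeholders, not the study data), scipy.stats.pearsonr returns both the coefficient and the two-tailed significance reported throughout the results:

```python
from scipy.stats import pearsonr

# Invented placeholder scores for the same eight students (not the study data):
mupt_total = [28, 35, 41, 44, 52, 57, 63, 70]           # MUPT P-Total, 0~100
ielts_obs  = [4.0, 4.5, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5]   # IELTS OBS, 0~9

r, p = pearsonr(mupt_total, ielts_obs)  # two-tailed by default
print(f"Pearson r = {r:.3f}, two-tailed p = {p:.3f}")
```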

Results

Variable Descriptives

The two main variables to be considered were the overall scores for each test – the I-total and the P-total. The IELTS band scores shown in Figure 1 clearly indicate the level of the students in the programme and the goal of the vast majority: the mean band score was 4.8, with most students working to achieve 6.0, the requirement for undergraduate entry into the university. The few higher-level students in the study appear, as expected, at the top end of the curves for both totals (the I-total and the P-total).

The Placement Test (MUPT) score totals (see Figure 2) also represent the range of entry scores for the programme, the mean score being approximately 40 out of 100 for this sample. The categories between 50 and 70 in the histogram again probably represent the few higher-level students (4) in the programme who are seeking entry into postgraduate work, with a requirement of 6.5 or better on IELTS. A midpoint score, or mean, of 50 might have been expected had students achieving that level not gone directly into undergraduate university studies and programmes. The mode of 50~55 probably shows that the cohort in general is made up of students just below university entry (IELTS 6.0) who are making a push to reach that level. Note that in Figure 1 only one student in the sample already scores 6.0 on IELTS.

Figure 1: IELTS Overall Band Scores (Mean = 4.8, SD = 0.67, N = 48)
Figure 2: MUPT Total Scores (Mean = 40.4, SD = 12.4, N = 45)

A new variable, MUPT.80, was created from the MUPT scores; it excludes the C-Test score and represents the total placement test score for the four skills, which more closely parallels the structure of the IELTS OBS. Hereafter the C-Test score will be treated as parallel to the overall proficiency test scores, and the new variable's performance in relation to the other general proficiency variables will be examined.

The MUPT score is the total of five components of twenty points each, four of which parallel the IELTS skills scores; the fifth section measures general proficiency by means of a C-Test. Including the C-Test, the total possible score is 100 points. These five 0~20 scores allow the creation of a profile for each student showing strong and weak areas/skills in proficiency that can be addressed directly in instruction.
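A minimal sketch of how MUPT.80 and the per-student profile might be derived, assuming the component scores are held in a pandas DataFrame; the column names and values here are hypothetical, as the study does not specify its data handling:

```python
import pandas as pd

# Hypothetical component scores, 0~20 each (column names are assumptions).
df = pd.DataFrame(
    {"p_read": [12, 8, 15], "p_speak": [10, 9, 14],
     "p_write": [9, 7, 13], "p_listen": [11, 10, 16],
     "c_test": [8, 5, 12]},
    index=["student_a", "student_b", "student_c"],
)

skills = ["p_read", "p_speak", "p_write", "p_listen"]
df["mupt_80"] = df[skills].sum(axis=1)        # four-skill total, max 80
df["p_total"] = df["mupt_80"] + df["c_test"]  # full MUPT total, max 100

# Each row now reads as a profile: skill strengths/weaknesses out of 20,
# plus general proficiency (c_test) and the two totals.
print(df)
```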

Since the C-Test scores also measure general proficiency, a histogram of them (Figure 3) provides a second comparison with the chart of IELTS OBS in Figure 1. This chart indicates that the central tendency of the sample (a mean of 7.33 out of a possible 20) is slightly lower than the midpoint of 10.0 that the range of possible scores would suggest. This directly mirrors both the MUPT total scores and the IELTS OBS for the sample. The standard deviation for the C-Test is 3.2 for a sample of 48.

Figure 3: Distribution of C-Test Scores
Figure 4: Histogram of MUPT.80 Scores


Figure 4 shows the distribution of the new variable, with a mean score (33) again slightly below the expected midpoint of 40 (N = 48). The parallel distributions of the four variables are not accidental, and corroborate the results of each.

Analyses

To take an initial look at the relationship between the MUPT and IELTS scores, a pictorial cross-tabulation (Figure 5) was developed between the two total scores (IELTS OBS and the MUPT total). There appears to be good clustering of scores for both test results between IELTS Bands 4.5/5.0 and MUPT scores of 20 through 50, as illustrated below. Additionally, for the reasons stated above concerning general proficiency scores, scatterplots of the C-Test scores on the MUPT against the IELTS OBS and the MUPT.80 were also created (see Figures 6 and 7).

Figure 5: Crosstabulations of Test Results
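A minimal sketch of how scatterplots such as Figures 6 and 7 can be produced with matplotlib (the paired values below are invented placeholders, not the study data):

```python
import matplotlib.pyplot as plt

# Invented placeholder scores (not the study data).
c_test    = [4, 5, 6, 7, 8, 9, 10, 12]                  # MUPT C-Test, 0~20
ielts_obs = [4.0, 4.5, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0]    # IELTS OBS, 0~9

plt.scatter(c_test, ielts_obs)
plt.xlabel("MUPT C-Test score (0~20)")
plt.ylabel("IELTS Overall Band Score")
plt.title("C-Test vs IELTS OBS")
plt.show()
```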


These results are indicative of the C-Test's general ability to predict IELTS bands. The correlation value on which the IELTS scatterplot is based was a highly significant 0.465 for a sample of 45 (see Figure 6). The correlation value (ρ) for the MUPT C-Test and MUPT.80 is 0.597, also highly significant for a sample of 45. The relationship between the C-Test and the total score for the MUPT (.768**) is not unexpected, as the C-Test is a component of the total MUPT variable p.total.

Figure 6: Scatterplot of C-Test and IELTS OBS (ρ = .375**)
Figure 7: Scatterplot of C-Test and MUPT.80 Total (ρ = .597**)

To look more closely at the internal component skill scores within each test, two sets of Pearson's correlations were calculated, one for each examination. These calculations again point to construct validity for each instrument, since the internal correlations between the components of each test show good or strong relationships, all with highly significant (P ≤ 0.01) results. See the first set of correlations, for the IELTS skills tests, in Figure 8; Figure 9 has the values for the internal components of the MUPT.

The relationships between each IELTS skill and the OBS are all moderately strong, ranging between 0.691 and 0.875. This would indicate that, collectively for this sample, reading skill contributed the least to the OBS total while listening skill contributed the most (see the last row of Figure 8). This is also a general indication that, overall, reading is the poorest contributor of the four skills measured by the test for these students of English.

Figure 8: Internal Relationships of IELTS Components

This table of correlations, Figure 8, also reflects probable construct validity of the IELTS. Looking at the correlations among the internal components – again all highly significant – there is a range of values between 0.361 and 0.640. These are weak to moderate relationships, with the poorest value between reading skill and speaking ability. The strongest relationship is between writing and listening skills. Neither of these relationships corresponds intuitively (Hinkel, 2006).

Figure 9: Internal Relationships of MUPT Components

Figure 9, showing correlations for the MUPT components, is parallel to the IELTS table in Figure 8, but possibly indicates stronger construct validity, as the range of values for the correlations between skills and the total score is higher (0.688 to 0.833) (see the first row of Figure 9). This may be because the MUPT, as mentioned above, has a greater number of scoring categories (80) and is therefore a somewhat more fine-grained measure of proficiency and skill than the IELTS, which has fewer scores or categories (72) with which to designate a student's ability. The range of correlations for the internal components/skills of the MUPT is also slightly larger, with only the lowest value, between writing and listening (0.357), being less than the lowest value for the IELTS, between speaking and reading (0.361). Internal bi-variate correlations for the MUPT range from 0.357 (writing and listening) to 0.680 (reading and writing). These values are more intuitively correct than those for the IELTS: reading and writing skills would seem more naturally related than the writing and listening scores on the IELTS, the logical match between skills being R/W and L/S (Hinkel, 2006).

A set of correlations between the general proficiency scores, or total skills scores, indicates probable criterion validity among tests that are independently directed at the evaluation of general proficiency. The relationship between the MUPT (p.total) and the MUPT.80 score is naturally higher than the others (0.768, calculated but not shown), as the C-Test values are included within the variable p.total. The IELTS OBS in comparison with the other three scores is of main concern, and provides low to moderate coefficients of 0.375 for the C-Test, 0.473 for the MUPT.80, and 0.482 for the p.total, all highly significant in two-tailed bi-variate analyses (Figure 10).

Figure 10: Correlations on General Proficiency Scores

                                       C.TEST    MUPT.80   p.total
IELTS i.total   Pearson Correlation    .375**    .473**    .482**
(OBS)           Sig. (2-tailed)        .009      .001      .001
                N                      48        48        48

Finally, detailed analyses of the correlations between the individual skill scores across tests were calculated. Listening skills for the two instruments compare favourably – highly significant (P ≤ 0.01) and moderately strong (0.506) (see Figure 11).

Figure 11: Listening Score Correlations

                                 MUPT listening   IELTS listening
p.listen   Pearson Correlation   1                .506**
           Sig. (2-tailed)                        .001
           N                     45               42
i.listen   Pearson Correlation   .506**           1
           Sig. (2-tailed)       .001
           N                     42               48
The speaking tasks on the two tests show a weak relationship (0.325), which is nevertheless significant (P ≤ .05) at the 0.036 level (Figure 12). The writing tasks for the two examinations are again moderately correlated and remain highly significant, with P = 0.008 (see Figure 13).

The final skill comparison, between reading ability as evaluated on the two instruments, shows an unexpected reversal of relationships (Figure 14). The coefficient between the two reading tests is essentially zero and slightly negative (-0.005), with a strikingly non-significant p-value of .974 – a correlation of this size is entirely consistent with chance.

Figure 12: Speaking Task Correlation of Scores

                                MUPT speaking   IELTS speaking
p.speak   Pearson Correlation   1               .325*
          Sig. (2-tailed)                       .036
          N                     45              42
i.speak   Pearson Correlation   .325*           1
          Sig. (2-tailed)       .036
          N                     42              48
Figure 13: Correlations: Writing Task Scores

                                MUPT writing   IELTS writing
p.write   Pearson Correlation   1              .402**
          Sig. (2-tailed)                      .008
          N                     45             42
i.write   Pearson Correlation   .402**         1
          Sig. (2-tailed)       .008
          N                     42             48

Figure 14: Correlations: Reading Test Scores

                                MUPT reading   IELTS reading
p.read    Pearson Correlation   1              -.005
          Sig. (2-tailed)                      .974
          N                     45             42
i.read    Pearson Correlation   -.005          1
          Sig. (2-tailed)       .974
          N                     42             48

This surprising result implies that the reading tasks are evaluating different skills, which may be due either to the complex cognitive nature of reading or to the types of tasks each instrument uses to evaluate reading skills. IELTS task types are selected from a list of possibilities that includes: multiple choice; identifying information from the text (true/false/not given); identifying writers' views (yes/no/not given); matching information from the text to distractors of various sorts; matching headings, i.e. grouping or classifying details with specific categories; matching features/statements with a list of options; matching sentence endings; sentence completion with short phrases; summary of text; completion of an outline of the text; flow chart completion; and, finally, diagram label completion (IELTS Academic Test, 2013; Cambridge ESOL, 2010). This almost overwhelming list of skills involved in reading, only a small selection of which may appear on any particular IELTS test, can hardly be expected to compare with the simpler choices of the MUPT, which evaluates comprehension with multiple-choice distractors, matches information with multiple-choice distractors, completes the text itself with possible phrases, and completes sentences with short phrases created by rewording the text. These different approaches to evaluating reading are evidently measuring sub-skills of a different nature; hence the disparate scores on the reading sections.

This lack of correspondence on one component also reduces the overall criterion validity of the otherwise reasonably parallel examinations. The overall correspondence is presently acceptable and useful, but if this component were modified, the two tests would clearly demonstrate even greater concurrent validity, as every other aspect of the tests correlates reasonably and seems parallel in reporting results.

Conclusion

A number of observations and interpretations can be offered from the data presented and the analyses calculated above.

• The C-Test used within the PaCE ESOL Placement Test (MUPT) 1) again points to the construct validity of the examination, and 2) shows a highly significant relationship with general English language proficiency as measured by the IELTS examination.

• All skills sections and total scores on both exams show significant relationships with each other, except the reading components, which contribute to internal cohesion within each individual test but do not correlate with each other.

• The internal construct validity of both the MUPT and the IELTS appears satisfactory, with the stronger cohesion of components produced by the MUPT.

• Among the internal relationships of the two tests, the reading skill evaluation on the IELTS is the weakest among the four skills, yet the reading skill evaluation on the MUPT is the strongest contributor to the total score there.

• Internal relationships with total scores (construct validity) are stronger on the MUPT for all skills components except the listening sections, showing the MUPT to be internally more cohesive.

• The MUPT seems better at profiling student skills, as it has slightly finer-grained discrimination within both component scores and overall scores, and therefore somewhat greater accuracy of measurement.

• The MUPT, particularly the C-Test component, is more than adequate as a general predictor of IELTS scores. A regression formula could be created to provide a more specific coefficient for this prediction, but that is beyond the scope of this paper; a minimal illustrative sketch follows below.
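By way of illustration only, a regression of the kind envisaged in the final point might be sketched as follows, using scipy.stats.linregress on invented placeholder scores (not the study data) to map a C-Test score onto a predicted IELTS band:

```python
from scipy.stats import linregress

# Invented placeholder scores (not the study data).
c_test    = [4, 5, 6, 7, 8, 9, 10, 12]                  # MUPT C-Test, 0~20
ielts_obs = [4.0, 4.5, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0]    # IELTS OBS, 0~9

fit = linregress(c_test, ielts_obs)
print(f"predicted OBS = {fit.slope:.3f} * C-Test + {fit.intercept:.3f}")

def predict_band(c_score: float) -> float:
    """Predict an IELTS OBS from a C-Test score, rounded to a half band."""
    raw = fit.slope * c_score + fit.intercept
    return round(raw * 2) / 2

print(predict_band(11))  # hypothetical C-Test score of 11
```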

Overall, the two examinations appear to evaluate proficiency accurately and in a robust manner. The only reservation concerns reading, which, while it did not perform well in relation to the IELTS reading component, does perform well within the set of MUPT variables. The profiles provided by both tests are informative in relation to the overall scores, but seem slightly more accurate on the MUPT than on the IELTS. Most institutions could benefit from developing their own placement test, particularly if their students are concerned with IELTS results rather than just learning English. Such a test would provide not only a guide for streaming students, but also an instructional guide for identifying the strengths and weaknesses of individual language learners in needs analysis.


References

Brown, J. D. (1983). A closer look at cloze: Validity and reliability. In J. W. Oller (Ed.), Issues in language testing (pp. 237-250). Rowley, MA: Newbury House.

Brown, J. D. (1988). Tailored cloze: Improved with classical item analysis techniques. Language Testing, 5(1), 19-31.

Brown, J. D. (1993). What are the characteristics of natural cloze tests? Language Testing, 10(2), 93-116.

Cambridge ESOL. (2010). IELTS information for candidates. Cambridge, UK: CUP. Retrieved from www.ielts.org

Darnell, D. K. (1970). Clozentropy: A procedure for testing English language proficiency of foreign students. Speech Monographs, 37, 36-46.

Hinkel, E. (2006). Current perspectives on teaching the four skills. TESOL Quarterly, 40(1), 109-131.

Hiser, E. A. (2002). Validity of C-Test cloze for tertiary EFL students in Japan. Proceedings of the JACET Annual Conference, Shizuoka, Japan.

Hiser, E. A. (2005). Second language assessment, placement, TOEIC and home grown vegetables. Paper presented at the ALANZ Symposium 2005: Second Language Assessment and Second Language Learning, Victoria University, Wellington, NZ.

Hiser, E. A. (2010). Mediating placement: Using C-tests. Presentation given at CLESOL, Dunedin, NZ, on the construct validity of C-Tests in the role of placement for English language proficiency.

IELTS Academic Test. (2013). Retrieved from www.ieltshelpnow.com/ielts_academic_test.html

Irvine, P., Atai, P., & Oller, J. W. (1974). Cloze, dictation, and the Test of English as a Foreign Language. Language Learning, 24, 245-252.

Klein-Braley, C., & Raatz, U. (1984). A survey of research on the C-test. Language Testing, 1(2), 134-146.

Klein-Braley, C. (1985). A cloze-up on the C-test: A study in the construct validation of authentic tests. Language Testing, 2(1), 76-104.

Raatz, U., & Klein-Braley, C. (1984). A survey of research on the C-test. Language Testing, 1(2), 134-146. Retrieved from http://ltj.sagepub.com/content/1/2/134.short

Raatz, U. (1985). Better theory for better tests? Language Testing, 2(1), 60-75.

Taylor, W. L. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415-433.

Wall, D., Clapham, C., & Alderson, J. C. (1994). Evaluating a placement test. Language Testing, 11(3), 321-344. DOI: 10.1177/026553229401100305. Retrieved from http://ltj.sagepub.com/content/11/3/321
