Language Testing 1998; 15(1): 86–108
DOI: 10.1177/026553229801500104
http://ltj.sagepub.com
Validating self-reported language
proficiency by testing performance in
an immigrant community: the
Wellington Indo-Fijians
Nikhat Shameem University of Auckland

The Wellington Indo-Fijians are recent immigrants to New Zealand, having arrived
in the country after the 1987 Fiji military coups. A performance test was developed
and implemented to validate the self-reported first language proficiency of 35
teenagers in this immigrant community. The main drawback of self-report studies is
the likelihood of gathering inaccurate data, particularly if the first language, like
Fiji Hindi in this community, is preliterate and perceived by community members
as a lower-status, less useful language than English. The performance test
consisted of an oral interview, a listening comprehension test and a vocabulary
test. The results of the performance test correlated strongly with the self-report
data, thereby demonstrating the validity of the self-report scale. Significant
differences between oral performances and self-reports, as well as general trends
in the data, suggested, however, that the respondents were often reporting their
oral Fiji Hindi ability at a level higher than their judged level of performance.

I Introduction
The first language of the Indo-Fijians is Fiji Hindi (FH), which is an
Overseas Hindi. Various studies of Overseas Hindi in the Indian diaspora
have shown these languages to be shift-prone and of low status
among their speakers. Although the Overseas Hindi varieties of South Africa,
Trinidad and Guyana are no longer used, the varieties used in Mauritius,
Surinam and Fiji survive, despite their low status and the strong
influence of other languages spoken in these countries (Barz and Siegel,
1988). In Fiji, FH is used regularly, although only informally, as
the main language of intraethnic communication among the Indo-Fijians.
Indo-Fijians prefer to use Shudh Hindi and English on formal
occasions. FH is also used as a lingua franca between the Indo-Fijians
and the Fijians in some rural communities (Siegel, 1973).
FH is a preliterate language that originated from plantation contact

Address for correspondence: Dr Nikhat Shameem, Institute of Language Teaching and Learning,
University of Auckland, Private Bag 92019, Auckland, New Zealand; e-mail:
n.shameem@auckland.ac.nz

Language Testing 1998 15 (1) 86–108 0265-5322(98)LT144OA © 1998 Arnold


in Fiji during indenture 1879–1916, when 60 965 Indians were taken
as indentured labourers by the British from India to Fiji.1 About half
of them chose to stay in Fiji at the end of their indenture term
(Gillion, 1962). A majority of the present-day Indo-Fijians are the
descendants of these labourers.
Fiji is an exoglossic nation where English, rather than the two
vernaculars FH and Fijian (the language of the ‘indigenous’ Fijians), is
used in education, in government and in the economy. The role of
English in Fiji has meant high English language competence among
Indo-Fijians and commonplace FH–English code-mixing, code-switching
and borrowing (Shameem, 1994: 402).2
Given these background factors, and the powerful anglicizing effect
of immigration, I expected FH to be rapidly shifting to English among
younger Indo-Fijians in Wellington. For the purpose of this article,
language shift is defined as a shift in patterns of language use between
Fiji and New Zealand (NZ). Language loss implies a subsequent loss
of FH proficiency.
To date, NZ studies of language shift in migrant communities have
relied on self-report data (see, for example, Holmes et al., 1993).
However, several writers who have been involved in language shift
research advise caution in interpreting self-report data (see, for
example, Nicholas, 1988; Martin-Jones, 1991). Martin-Jones says that
this is particularly so in bilingual and multilingual communities,
where language use does not necessarily fall into neat little patterns
of complementary distribution across domains (1991: 50). Indeed,
Poplack (1980: 581) suggests that code-switching, rather than
representing a debasement of linguistic skill, is a sensitive indicator of
bilingual ability and may in fact be the norm in certain speech
communities. These observations seemed particularly relevant to this
study of the Wellington Indo-Fijians.
Nicholas (1988), in illustrating that cross-checks on self-reported
data are especially desirable in multilingual communities, gives as an
example a study of British Creole speakers. In this study the quality
and reliability of the data had unavoidable limitations because, like
FH, Creole has an uncertain political and social status, lacks stan-
dardization and is referred to by the speakers themselves as ‘broken’.
It too lacks a name specific to the ‘variety’; the general reference
term is ‘English-based Creole’.

1 On the Kloss (1968) taxonomy of language types which assesses the capability of a language
to serve in a modern technologically developed society.
2 Code-mixing: switching from FH to English within the structure of a sentence. Code-switching:
switching between FH and English sentences and using FH question tags in English sentences
(also see Poplack, 1980).


In this study a performance test was used to validate aural and oral
self-report scales used to determine the Fiji Hindi proficiency of 35
new immigrants in the Wellington Indo-Fijian community.

II Respondents
There are an estimated 16 000 Indo-Fijians residing in NZ (Shameem,
1995). It is difficult to estimate the number resident in Wellington.
Thirty-five young Indo-Fijians from 35 Wellington households, identified
through networking, participated in this study. In addition, six
Indo-Fijians aged 18–20 participated in a pilot study of the self-report
and performance assessment. Of the 35 respondents in the actual
study, more of the younger group (age 15–17) were interviewed than
the older. The female numbers exceeded the male (see Table 1).
About three-quarters of them had immigrated to NZ between four and
six years after the 1987 military coups. The others had lived in NZ
between six and ten years.
In this study, data collection focused on younger Indo-Fijians (aged
15–21) for several reasons. Recent studies on FH use and proficiency
in Fiji (Jan Tent, personal communication, 1995; Siegel, 1987) and
personal observation of young people’s communication patterns in
Fiji indicated that the respondents could be expected to be highly
proficient in FH on their arrival in NZ. The literature also notes the
use of English among the younger population for interethnic
communication in Fiji (White, 1971; Geraghty, 1984; Siegel, 1987).
Being recent immigrants, however, they would also be facing host
community peer pressure to become as fluent as the native speakers
of English in NZ. Because there are so few Indo-Fijians at school,
tertiary institutions and in the workforce, there is little or no
opportunity for FH use. Dorian (1989) refers to the ‘skewed performance’
that results from such constraints on the display of language skills.
Finally, from my own knowledge of this community and as Hakuta
and D’Andrea (1992) state in a similar study among Mexican-background
teenagers, at this age the respondents were expected to be

living at home and were therefore still being influenced by the home
language environment.

Table 1 Participating respondents by age and gender

               Male                     Female
               Age 15–17   Age 18–21    Age 15–17   Age 18–21
Age totals     10          5            10          10
Gender totals         15                       20

Note: n = 35

III Methodology
In this study the performance test was designed primarily to validate
the self-report scale and to correlate the self-report and performance
data. As in any kind of oral assessment one can at best only make
judgements about the communicative performance of the respondents,
which may or may not be a true reflection of their language abilities
(Bachman, 1990: 37). Therefore a certain discrepancy between the
performance and the self-report was to be expected. Hakuta and
D’Andrea (1992) acknowledged this feature in their Spanish/English
tests with Mexican background high school students in the US when
they wrote that the test performance of their respondents could not
really indicate their true potential.
If a language is to be maintained in an immigrant community, it
is actual performance, rather than reported ability, that determines
whether the language will be actively maintained into the future.
Therefore this research addressed both concerns: the reported or
potential ability, which to date has been the conventional method used
to study language shift, and the communicative performance of the
respondents, in order to check the validity of this research method.
As the aim of the performance test was to validate the self-report
proficiency scale and to compare self-perceived proficiency with per-
formance ability in this immigrant community, the self-report and
performance scales were broadly comparable.

IV Self-report proficiency: level descriptions


The respondents first rated their own proficiency on a scale
which encompassed real-life tasks for which native speakers use FH.
(The self-rating scales for aural and oral proficiency are given below.)
It was necessary, therefore, to concentrate on developing questions
dealing with tasks at the higher levels of aural and oral FH proficiency. This meant
drawing fine distinctions between levels 4 and 5 on the aural scale
and levels 5 and 6 on the oral one. While at level 4 on the aural scale
respondents felt they could understand the FH speech of Indo-Fijian
interlocutors in their immediate Wellington environment, at level 5
they could understand all cultural and social nuances as well as rural
and older varieties of FH spoken by their grandparents’ generation
and in Fiji. On the oral scale at level 4, the respondent felt able to
converse informally with interlocutors in Wellington about normal
everyday things, while at level 5 they felt able to perform the
cognitively more demanding task of summarizing, translating and retelling
in FH a story written in some other language. At level 6, the respondents
felt able to discuss a range of topics including the issue of FH
maintenance in NZ, themes in movies they had seen, their future
career paths, etc.
Aural scale
0 No proficiency
1 Basic courtesy requirements: I can understand someone when they greet me or say thank you.
2 Minimum social proficiency: I can understand simple questions about my name, family, address, etc.
3 Basic social proficiency: I can understand if someone speaks slowly to me.
4 Social proficiency: I can understand people when they talk at a normal speed to each other.
5 Native social proficiency: I can understand everything I hear (including rural and older varieties spoken in Fiji).

Oral scale
0 No proficiency
1 Basic courtesy requirements: I can greet someone and say thank you.
2 Initial social proficiency: I can give basic information about myself and my family.
3 Minimum social proficiency: I can describe my school, my work, Fiji, NZ.
4 Basic proficiency: I can easily say what I want to in a conversation.
5 Social proficiency: I can talk about a story that I’ve read.
6 Native social proficiency: I can talk about anything.
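For later tabulation, the two scales above can be encoded as simple lookup tables. This is a minimal sketch: the dictionary and function names are mine, and the labels are taken from the descriptors above.

```python
# Self-report scale labels, keyed by level (labels from the article's descriptors).
AURAL_SCALE = {
    0: "No proficiency",
    1: "Basic courtesy requirements",
    2: "Minimum social proficiency",
    3: "Basic social proficiency",
    4: "Social proficiency",
    5: "Native social proficiency",
}

ORAL_SCALE = {
    0: "No proficiency",
    1: "Basic courtesy requirements",
    2: "Initial social proficiency",
    3: "Minimum social proficiency",
    4: "Basic proficiency",
    5: "Social proficiency",
    6: "Native social proficiency",
}

def describe(scale: dict, level: int) -> str:
    """Return the label for a self-reported level on the given scale."""
    return scale[level]
```

Note that the aural scale tops out at level 5 and the oral scale at level 6, which matters when the two are compared.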

V The performance scale


Kalantzis et al. (1989) stress that in order for a test to be reliable
the raters must have clear guidelines in terms of rating criteria and
procedures on which to base their judgements. This allows consistent
judgements across raters and scales (also see Wilds, 1975; Clark and
Lett, 1988).
In this study the development of an appropriate rating procedure
and test descriptors was a complex task because, as Porter (1991: 32)
points out, oral language ability does not fall neatly into natural
pre-existing categories and any attempt to categorize language
performance will meet with varying degrees of success. He makes a
useful suggestion that the best type of rating leaves out a considerable
amount and focuses on the important and salient features.
Moreover, as stated earlier, the Wellington Indo-Fijian community
is largely bilingual and code-mixing is a norm in this community.
The problems faced in scale development were primarily related to
this English–FH relationship, which included the establishment of
acceptable levels of code-mixing among the respondents and the
awareness of the interlocutors during the test of the other’s
bilingualism. It was inevitable that these factors would affect the
language used during the session. This was found to be so in a Dutch–
NZ study where the members of the second generation, in particular,
were influenced by this awareness (Kroef, 1977; also see Poplack,
1980).
To counter the most urgent of these problems, acceptable degrees
of code-mixing at each level were established primarily through my
knowledge of the language and this community both in NZ and in
Fiji. I was aware, for example, of the frequent use of English
conjunctions in FH sentences and research also suggests that nouns
are the most frequently switched category. I therefore expected the
respondents to be making the most frequent FH–English switch in
their use of nouns. However, in a community in which lexical
switching is the norm, this is not strong enough evidence of language
loss. Therefore a vocabulary test was used to check that the word was
known in both languages. The acceptability of a primarily English
sentence or clause as part of FH speech was established by making
a distinction between code-mixing and borrowing in FH. Generally,
borrowing (in which the lexical items taken from English are
phonologically assimilated into FH) was a stronger indicator of FH
performance ability than code-mixing (in which they are not).
The listening and speaking scales had similar categories at the
lower levels 1 and 2, when respondents were merely required to
demonstrate a superficial working knowledge of the language and
some awareness of cultural appropriacy. In accordance with similar
requirements on the self-report scales at the same level, the
respondents demonstrated the ability to understand simple personal
questions about themselves delivered at a slow rate, to identify
immediate family relationships and to respond minimally in FH. At
level 2 of the oral scale the response was appropriate to the tone and
style used by the interviewer/tester and response was closely
modelled on the interviewer’s own contribution to the conversation.
At level 3 respondents were able, with frequent clarification, to
understand a broader conversation on immediately familiar places and
things and to show minimal evidence of being able to speak FH
without the model provided by the interviewer, thereby demonstrating
some ability to function independently in a social situation. At
level 4, the respondent was credited with having a ‘basic social
proficiency’ – that is, they were able to understand and, with
prompting, respond to a native speaker in a face-to-face situation
when the native speaker was speaking on familiar topics at a normal
rate. At level 5, a ‘social proficiency’ implied a spontaneous
understanding of simple urban FH speech on a wider range of topics
and, with some prompting, an ability to make a comfortable
contribution to a native-speaker conversation. In constructing the
aural performance scale, a closer differentiation at level 5 of the self-
report scale was needed to accommodate those respondents who were
able to understand a wider range of FH speech, including that of
native speakers living in Fiji. Hence a level 6, ‘native proficiency’,
was added, implying a spontaneous understanding of, and response to, all
FH speech regardless of content, variety and speed of delivery. For
example, at level 6 of the aural scale respondents demonstrated that
they understood the FH words for the days of the week and double-
digit numbers for which English is more commonly used. They were
also expected to display an understanding of several varieties of FH;
for example, the variety used in rural Fiji and that used by their
grandparents’ generation. They were not, however, required to
produce these features orally since this was not a realistic expectation
of young Wellington Indo-Fijians. Therefore, knowledge of these
items was optional on the oral scale. (See Shameem and Read, 1996,
for detailed performance descriptors.)

VI Reliability and validity in testing FH performance


The purpose of the test was to elicit authentic language performance
and to gather evidence of the performance ability of young Indo-
Fijians (aged 15–21) living in Wellington. The results should give an
accurate description of the ability of these test-takers to communicate
orally and appropriately in FH and theoretically should provide a base
for making inferences about patterns of language shift and loss within
this community.
Test appearance, or what has been referred to as ‘face validity’,
had importance in this test since interviewee motivation was a serious
consideration. It was important that the test should sound right and
look relevant to the stated purpose of determining levels of FH
performance. The communicative nature of the test and the
informality of the elicitation procedure also contributed to test
appearance. Although consistency in elicitation procedures is a
fundamental requirement for test reliability, for a communicative FH
test a lock-step approach was inappropriate, since in a real-life
situation both interlocutors would be able to take turns and the
initiative for the conversation could come from either. This would be
a more genuine type of conversation in a natural context and was
therefore encouraged during the test.
One of the positive aspects of this study was my insider status in
this community and my knowledge of the possible domains in which
FH is used. In this study content validity was established by
identifying the appropriate communicative tasks for which FH might
be used, although by its very nature a test is limiting in terms of the
topics that can be discussed and the artificiality of a test situation,
despite the pains taken to make it as realistic as possible (Weir, 1988).
A major concern in establishing content validity is the extent to which
an oral performance test can provide us with a representative enough
sample of the test-takers’ actual behavioural ability in FH – which is
what the rater has eventually to judge.
Concurrent validity was established by comparing the results of the
performance test with the self-reports.
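Since both instruments assign ordinal levels, a comparison of this kind is typically quantified with a rank correlation such as Spearman's rho. The sketch below uses hypothetical ratings, not the study's data, and a hand-rolled implementation rather than any particular statistics package.

```python
from statistics import mean

def rank(xs):
    """Assign ranks to values, averaging the ranks of tied values."""
    sorted_xs = sorted(xs)
    return [
        sum(i + 1 for i, v in enumerate(sorted_xs) if v == x) / sorted_xs.count(x)
        for x in xs
    ]

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = rank(a), rank(b)
    ma, mb = mean(ra), mean(rb)
    num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = (sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return num / den

# Hypothetical oral self-ratings and judged performance levels
self_report = [5, 6, 4, 5, 3, 6, 5, 4]
performance = [5, 5, 4, 4, 3, 6, 4, 4]
rho = spearman(self_report, performance)
```

A rho near 1 would support the concurrent validity claim; systematically higher self-report ranks would echo the over-reporting the article describes.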
Despite the importance of the discrete elements of validity,
ultimately the overall validity of the performance test lies in its
usefulness to migrant Indo-Fijian communities in their desire to
maintain their first language (FH).
The validity of a test is invariably affected by its reliability:
ensuring reliability means managing and controlling those features
within test design, implementation and rating that might cause a
variation or distortion of performance scores (Bachman, 1990).
High on the list of factors addressed for test validity was the
development of an authentic communicative test which was
representative of real-life language use. An ‘open’ conversational test
which would ideally emulate this situation, however, would be
unreliable with the wide range of responses that could be expected.
Moreover, allowing for flexibility and the demonstration of
respondent initiative and independence also works against the strict
notions of reliability.
To build reliability into test design, therefore, a structure or
framework was provided to restrict the range of possible responses
at each level. The framework for the test content included a
description of tasks which could be achieved at each level and a
procedure to follow so that the respondents had ample opportunity to
demonstrate performance ability at each level. Even if each interview
did not follow exactly the same path, at least a common framework
provided a range of appropriate topics and a procedure to follow to
ensure some reliability. Language performance with similar tasks
provides a better basis for test reliability.
Reliability of ratings was evaluated by means of both inter-rater
and intra-rater checks. A measure of inter-rater reliability was
obtained by comparing two raters’ assessments of each audio
recording of performance using a check-list. The check-list ensured
that judgements were consistent across raters. Intra-rater reliability
was assessed by a simultaneous and delayed rating of each
performance by the test administrator. This analysis and interpretation
of ratings from a variety of sources provided evidence for the
reliability and validity of the test. Intra-rater reliability particularly
demonstrates the internal consistency of the test – a fundamental
validity concern.
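Agreement between two raters assigning discrete levels is often summarized with a chance-corrected statistic such as Cohen's kappa. The article does not name the statistic it used, so this is an illustrative sketch with hypothetical ratings.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters' level assignments.

    Assumes the raters do not both assign a single identical label
    throughout (which would make expected agreement 1).
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2.get(k, 0) for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical: two raters' holistic levels for ten performances
r1 = [4, 5, 3, 6, 4, 5, 2, 4, 5, 3]
r2 = [4, 5, 3, 5, 4, 5, 2, 4, 4, 3]
kappa = cohens_kappa(r1, r2)
```

Values near 1 indicate agreement well beyond chance; raw percent agreement alone would overstate reliability when a few levels dominate.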

VII The Fiji Hindi performance test


The test had three basic components. The first was an interview
which, apart from a conversation between the interviewer and the
respondent, also included a description of a process and a story-
retelling task. The interview was used to assess both listening and
speaking performance. The second component was a listening
comprehension task: respondents listened to and responded to two
extracts from a play in FH. The third component was a vocabulary
test of 10 FH and 10 English words, which respondents translated
from one language to the other.
While the conversation part of the interview gave an holistic
impression of the aural and oral performance ability of the
respondents, the inclusion of the other tasks contributed to the
reliability of the overall judgements by providing data on a wider
range of tasks than just conversational ability. The conversation was
the most authentic part of the test and therefore the data from the
other parts needed to be interpreted and linked to the results of the
conversation test with caution and only in conjunction with it. The
other parts also provided additional and useful material on the
possible performance levels of the respondents at the higher levels of
the rating scales.
Although the test design provided for the isolation of discrete skills
contributing to successful language performance in the aural and oral
components of the test, the ultimate focus was on general
performance, on holistic ratings and on the appropriacy of discourse
to the situation.

1 Interview: format and design


Because of the preliterate nature of FH, a semi-script using roman
letters was used for writing the FH components of the test. This semi-
script was earlier used by Pillai (1990) to write the only
available piece of FH literature, a play, Adhura Sapna. However, all
instructions regarding the implementation of the test (for interviewer
use) were in English, as FH was felt to be inappropriate for this more
formal purpose.
The structure and content of the interview were influenced by the
main purpose, which was to assess informal FH communicative
performance, rather than proficiency or achievement for more formal
purposes. This placed greater demands on making the test authentic
and appropriate to the respondents, the community and their first
language, FH. Since the home was already established as the main
domain of FH use, the next step was to create test tasks and choose
conventional topics that would reflect this use. A range of topics were
needed, since Shohamy and Reves (1985), for example, found in a
test of Hebrew as a second language that topic change significantly
affected student scores. In this FH test the topics went from the
personal to the relatively unfamiliar. They ranged from the test-takers
speaking about their own lives, to stating their opinions about the
maintenance or loss of FH in NZ. Prompts for the topics were given
by the interviewer, who offered information about herself on each topic
and then invited a response modelled on her own if needed.
To elicit performance beyond the lower levels, a series of probe
questions within each topic area was used. Clear levels of
performance had to be established to signal whether effective
communication was taking place at each level (see Kalantzis et al.,
1989; Brindley, 1991; Shameem and Read, 1996).
Working within these parameters, drawing from literature on
language testing and using my knowledge as a member of the
Wellington Indo-Fijian community, I paid close attention to the
boundaries of Indo-Fijian culture to make the tasks culturally
appropriate. The adherence of the procedure to cultural norms also
worked to enhance the validity of the test.
One of the difficulties in developing and using an interview for
testing performance is the need to rate both speaking and listening
skills. The rating of listening performance is especially difficult. In
designing a performance test, therefore, the tasks had to be clearly
structured within each level (1–6) of the performance scale so that
the interviewer and the raters were able to gauge both listening and
speaking performance easily and, using a check-list, rate accordingly.
As much as possible, the tasks at each level of the performance scale
corresponded to the tasks at each level of the self-report scale so that a
judgement on the validity of the self-report scale could be made later.


2 Interview: procedure

At the beginning of the interview, respondents participated in a
conversation on their personal lives and on familiar everyday topics
like their family and school – a conversation to which they could be
expected to contribute easily. The tasks ranged from encouraging the
display of simple FH communicative ability when discussing certain
familiar topics, to the display of a more demanding ability when
discussing and sustaining conversation on more unusual topics like
FH maintenance issues in NZ, analysing a Hindi movie, discussing
their future course of study, etc. During the interview, therefore, the
tasks followed a clear sequence from easy to difficult, from familiar
to unfamiliar, from simple to complex. The initial emphasis of the
performance test was on the elicitation of simple words, phrases and
sentences relevant to meeting a person and starting a conversation in
FH with them. This led to greater demands being made on the
respondent’s FH performance ability by eliciting and encouraging
longer stretches of speech.
In addition to naturalistic conversation, the more structured set
tasks of reading in English, summarizing and retelling in FH the story
Arrival (Shameem, 1992), and describing the omelette-making
process (adapted from Cooper, 1979), allowed greater standardization
of test content. These tasks were used only if the respondents’
performance level was felt to be at a level higher than 2, the level at
which they were able to respond in FH to questions about themselves
and their families with little hesitation but in simple phrases and
sentences and perhaps with a discernible NZ accent.
Using the diagram of the omelette-making process, the respondents
described each stage, thereby demonstrating their ability to describe
equipment normally used in the Indo-Fijian kitchen, to give
instructions, to follow a process, to use appropriate sequence words
and to be culture-specific (Indo-Fijians would normally describe
an omelette as a fried egg).
The read and retell task of the short story Arrival (Shameem, 1992)
was a cognitively demanding task since it required the respondent to
read the story in English and then to summarize, translate and narrate
it in FH to the interviewer. In the story, a young Indo-Fijian
immigrant arrives in NZ for the first time. The story tells of her
apprehension and excitement. Arrival dealt with a familiar culture and
experience, had relevant content, was suitable for young Indo-Fijians,
was short and – being written within the first 2000-word level
(Nation, 1984) – was easily comprehensible. The use of this task
therefore was felt to be justified and useful in determining the level
of performance achieved by the respondents.


3 Interview: rating
The conversation topics, together with the functional level that each
one represented, were listed on a check-list which was used by the
interviewer and during rating. The check-list listed both the listening
and speaking topics, and provision was made for the raters to place
a tick alongside each task as it was accomplished by the test-taker.
In addition, although the emphasis was on holistic rating – which
included judging the appropriacy of linguistic and non-linguistic
features – respondents were assigned a performance level
corresponding broadly to the test descriptors (levels 1–6) on the
discrete performance factors: accent, grammar, vocabulary, fluency
and general comprehension. Respondents were assigned a score in
each discrete category, as having no proficiency (level 0), minimal
proficiency (levels 1–2), minimum social proficiency (level 3), basic
social proficiency (level 4), social proficiency (level 5) or native
social proficiency (level 6). The discrete performance scores,
however, were used only to help reach the holistic ratings.
The rating scale was compensatory, which meant the performance
could first be judged on a broad band as ‘good’, ‘fair’ or ‘weak’. Once
the performance was assessed broadly, it became easier to assign a
specific level within the band. Since the raters had never assessed FH
performance previously, it was hoped that this procedure would be
an easy one to follow and use. The respondents were not expected
to be clearly performing at a particular level. Hence, unlike the
self-report scale, which had discrete levels, the performance test used
a + sign to indicate that the respondent was functioning well
within the level indicated and was able to perform some of the
functions in the next level up. The rating scale for speaking, therefore,
described the possible levels of communicative performance which
could be attained by young FH speakers in Wellington, but made
allowances for those speakers it was difficult to place definitively
within a prescribed level. On the basis of their performances in the
interviews, a judgement was reached on the respondents’ functional
FH ability.

4 Listening comprehension test: format and design


It was not satisfactory to assess the listening ability demonstrated
purely during the interview, because of the expected differences in
performance levels between listening and speaking and the difficulty
of assessing listening solely through a face-to-face interview. A
specific listening comprehension test (LCT) therefore provided
further evidence of comprehension level. The LCT elicited responses


to two extracts from Adhura Sapna, an FH play set during the late
1970s in Sigatoka Valley, Fiji (Pillai, 1990). The questions tested
for comprehension at several levels. More difficult questions required
respondents to make inferences from what the characters said, while
easier ones elicited the recall of clearly stated information.
The first extract of the LCT was a dialogue between Minla and her
husband Sambhu, an Indo-Fijian farmer who is much older than she.
Extract two introduced Mausi, an aged neighbour. The extracts were
ideal for listening comprehension purposes, as they used authentic
language and were firmly based in Indo-Fijian culture and
background. They also demanded an understanding of several FH
varieties – urban, rural and that spoken by an older generation of
Indo-Fijians – and of FH idioms, humour, nuances and cultural
references.

5 Listening comprehension test: procedure


The respondents listened twice to a tape recording of the two play
extracts. Each extract was divided into further smaller sections.
Extract one had four sections and extract two had two. The test-takers
listened to the whole of extract one first to give them a chance to
understand the context of the dialogue before being given the
questions to section one. They then listened to the part of the
recording relevant to answering the section-one questions. This
procedure was followed for both extracts (six sections). All questions
were administered orally and the answers were recorded as basic
(recall and simple deduction) or good (extended answer and personal
response) by the interviewer, who also noted the language in which
the response was made. Noting the language of response made it
possible to identify those respondents who might understand FH even
if they were unable to respond appropriately in it.

6 Listening comprehension test: rating


Since the rating of receptive skills is open to subjective interpretation,
20 LCT questions gave more objective evidence of aural skills. To
determine aural performance levels, three types of questions were
asked. These were classified according to the nature of answers that
could be given:
Trivial   Numerical (one-digit) answers, superficial details and
          names. Answers required recall of clearly stated
          information in the passage and were either right or wrong.
Local     More detailed answers, single words with context support,
          paraphrasing required, and recognition and recall of
          broader factual information.
Global    Synthesis of information, drawing conclusions, focus on
          cause-and-effect relationships and inferences.
(Adapted from Shohamy and Inbar, 1991)
In addition to the 20 numbered questions in the LCT which could
be answered by recall or simple deduction, a range of more extended
or alternative responses were possible for each question at different
levels. Respondents were in fact encouraged to volunteer as much
information on each question as possible. With every possible answer
to each question covered, a maximum of 36 answers to the 20
questions could be expected. A maximum score of 36 therefore
indicated that a respondent had demonstrated a certain breadth of
linguistic and cultural knowledge.

7 Vocabulary test: format and design


Knowledge of single lexical items, for which six pilot subjects had
consistently used English during the interview, was assessed through
a vocabulary test. This test determined whether the respondent knew
the FH term for the item even if they had not used it in context. Thus,
although the words were contextualized during the interview, the
vocabulary test determined the ability of the respondents to
understand and translate these specific words from one language to
the other.
The 20 vocabulary items were taken either from the short story
Arrival or from the process diagram. Those from the short story
related mainly to the experience of travelling and the first impressions
of immigration. The items from the process diagram included the
names of things found in a kitchen, i.e., utensils and food, and were
therefore relevant to the use of FH in the Indo-Fijian home. The
respondents were not all expected to have the same degree of
knowledge of these items and some of them might have already used
the correct FH term to identify them during the course of the
interview.

8 Vocabulary test: procedure


To determine their knowledge of key lexical items in the test,
respondents were asked to translate 10 FH words to English
(receptive knowledge) and then to translate 10 English words to FH
(productive knowledge). Answers were recorded on a score sheet
designed for this purpose.


9 Vocabulary test: rating


The vocabulary test was scored out of 20, and each section was
marked out of 10. The number of attempts with each item was also
noted, in order to give some indication of spontaneity of response,
particularly among those respondents at higher aural and oral levels.
Although the listening comprehension and the vocabulary tests
were found to be useful in gauging the overall performance level of
the respondents, both raters found that the interview itself was a far
stronger indicator of proficiency than either of the other two measures,
given the informal nature and use of FH.

VIII Language performance results


The raters of the test were Wellington Indo-Fijian residents who had
immigrated to NZ following the 1987 Fiji coups. Since the
proficiency at the top of the performance scale was defined as that
of a 20–40-year-old migrant FH speaker in NZ, both raters were able
to use their own proficiency as a yardstick for their ratings of
performance.
Each interview was rated three times, twice by the interviewer
(Rater A) and once by an independent rater (Rater B). Each interview
was rated simultaneously by the interviewer towards the end of, or
immediately following, the session and then again after all the
interviews had been completed by listening to the recordings. The
independent rater (Rater B) assigned levels on the basis of the
recordings.
The frequencies of the three sets of ratings and the means of each
rater’s scores showed some consistency between the simultaneous and
the two delayed ratings (see Table 2). Of the three ratings,
respondents had the highest scores in the simultaneous ratings. In fact
the means indicate that the delayed ratings of Rater B (the
independent rater) generally fell between the simultaneous and
delayed ratings of Rater A. The simultaneous rating may have
reflected the influence of several factors: cultural and social variables,
the use of non-verbal strategies, register, style or rhetorical skill, and
comprehensibility over accuracy in the face-to-face interaction. The
rating was also undoubtedly affected by the respondents’ fluency and
their communicative ability. This was particularly so during bilingual
conversations on wider topics such as language maintenance in NZ.
During the delayed ratings the raters could pay greater attention to
the specific amount of FH that the respondent was using. There was
more time for this, and reruns of the tape were also possible.
Table 2 Inter-rater and intra-rater measures of language performance

                        Aural (max. = 6)                Oral (max. = 6)
       Simultaneous A  Delayed A  Delayed B  Simultaneous A  Delayed A  Delayed B
Mean   5.11            4.83       4.94       4.43            3.97       4.00
SD     0.96            0.98       0.83       1.40            1.56       1.28

Notes:
rating A = interviewer
rating B = independent rater
n = 35

Correlation analyses among all three sets of ratings were run using
an 11-point scale to accommodate those test-takers who were rated
as falling between specified levels. The necessity for the 11-point
scale illustrates the difficulty of rating language performance,
especially when respondents do not quite meet the requirements of a
higher level yet are clearly fluent at a level below. It may also
illustrate a problem with the scale definitions of the test, meaning that
further work is needed on the descriptors used to define each level.
However, it is difficult to divide any continuum into discrete levels.
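Treating a + as half a level gives the 11-point numeric reading of the 1–6 scale used in these analyses; a minimal sketch (the helper name and string format are assumptions, not the study's notation):

```python
def to_numeric(rating: str) -> float:
    """Map a rating such as '4' or '4+' onto the 11-point scale
    (1, 1.5, 2, ..., 6). A '+' marks a respondent functioning well
    within a level and managing some functions of the next level up,
    so it is scored half a level higher."""
    if rating.endswith("+"):
        return int(rating[:-1]) + 0.5
    return float(rating)
```

Thus a rating of 5+ falls between the discrete levels, at 5.5.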
Despite the differences in scores between the simultaneous and
delayed ratings, the figures in Table 3 show the high positive
correlations between them, attesting to the reliability of the ratings.
As stated earlier, the simultaneous ratings were higher than either of
the two delayed ratings. Because of the greater consistency of scores
between the two delayed ratings and the high correlation between
them, and to ensure consistency in the results, the means of the
delayed ratings of both raters were used for all analyses.
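The inter-rater agreement reported here rests on Spearman rank correlations between pairs of rating sets. A self-contained pure-Python sketch, run on invented rater values (illustrative only, not the study's data):

```python
def average_ranks(xs):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical delayed ratings by two raters on the 11-point reading
# of the 0-6 scale (illustrative values only).
rater_a = [5.0, 4.5, 3.0, 5.5, 4.0, 2.0, 5.0]
rater_b = [5.0, 4.0, 3.5, 5.5, 4.5, 2.0, 4.5]
rho = spearman(rater_a, rater_b)  # close to 1 when raters largely agree
```

Averaging tied ranks, as here, is the standard treatment when several test-takers receive the same level.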

IX Self-report and language performance results


In accordance with the results from the self-report study (see Table
4), the majority of respondents rated very highly on their FH
performances. At the higher levels of the aural scale, seven of the 35
respondents were classified as native-like FH users, interpreted at
level 6 as ‘able to understand all FH speech at normal to fast
utterance rates on a broad range of appropriate topics, including FH
maintenance in NZ’. Nineteen respondents (the majority) were at
level 5 or 5+, thus having social proficiency but being unable to cope
with the full range of native-speaker conversational FH.
Table 3 Correlation analysis* of simultaneous and delayed ratings of
language performance

                               Rater A (simultaneous)  Rater B (delayed)
Aural: Rater A (delayed)       0.920                   0.909
Oral:  Rater A (delayed)       0.941                   0.942
Aural: Rater A (simultaneous)  –                       0.899
Oral:  Rater A (simultaneous)  –                       0.947

Notes:
n = 35
*Spearman correlation coefficient

Table 4 Fiji Hindi self-report proficiency and performance levels

Level    Self-report  Delayed performance  Simultaneous
                      (averaged)           performance

Aural (max. = 5)
3        1            2                    3
4        5            7                    5
5        29           26*                  27*
Means    4.80         4.69                 4.69
SD       0.47         0.58                 0.63

Oral (max. = 6)
0        –            –                    –
1        1            1                    1
2        4            6                    4
3        4            4                    4
4        1            5                    3
5        1            16                   16
6        24           3                    7
Means    4.97         4.09                 4.43
SD       1.65         1.38                 1.40

Notes:
n = 35
*Aural performance results at level 5:
Delayed: 19 respondents were rated at level 5; 7 respondents were
rated at level 6
Simultaneous: 12 respondents were rated at level 5; 15 respondents
were rated at level 6

On the oral scale, by comparison, only three respondents were placed
at level 6 and 16 at level 5. Therefore, as expected, aural
performance was more advanced than oral, with even the respondents
with the lowest productive capacity at level 1 judged to have higher
than level 2 receptive skills (see Table 4). The fact that only three
respondents were at level 6 of the oral scale suggests that the top of
the scale – defined in terms of the proficiency of a young adult (age
20–40) Indo-Fijian immigrant – may have been either too high or
inappropriate for describing the language proficiency of younger
Indo-Fijians (age 15–21) currently living in Wellington. It may also,
of course, indicate that assessing actual performance was a better
indication of language proficiency than self-report data. The majority
of respondents in this community were shown to be either at oral
level 5 (social proficiency) or between levels 5 and 6 (native social
proficiency) in their use of FH.
Self-report and performance results are compared in Tables 4 and
5. Table 4 gives a descriptive profile of the number of respondents
at each level of the two scales. A comparison of means shows the
trends in the data. The statistical comparison between the self-report
and performance results in Table 5 shows the strength of the
relationship between them and demonstrates individual differences in
reported proficiency and actual performance. Because the self-report
scales had discrete categories, the + sign was removed from all
performance ratings during data comparisons between the two
measures.

Table 5 Self-report and performance ability: validity and reliability

                                Fiji Hindi performance
                 Delayed rating                  Simultaneous rating
Self-report      Aural          Oral             Aural          Oral
Correlations(1)  0.635          0.670            0.703          0.692
Significance(2)  sgn rank = 7   sgn rank = 173*  sgn rank = 9   sgn rank = 89*
                 p = 0.2891     p = 0.0001       p = 0.2188     p = 0.0020

Notes:
*These values were significant at p < 0.05
(1) Correlations: Spearman correlation coefficient
(2) Significance: Wilcoxon matched-pairs signed-ranks test
To enable a meaningful comparison of data, the descriptors at each
level of the performance scale matched the tasks on the self-report
scale. In addition, level 5 was designated to be the top level of both
the report and performance aural scales, since the top of the self-
report scale was already established at level 5. Subsequently the
scores of 5 and 6 on the language performance results were included
in a single level 5 to enable comparison with the self-report data.
This collapse of two levels of scores into a single category may have
affected the reliability estimate for the aural scales.
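Collapsing performance levels 5 and 6 into a single top category, so the 0–6 aural performance scale lines up with the 0–5 self-report scale, amounts to a one-line clamp (the helper name is an assumption for illustration):

```python
def collapse_aural(performance_level: int) -> int:
    """Fold aural performance levels 5 and 6 into a single level 5 so
    the 0-6 performance scale matches the 0-5 self-report scale."""
    return min(performance_level, 5)
```

Levels 0–4 pass through unchanged; only the two top levels merge, which is why the aural reliability estimate may be inflated.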
On the other hand, the oral scale of the performance test
corresponded directly with the self-report categories so that the
comparison of spoken FH reports and performance was made with a
greater degree of confidence.
The respondents' reported proficiency was marginally higher than
their actual performance level, a difference which was more apparent
with the oral result than with the aural one.
Despite the differences in means, the performance ratings showed a
strong correlation with the self-report, although (as Table 5 shows)
this was slightly more so for the simultaneous rating than for the
delayed one. These correlation figures provide evidence for the
validity of the self-report scale since the respondents who were
reporting their proficiency at relatively high levels were in fact also
likely to be rated highly on the performance scale.


The self-report scale was validated by the matched-pairs test of
statistical significance in which the two sets of data, self-report and
performance, were matched for each individual and a test run to
determine the statistical significance of the difference between these
two results (Table 5). The delayed ratings of the two raters almost
consistently placed aural performance at a level equal to that reported
by the respondent. The aural performance ratings at level 5 had the
most consistency with the self-reports at the same level. It must be
remembered, however, that the aural self-report scale was merely a
0–5 point scale, unlike the 0–6 point performance one, and that it
was inclusive of levels 5, 5+ and 6 of the aural performance ratings.
On the other hand, the delayed rating of the oral performances was
consistently lower than reported ability. While the differences in the
aural reports and performance levels were not statistically significant,
the differences between the oral ones were highly so. This could mean
that the respondents had either significantly overestimated their FH
proficiency at level 6 of the oral self-report scale or were unable to
differentiate between the demands at the higher two levels of this
scale. It must be noted that few respondents would have read a story
in English and retold it in FH in real life (Level 5) or been placed
in a situation where they had to discuss and debate more formal topics
in FH (Level 6). The inclusion of these descriptors might have
influenced the high number of level 6 self-assessments. Over two-
thirds of the respondents reported that they were able to perform at
level 6, but only three respondents were placed at level 6 by the
delayed rating on the performance measure. The majority of
respondents who had reported their oral proficiency as level 6 were
actually judged to be performing at level 5 or 5+.
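The Wilcoxon matched-pairs signed-ranks procedure behind Table 5 can be sketched in pure Python. The function below computes the sum of ranks of positive differences; the respondent values are invented for illustration, not the study's data:

```python
def positive_rank_sum(reported, performed):
    """Wilcoxon matched-pairs signed-ranks: drop zero differences,
    rank the absolute differences (ties share the average rank), and
    sum the ranks of pairs where self-report exceeds performance."""
    diffs = [r - p for r, p in zip(reported, performed) if r != p]
    absd = sorted(abs(d) for d in diffs)
    # 1-based average rank of each distinct absolute difference
    rank_of = {v: (absd.index(v) + 1 + len(absd) - absd[::-1].index(v)) / 2
               for v in set(absd)}
    return sum(rank_of[abs(d)] for d in diffs if d > 0)

# Hypothetical oral self-reports vs. delayed performance ratings for
# eight respondents (invented values, not the study's data).
reported  = [6, 6, 5, 6, 4, 3, 6, 5]
performed = [5, 5, 5, 4, 4, 3, 5, 5]
w_plus = positive_rank_sum(reported, performed)
```

With every non-zero difference favouring the self-report, the positive rank sum reaches its maximum (10 for four non-zero pairs); it is this lopsided pattern of over-reporting that produces a significant oral result.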
Since a majority of the respondents placed themselves at higher
FH proficiency levels, a greater differentiation at level 5 of the aural
self-report scale would have been extremely useful for these analyses.
Indeed, a similar trend to the oral results may have been observed.

X Summary: language performance


Thirty-five respondents participated in this survey, which investigated
whether the younger members of the Wellington Indo-Fijian
community were maintaining FH, their first language.
The survey methodology included the development and
implementation of a self-report questionnaire and a communicative
FH test. A majority of language maintenance studies in migrant
communities rely on self-report data. The self-report scale in this
study was validated by the correlation analyses with performance test
results. The results showed a stronger correlation between the aural
scales than between the oral ones; however, this may have been
because there were five categories used in the aural scale as against
the six in the oral one. On the oral scale the clearest differences
between self-report and performance results were at the higher levels,
where the majority of respondents who reported being at top oral
proficiency level 6 were judged to be performing at the lower levels
of 5 or 5+. These results indicate that studies of language maintenance
or loss using self-report scales as a measure of proficiency do give a
fairly accurate picture of language performance, although this may
be more so for aural than for oral performance.
In addition to validating the self-report scale, the study also
confirmed the reliability of the language performance scale when the
simultaneous ratings were compared to the two delayed ratings. The
simultaneous ratings were generally – although not significantly –
higher than the delayed ones, and this was attributed to the effect
of non-verbal communication and other cultural and sociolinguistic
factors. Most respondents were rated level 5 or 5+ on a scale on
which level 6 was defined as the native proficiency of an adult Indo-
Fijian immigrant (age 20–40) living in Wellington. This data provides
some evidence of language loss having taken place in this group of
Wellington Indo-Fijians within the first generation, although it is
difficult to determine whether the process was initiated in Fiji or NZ.

XI References
Bachman, L.F. 1990: Fundamental considerations in language testing.
Oxford: Oxford University Press.
Barz, R.K. and Siegel, J., editors, 1988: Language transplanted: the
development of overseas Hindi. Wiesbaden: Otto Harrassowitz.
Brindley, G. 1991: Defining language ability: the criteria for criteria. In
Anivan, S., editor, Current developments in language testing,
Anthology series 25, Singapore: SEAMEO Regional Language Center.
Clark, J.L.D. and Lett, J. 1988: A research agenda. In Lowe Jr, P. and
Stansfield, C.W., editors, Second language proficiency assessment,
New Jersey: Prentice Hall Regents.
Cooper, J. 1979: Think and link: an advanced course in reading and writing
skills. London: Edward Arnold.
Dorian, N.C., editor, 1989: Investigating obsolescence: studies in language
contraction and death. New York: Cambridge University Press.
Geraghty, P. 1984: Language policy in Fiji and Rotuma. In Duivosavosa:
Fiji’s Languages: their Use and their Future, Suva: Bulletin of the
Fiji Museum.
Gillion, K.L. 1962: Fiji’s Indian migrants: a history to the end of indenture
in 1920. Melbourne: Oxford University Press.
Hakuta, K. and D’Andrea, D. 1992: Some properties of bilingual
maintenance and loss in Mexican background high-school students.
Applied Linguistics 13 (1), 72–99.
Holmes, J., Roberts, M., Verivaki, M. and ′Aipolo, ′A. 1993: Language
maintenance and shift in three New Zealand speech communities.
Applied Linguistics 14 (1), 1–24.
Kalantzis, M., Cope, B. and Slade, D. 1989: Minority languages and
dominant culture. London: Falmer Press.
Kloss, H. 1968: Notes concerning a language nation typology. In Fishman,
J.A., editor, Readings in the sociology of language, The Hague:
Mouton.
Kroef, A.P.M. 1977: The use of language in a three generational group of
Dutch immigrants in New Zealand. MA thesis, Auckland: University
of Auckland.
Martin-Jones, M. 1991: Sociolinguistic surveys as a source of evidence in
the study of bilingualism: a critical assessment of survey work
conducted among linguistic minorities in three British cities.
International Journal of the Sociology of Language 90, 37–55.
Nation, I.S.P. 1984: Vocabulary lists: words, affixes and stems, occasional
publication no. 12, Wellington: Victoria University of Wellington.
Nicholas, J. 1988: British language diversity surveys (1977–87): a critical
examination. Language and Education 2, 15–33.
Pillai, R.C. 1990: Adhura Sapna. Unpublished Fiji Hindi play.
Poplack, S. 1980: Sometimes I’ll start a sentence in English Y TERMINO
EN ESPANOL: toward a typology of code-switching. Linguistics 18,
581–618.
Porter, D. 1991: Affective factors in language testing. In Alderson, J.C. and
North, B., editors, Language testing in the 1990s: the communicative
legacy, London: Macmillan.
Shameem, N. 1992: Arrival and other stories. Wellington: Multicultural
Educational and Resource Center.
—— 1994: The Wellington Indo-Fijians: language shift among teenage new
immigrants. Journal of Multilingual and Multicultural Development
15 (5), 399–418.
—— 1995: Hamai log ke boli: Language shift in the Wellington Indo-Fijian
community. PhD thesis, Wellington: Victoria University.
Shameem, N. and Read, J. 1996: Administering a performance test in Fiji
Hindi. Australian Review of Applied Linguistics Series S, 13, 80–104.
Shohamy, E. and Inbar, O. 1991: Validation of listening comprehension
tests: the effect of text and question type. Language Testing 8 (1),
23–40.
Shohamy, E. and Reves, T. 1985: Authentic language tests: where from
and where to? Language Testing 2 (1), 48–59.
Siegel, J. 1973: A survey of language use in the Indian speech community
in Fiji. Unpublished field study project for the Culture Learning
Institute and practicum project for ESL 730. Honolulu.
—— 1987: Language contact in a plantation environment: a sociolinguistic
history of Fiji. Cambridge: Cambridge University Press.
Tent, J. 1995: personal communication.

Weir, C.J. 1988: Communicative language testing. New Jersey: Prentice
Hall.
White, R.V. 1971: Language use in a South Pacific urban community.
Anthropological Linguistics 13, 37–42.
Wilds, C.P. 1975: The oral interview test. In Jones, R.L. and Spolsky, B.,
editors, Testing language proficiency, Arlington, VA: Center for
Applied Linguistics.
