
The role of topical knowledge in construct definitions

In Chapter 4 we argued that the topical knowledge of language users is always involved in

language use. It follows that if language test tasks are authentic and interactional, and elicit

instances of language use, test takers' topical knowledge will always be a factor in their test

performance. Historically, language testers have viewed topical knowledge almost

exclusively as a potential source of test bias, or invalidity, and the traditional practice in

developing language tests is to design test tasks that will minimize, or control, the effect of

test takers' topical knowledge on test performance. We take a slightly different view and

would argue that this is not appropriate for all situations. Although topical knowledge may, in

many situations, be a potential source of bias, there are other situations in which it may, in

fact, be part of the construct the test developer wants to measure. One question that needs to

be asked, then, is when does the test developer consider topical knowledge a potential

source of bias, and when does he define it as part of the construct? This question is addressed in Section 1 of this chapter, under 'making inferences'. Below we discuss the situations in which the test developer is most likely to define topical knowledge as part of the construct, and what the possible consequences are of defining it in a particular way.

We believe that there are essentially three options for defining the construct to be measured,

with respect to topical knowledge:

1. define the construct solely in terms of language ability, excluding topical knowledge

from the construct definition,

2. include both topical knowledge and language ability in the construct definition, or

3. define topical knowledge and language ability as separate constructs.

Here we will discuss these three options for defining the construct, along with typical

situations, intended inferences, potential problems, and possible solutions for each. We will

list the considerations associated with each option in more or less outline form, and then

provide an example for each option.

Option 1: Topical knowledge not included in the construct definition

Typical situations

Where test takers are expected to have relatively widely varied topical knowledge:

- language programs, where inferences about language ability are to be used to make

decisions about individuals (e.g. selection, diagnosis, achievement)

- academic, professional, or vocational training programs, or for employment, where

inferences about language ability will be one of several factors considered in the

selection process

- research in which language ability is included as a construct.


It is important to remember that even if the test developer has decided not to include topical

knowledge in the construct definition, it may still be involved in test takers' performance.

For a particular testing situation, the test developer may simply choose to focus on language

ability and not attempt to make inferences about topical knowledge.

Intended inference

Components of language ability only

Potential problem 1

Possible bias due to specific topical information in the task input; test takers who happen to

have relevant topical knowledge may be favored.

Possible solutions
1. Include in the task input topical information that we expect none of the test takers to

know. This is a widely used solution in language tests. Potential problem: this

decontextualizes the test task, and thus probably biases the test against everyone to

some extent.

2. Include in the task input topical information that we expect will be familiar to all of

the test takers. Potential advantage: this contextualizes the task for everyone, and thus

should facilitate optimum performance. Potential problem, particularly with tasks

aimed at assessing comprehension: test takers may be able to answer the questions

largely on the basis of topical knowledge.

3. Present test takers with several tasks, each with different topical content. There are

two ways in which this solution might be implemented:

a. have all test takers complete all tasks;

b. give test takers a choice.

We favor providing test takers with choices, since this is one way in which we can

accommodate their specific interests. In addition, we feel that this gives them a greater

sense of involvement in the test taking process.

Potential problem 2

Possible ambiguous interpretation of low scores, and hence problem of questionable construct

validity. Specifically, if test takers get low scores, this could be due either to low language

ability or to low topical knowledge, or both.

Possible solutions

1. In test tasks that engage test takers in listening or reading activities, and that elicit

selected or limited production responses, clearly specify the component(s) of

language ability that each test task is designed to measure, and then design test tasks

with this in mind.

2. In test tasks that engage test takers in speaking or writing activities that elicit extended

production responses, use analytic rating scales for rating components of language

ability, or conduct qualitative analyses of the test takers' responses. The use of analytic rating scales enables us to focus on specific components of language ability.
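The way analytic rating scales support component-level inferences can be sketched in code. The component names, the 1-5 scale, and the averaging procedure below are invented for illustration and do not correspond to any actual rating instrument:

```python
# Hypothetical sketch: combining analytic ratings so that each component of
# language ability is reported separately, rather than collapsed into one
# holistic score.

def component_profile(ratings):
    """ratings: one dict per rater mapping component name to rating.
    Returns the mean rating for each component across raters."""
    components = ratings[0].keys()
    return {c: sum(r[c] for r in ratings) / len(ratings) for c in components}

# Two hypothetical raters scoring the same written response on a 1-5 scale.
raters = [
    {"grammatical accuracy": 4, "rhetorical organization": 3},
    {"grammatical accuracy": 5, "rhetorical organization": 3},
]
print(component_profile(raters))
# {'grammatical accuracy': 4.5, 'rhetorical organization': 3.0}
```

The point of the sketch is simply that the two components remain separate in the reported profile, so low performance on one cannot mask high performance on the other.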

Example 1

Suppose we wanted to measure the ability of international students applying to an American university to read academic material in English, for purposes of selecting them for admission. Bearing in mind the variety of disciplines in any given university, we might decide to develop a number of reading passages with different topical information and allow test takers to choose several to which to respond. We would need to design types of test tasks that focus on areas of language ability, and then write tasks for each of the different reading passages.


We would most likely be interested in the students' ability to control certain forms of written

discourse and not the extent of their knowledge of the topic about which they happened to be

reading. We would therefore not include topical knowledge in the construct definition, and

would rate answers on the basis of components of language ability. (See Project 5 in Section

Three, in which different test tasks are designed to measure different components of

language ability.)

Example 2

Suppose we wanted to develop an achievement test for students of the Thai language to

assess their ability to use the appropriate register with interlocutors of different relationships,

ages, and social status in face-to-face oral interactions. We would most likely be interested in

measuring their knowledge of the appropriate personal pronouns and politeness markers, and

not the extent of their knowledge of any particular information that might be the topic of the

interaction. We would therefore not include topical knowledge in the construct definition. We

might develop a set of role plays that dealt only with topics with which we expected the

students to be familiar. We might include role plays with a hypothetical superior, an office

associate, a close friend, a hypothetical subordinate, and a young child. Test takers' responses

to the role-play prompts and questions would be rated only in terms of the appropriateness of

the register markers they used.

Option 2: Topical knowledge included in the construct definition

Typical situation

Where test takers are expected to have relatively homogeneous topical knowledge:

- language for specific purposes programs, where the language is being learned in

conjunction with topical information related to specific academic disciplines,

professions, or vocations, and where inferences from test scores are to be used to

make decisions about individuals (for example, selection, diagnosis, achievement)

- selection for professional or vocational training programs, or for employment, where

scores from the language test will be the major factor considered in the selection

process. This involves using these inferences to make predictions about test takers'

capability to perform future tasks or jobs that require the use of language.

Intended inference

Ability to process (interpret or express) specific topical information through language.

Potential problem 1
The test developer or user may mistakenly fail to attribute performance on test tasks to

topical knowledge as well as to language ability. This problem of inference may be due to

lack of clarity in the specification of test tasks and in the scoring of responses.

Potential problem 2

Test scores cannot provide specific feedback to the test user or to the test takers for diagnostic purposes.


Possible solution

Clear specification of the construct definition for each test task, and of the criteria for scoring test takers' responses.

Example 1

Suppose we wanted to assess diplomats in the foreign service on their ability to read and

understand descriptions of political activities associated with elections in the foreign country

to which they are likely to be posted. We might design a set of tasks in which test takers are

presented with a number of passages representing the types of discourse normally found in

newspaper accounts. We might develop for each passage a set of multiple-choice items

intended to measure the test takers' comprehension.

Example 2

Suppose we wanted to measure the ability of an international teaching assistant to give a

short classroom lecture on the subject matter that she might be expected to teach, such as

educational evaluation. We might design a test task that simulates a lecture, and define the

construct as the ability to give a well-organized lecture on the topic of validation. (See the

discussion of rating scales in Chapter 11.)


In Section 1 of this chapter, we discussed some of the considerations involved in attempting

to make inferences and predictions about individuals' capability to perform tasks or jobs that

require the use of language, and pointed out the demands this places on the test developer,

who must clearly delineate the roles of individual characteristics, including topical

knowledge, in the construct definition. In Chapter 11 we point out some of the problems

involved in attempting to rate samples of spoken or written language with holistic rating

scales. Because of these problems, as well as those of potential lack of clarity in test task

specifications and lack of detailed feedback, we believe that the next option is preferable in

most situations in which the test user wants to make inferences about both topical knowledge

and language ability.

Option 3: Language ability and topical knowledge defined as separate constructs

Typical situation

Where the test developer may not know whether test takers have relatively homogeneous or

relatively widely varied topical knowledge, and wants to measure both language ability and

topical knowledge. (This is very similar to Option 2, with the differences that (1) the test

developer has no expectations about test takers' topical knowledge and (2) the test user wants

to be able to make inferences about both language ability and topical knowledge.):

- language for specific purposes programs, where the language is being learned in

conjunction with topical knowledge related to specific academic disciplines,

professions, or vocations

- selection for vocational training programs, or for employment, where scores from the

language test will be the major factor considered in the selection process. These

inferences will be used to make predictions about test takers' capability to perform

future tasks or jobs that require the use of language

- research in which language ability and topical knowledge are included as constructs

Intended inferences

Components of language ability and areas of topical knowledge

Potential problem

Less practical than Option 2, because it requires either two separate sets of test tasks or separate sets of rating scales, one focusing on language ability and one on topical knowledge.

Example 1

Suppose we wanted to make inferences about elementary school students' ability to express,

in writing, their knowledge of how to use a dictionary. We could present them with a writing

prompt that requests them to explain to another student how to find the meaning of a word in

the dictionary. We could use separate rating scales to score their language ability and topical

knowledge. Thus, we could rate the composition in terms of areas of language ability, such

as grammatical accuracy and knowledge/control of features of legible handwriting. In

addition, we could rate it in terms of accuracy of content. This would potentially enable us to

make separate inferences about students' control of areas of language ability in writing and of

topical knowledge.

Example 2

Suppose we wanted to develop a screening/placement test in English and mathematics for

immigrant children entering an elementary school. For part of this test, we may want to

measure their ability to use English to understand written instructions, and include in the

construct knowledge of the vocabulary of the number system, dates, times, and scheduling.
In addition, we may want to measure their knowledge of and ability to perform the basic

mathematical operations of addition, subtraction, multiplication, and division. Moreover, if

we did not know in advance whether the children had topical knowledge of these

mathematical operations, we could develop two sets of test tasks that involve reading, each

with separate scoring criteria.

The first set of tasks, focusing on language ability, could include written instructions

about when to report for different activities. Test takers could be required to give short written

answers, consisting essentially of numbers, to simple questions about times for activities.

Scoring criteria would be whether or not the test takers provided the factually correct times.

The second set of tasks, focusing on topical knowledge, could include solving simple

pencil-and-paper arithmetic problems involving addition, subtraction, multiplication, and

division. Scoring criteria would be based upon whether or not the children could calculate the

correct answers to the problems.
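The two-score procedure just described can be sketched as a small scoring routine. The item labels, answer keys, and child's responses below are all invented for illustration; only the principle of keeping the two scores separate reflects the text:

```python
# Hypothetical sketch: scoring two separate task sets so that language ability
# (reading written instructions about times) and topical knowledge (arithmetic)
# yield separate scores, as Option 3 requires.

def score_tasks(responses, key):
    """Count exact matches between a test taker's responses and an answer key."""
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)

# Invented answer keys for the two task sets.
reading_key = {"math class": "9:00", "lunch": "12:30", "recess": "10:15"}
arithmetic_key = {"3+4": "7", "9-5": "4", "6*2": "12"}

# One hypothetical child's responses across both task sets.
responses = {"math class": "9:00", "lunch": "12:30", "recess": "10:00",
             "3+4": "7", "9-5": "4", "6*2": "12"}

language_score = score_tasks(responses, reading_key)      # 2 of 3 correct
knowledge_score = score_tasks(responses, arithmetic_key)  # 3 of 3 correct
print(language_score, knowledge_score)
```

Because the two task sets are keyed and scored independently, a child who reads the schedule poorly but calculates well (as here) receives a score profile that supports separate inferences about each construct.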


In situations where the test developer knows very little about the test takers' areas or levels of topical knowledge, this is probably the most justifiable approach in terms of validity of

inferences. However, while this may be so, it makes the greatest demands on resources and

may not always be feasible.

Dealing with the problem of topical knowledge in language tests

How to deal with the potential effects of differing levels of topical knowledge on language test scores is a problem fundamental to all language tests. There are no easy solutions, and

there is certainly no universal solution for all testing situations. The particular solution that

the test developer arrives at will be a function of the various factors discussed above. What is

clear is that the test developer cannot simply assume, either explicitly or by default, that

topical knowledge need not be addressed simply because the focus of the test is on language

ability. It is equally clear that in all situations the language test developer needs to obtain as

much information as possible about potential test takers' areas and levels of topical

knowledge, and should consult with content specialists in determining the areas of topical

content to include in the test and the accuracy of the information that is included.

Furthermore, in situations where the test developer/user wants to make inferences about test

takers' language ability and areas of topical knowledge, it is crucial for the test development

team to include specialists in both language and the content area(s) to be assessed.

Role of language skills in construct definitions

In Chapter 4 of this book we have taken the position that the familiar language skills (listening, reading, speaking, and writing) should not be included in the construct definition. This is because, if we distinguish among the language skills only in terms of mode (productive or receptive) and channel (audio or visual), we end up with skill definitions that miss many of the other important distinctions between language used in particular tasks. For example, suppose we have two tasks. In one, test takers prepare a speech in which they compare and contrast the

positions taken by two authors in two different stories. In the second task, test takers write a

two-line note to the mail carrier. Nominally, the first task is a speaking task, while the second

is a writing task. Yet these two tasks differ in many other ways as well. In the first task, the

length of the response is long, the language of the input is long and complex, the language of

the expected response is highly organized rhetorically and in a formal register, and the scope

of relationship between input and response is broad. In the second task, the length of the

response is short, there is no input in language form, the language of the response is not

highly organized and is in informal register, and the scope of relationship between input and

response is narrow. Thus, focusing only on the difference between the so-called skills (speaking versus writing) misses much that goes into distinguishing between these two language use tasks. That is, it suggests that the names of the skills alone are sufficient to define the critical characteristics of language use tasks.

Another reason for not including a specific skill in the construct definition is that, as

Widdowson (1978) has pointed out, many language use tasks involve more than one skill.

What he calls the communicative ability of conversation involves listening and speaking,

while what he calls correspondence involves reading and writing. Therefore, rather than

defining a construct in terms of skills, we suggest that the construct definition include only

the relevant components of language ability, and that the skill elements be specified as

characteristics of the tasks in which language ability is demonstrated.

Exercises for Section 4

1. Read Project 1 in Part Three. How might the construct definition be changed if we

wanted to measure specific components of strategic competence? What additional

scoring procedures might we need to implement?

2. Obtain a copy of a published language test and test manual. Read whatever material is

available dealing with the construct definition. How is the construct defined? What

kind of construct definition is supplied: syllabus-based or theory-based? Does the

construct definition involve language ability only, or is topical knowledge included in

the construct definition? Does the construct definition include the metacognitive strategies? How adequate do you find the construct definition to be? How might you revise it?

3. Recall the last language test you developed or used. How did you define the construct

to be measured at the time? How might you define it now? If you did not define the

construct to be measured, how did that affect the usefulness of the test?

4. Has the discussion of defining the construct to be measured helped you distinguish

components of language ability from characteristics of test tasks used to measure

language ability? How? Do you find this distinction useful? Why or why not? What

would be the consequences of defining the construct to be measured in such a way as

to include characteristics of the TLU setting (such as ability to take an order in a


5. Are there any testing situations in which you feel it would be justified not to define

the construct to be measured? Why or why not? If we want to predict the ability of a

test taker to perform a specific TLU task and we know exactly what that task is, is it

still necessary to define the construct to be measured?

Suggested Reading for Section 4

See the suggested readings for Chapter 4.

Chapter summary

The process of test development begins with the preparation of a set of descriptions and

definitions. We begin by describing the test purpose, which in the case of a language test is

primarily to make inferences about language ability. Additional purposes may include using

these inferences to help us make decisions affecting test takers, teachers or supervisors, or

programs. In addition, tests may also be used for research purposes.

Next we describe the TLU situation and tasks so that we can design test tasks that

correspond in demonstrable ways to specific target language use situations and tasks,

enabling us to develop authentic test tasks. We start by identifying the TLU situation and

tasks. We then describe these in terms of their distinctive characteristics. If we are designing a

test for use in a language instructional setting, we need to look beyond the instructional tasks

to identify the TLU situation and tasks.

Next we describe the characteristics of the language users/test takers, which allows us

to develop test tasks that will be appropriate to the test takers. The characteristics of test
takers that we believe are particularly relevant to test development can be divided into four categories:


1. personal characteristics,

2. topical knowledge,

3. general level and profile of language ability, and

4. predictions about test takers' potential affective responses to the test.

Preparing this description will be relatively easy if we have in mind a specific TLU situation; otherwise, it will be more difficult.

Finally, we need to define the construct to be measured so that we can know how to interpret

test scores. During the design stage, we define the construct, language ability, abstractly, and

we can base this definition on either the content of a language syllabus or a theory of

language ability. Although strategic competence will be involved in test performance, the test

developer may or may not want to make specific inferences about this component of

language ability, and thus must decide whether or not to include it explicitly in the definition

of the construct. Test takers' topical knowledge will also be involved in test performance, and

thus the test developer must also decide on the nature of the inferences to be made, and define

the construct accordingly. There are essentially three options for this:
1. do not include topical knowledge in the construct definition, and make inferences

about language ability only,

2. include topical knowledge in the construct definition, and make inferences about the

ability to interpret or express specific topical information through language, and

3. define language ability and topical knowledge as separate constructs, and make

separate inferences about these two constructs.

Typical situations in which these three approaches might be appropriate are presented, along

with potential problems and solutions.

To conclude, we reiterate and reinforce our contention, presented in Chapter 4, that

language skills are not part of language ability, but consist of specific activities or tasks in

which language is used purposefully. We would therefore include only the relevant

components of language ability in the construct definition, and specify the skill elements as

characteristics of the tasks in which language ability is demonstrated.

Notes


1. Note to the teacher: This chapter deals with a number of important, related issues in

test design, which are relatively complex. For this reason, it is relatively long. One

way of teaching the material might be to assign each section to be read and worked

with individually. To this end, independent exercises and suggested readings are

provided after each section.

2. The test developer and test user may perform different functions. However, they may

be one and the same individual, as is typically the case for a classroom test developed

by a teacher. Or they might be different individuals, as with a test developed by

testing specialists for use by admissions officers in colleges and universities. Or the

test user might be one member of a team of test developers, as with a test developed

by a team of teachers with a testing specialist coordinating the effort.

3. McNamara (1996) states that these two types of inferences are based on different

hypotheses. The first type of inference, about individuals' ability to perform TLU

tasks successfully, is based on what McNamara calls the strong performance

hypothesis, while he calls the second type, about the individual's ability to use

language, the weak performance hypothesis.

4. For stylistic reasons, from now on we will use the acronym TLU only where it is

essential for distinguishing TLU domains and tasks from test tasks. Thus, where the

meaning is clear from the context, we will refer simply to domain and task.

5. We would note here that the broader the TLU domain for which we are designing a

test, the more difficult it may be to design a useful test. This is because the problem of

sampling from the domain of TLU tasks becomes more difficult as the domain of

TLU tasks becomes larger and hence more diverse. Thus, with a very large, diverse

domain, both authenticity and the validity of inferences about test takers' ability to

perform in the TLU domain may be compromised by practical restrictions on the

number of test tasks that can be included in the test.

6. In keeping with our stylistic decision, rather than using the complete terms real-life

TLU domain, language instructional TLU domain, we will refer to these simply as

real-life domain and language instructional domain.

We recognize that our view, that specific components of language ability need to be delineated and clearly defined in the construct definition, is not universally held among

language testers. Indeed, a fairly large number of language testers view language

ability as a holistic, unitary ability, and thus do not attempt to delineate different

components in the way they assess this. The most prominent example of this is the

view which has informed the ACTFL Oral Proficiency Interview, and its predecessor,

the FSI oral proficiency interview (Lowe 1988). Suffice it to say that the authors do

not agree with this view, for reasons that have been widely discussed in the research

literature, and refer those readers who are interested in this debate to the relevant

references at the end of this section.

8. At present the role of strategic competence in language use, and hence in language

test performance, is a relatively new area of research, and we cannot be more specific

about it. However, we believe that this is a promising approach in researching the

ways that strategies work.

9. We would point out that allowing test takers the choice of taking tasks with different

topical content introduces another potential source of inconsistency across test tasks,

and that the test developer must take appropriate steps to determine the extent to

which these tasks are equivalent and of comparable levels of difficulty.