
VIETNAM NATIONAL UNIVERSITY, HANOI

COLLEGE OF FOREIGN LANGUAGES


DEPARTMENT OF POST GRADUATE STUDIES
-----------------------

HOÀNG VĂN SÁU

A STUDY ON THE VALIDITY OF END-TERM ACHIEVEMENT TESTS


ON ENGLISH GRADE 12, HIGH SCHOOLS IN NORTHERN VIETNAM

NGHIÊN CỨU TÍNH HIỆU LỰC CỦA CÁC BÀI KIỂM TRA CUỐI KỲ
MÔN TIẾNG ANH LỚP 12 TẠI MỘT SỐ TRƯỜNG THPT Ở MIỀN BẮC
VIỆT NAM

M.A THESIS

FIELD: METHODOLOGY
CODE: 60 14 10

SUPERVISOR: DR. HA CAM TAM

HA NOI - 2009

TABLE OF CONTENTS

Page
CANDIDATE’S STATEMENT i
ACKNOWLEDGEMENTS ii
ABSTRACT iii
LIST OF TABLES iv
Chapter 1: INTRODUCTION 1
1.1. Rationale of the study 1
1.2. Scope of the study 2
1.3. Aims of the study 2
1.4. Research questions 2
1.5. Methods of the study 3
1.6. Organization of the study 3
Chapter 2: LITERATURE REVIEW 5
2.1. The relationships of language testing with teaching and learning 5
2.2. Objective testing 6
2.3. Achievement tests 7
2.3.1. Definitions 8
2.3.2. Final achievement tests 8
2.4. Test specification 9
2.5. Testing language components 10
2.5.1. Tests of grammar and usage 10
2.5.2. Test of vocabulary 10
2.5.3. Test of phonology 11
2.6. Validity of a test 11
2.6.1. Definitions and types of validity 11
2.6.2. Content validity of a test 12
2.6.3. Construct validity of a test 14
2.7. Objectives and Syllabus contents of English grade 12 15
2.7.1. Objectives of English grade 12 15
2.7.2. Syllabus contents of English grade 12 16
2.8. Recommended test specification of final achievement tests, English grade 12 21
2.9. Components’ contents of end-term achievement tests, English grade 12 22
2.9.1. Components’ contents of the 1st term achievement tests 22
2.9.2. Components’ contents of the 2nd term achievement tests 23
Chapter 3: THE STUDY 24
3.1. Research design 24
3.1.1. Research questions 24
3.1.2. Informants 24
3.1.3. Data description 25
3.2. Analytical framework 25
3.3. Findings and discussion 26
3.3.1. Content validity of test samples’ components 26
3.3.1.1. Content validity of phonetic items 26
3.3.1.2. Content validity of grammar test items 27
3.3.1.3. Content validity of vocabulary items 28
3.3.2. Construct validity of the test samples 28
3.3.2.1. Construct validity of phonetic test items 29
3.3.2.2. Construct validity of grammar test items 30
3.3.2.3. Construct validity of vocabulary test items 32
Chapter 4: CONCLUSION 34
4.1. Conclusion 34
4.2. Implications 34
4.3. Limitations and suggestions for further research 35
REFERENCES 37
APPENDIX 38

LIST OF TABLES

Table 1: Syllabus contents of English grade 12 17

Table 2: The recommended specification of the end-term achievement tests 21

Table 3: Components’ contents of 1st term achievement tests 22

Table 4: Components’ contents of 2nd term achievement tests 23

Table 5: Content validity of test samples’ components 26

Table 6: Construct validity of the test samples 28



CHAPTER 1: INTRODUCTION

1.1. Rationale of the study

In recent decades, English language testing and evaluation has attracted great interest
from educators and researchers worldwide. In Vietnam, because of its important role in
education, English testing and evaluation has been studied in universities and educational
institutions through research projects, Master of Arts theses and doctoral theses in
methodology, most of which aim to evaluate reliability and validity, the most essential
characteristics of a test.
The rising interest in English testing can be explained by its importance to English
teaching and learning. For teachers, testing and evaluation helps them check the
effectiveness of their teaching, on the basis of which they can reconsider the contents and
techniques they use. For students, testing makes it possible to adjust their own learning
process in order to achieve better results.
A number of previous studies at the College of Foreign Languages, Vietnam National
University, have examined testing in terms of test validity, for instance Vu, Ba Linh
(2006); Nguyen, Thi Mai Phuong (2008); Tran, Thi Hieu Thuy (2008); Le Thuy Linh (2004);
and Nguyen, Thi Bich Hong (2008). All of these studies concern tests at college and
university level. However, we have found no study of the validity of tests at high schools.
Research on high schools has tended to focus on language skills and on techniques in
English teaching and learning, for example Lam, Thi Thu Thuy (2008); Đậu, Duy Lịch (2007);
and Nguyễn Thị Nguyệt (2007). This raises the question of whether high school tests are
reliable and valid and, if so, how these qualities could be evaluated.
One important issue in testing and evaluation is the subjectivity of test-makers. Tests are
often written without careful consideration of how the content and construct of the test
relate to the contents and objectives of the course. As a result, many tasks that students
have to do in the tests do not appear in the course contents, or the test items are
unfamiliar or far too difficult for students. Such tests lack reliability as well as
validity, the most essential measurement qualities of a test. This problem is particularly
visible in end-term achievement tests, which examine students’ achievement after a term or
a course.
For this research, end-term achievement tests on English grade 12 at high schools in
Northern provinces of Vietnam were collected and analyzed. Due to limitations of time and
research conditions, end-term achievement tests that had been taken by students and scored
could not be collected. That is the reason why the reliability of those tests was not
investigated in this study. Only validity, in terms of content validity and construct
validity, was taken into consideration.
For the above reasons, the author was encouraged to conduct this study, entitled “A
Study on the Validity of End-term Achievement Tests on English Grade 12, High
Schools in Northern Vietnam”, with the aim of finding out how valid these tests are.
Furthermore, the writer hopes that the findings of the study can be applied to improve
current testing in high schools. The study is also intended to support both teachers and
learners in the teaching and learning process and to be a valuable source of reference for
test designers.

1.2. Scope of the study

Due to limitations of time and research conditions, the author does not aim to cover all
aspects of a good achievement test, such as reliability, validity, discrimination and
backwash effects. This study focuses on the construct validity and content validity of the
end-term achievement tests on English grade 12 at high schools of some provinces in
Northern Vietnam in the school year 2008-2009. The study presents findings about the
construct validity and content validity of those achievement tests and offers suggestions
for improving them as well as suggestions for further studies.

1.3. Aims of the study

The major aim of the study is to evaluate the validity of the end-term achievement tests
on English grade 12 at high schools of some provinces in Northern Vietnam in the school
year 2008-2009, with a special focus on those tests’ construct validity and content
validity. The specific aims of the study are:
• To study and evaluate the construct validity and content validity of those
end-term achievement tests; and
• To identify the strengths and weaknesses of the tests.

1.4. Research questions

In order to achieve these aims, the study seeks to answer the following research
questions:
1- Do the end-term achievement tests on English grade 12 at high schools in
some Northern Vietnamese provinces possess content validity?
2- Do those tests possess construct validity?

1.5. Methods of the study

This study combines quantitative and qualitative approaches. First, a quantitative method
was applied to the data collected from 10 end-term achievement tests on English grade 12
of high schools in some northern provinces of Vietnam. For each language component of a
test, the number of items that possessed content validity and construct validity was
counted and converted into a percentage.
Then, on the basis of these quantitative statistics, a qualitative method was employed to
interpret what the data show about the test samples and their components in terms of
content validity and construct validity.
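The counting step described above can be illustrated with a short sketch. It assumes,
hypothetically, that each test item has been recorded together with its language component
and a yes/no judgement of whether it is valid; the record format, the function name and the
sample judgements are illustrative only and are not data from this study.

```python
from collections import defaultdict


def validity_percentages(items):
    """Return, per language component, the percentage of items judged valid.

    `items` is a list of (component, is_valid) pairs. This record format is
    an assumption made for illustration only.
    """
    totals = defaultdict(int)
    valid = defaultdict(int)
    for component, is_valid in items:
        totals[component] += 1
        if is_valid:
            valid[component] += 1
    return {c: 100.0 * valid[c] / totals[c] for c in totals}


# Hypothetical judgements for one test paper (not taken from the thesis data).
sample_items = [
    ("phonetics", True), ("phonetics", False),
    ("grammar", True), ("grammar", True), ("grammar", True), ("grammar", False),
    ("vocabulary", True), ("vocabulary", True),
]

percentages = validity_percentages(sample_items)
for component, pct in percentages.items():
    print(f"{component}: {pct:.1f}% of items judged valid")
```

Keeping the counts per component, rather than one overall figure, mirrors the way the
findings are reported separately for phonetic, grammar and vocabulary items.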
1.6. Organization of the study
The thesis is organized into four major chapters:
Chapter 1 is the introduction, which presents the rationale, scope, aims, research
questions, methods and organization of the study.
Chapter 2 reviews the related literature that provides the theoretical basis for language
testing and evaluation. First, the relationships of language testing with teaching and
learning, and objective testing, are presented. Then achievement tests, test
specification, multiple-choice questions and the testing of language components are
discussed. Next, the most important theoretical part, validity in terms of content validity
and construct validity, is considered in depth. The last parts are devoted to the
objectives and syllabus contents of English grade 12, the recommended test specification of
end-term achievement tests on English grade 12 and the components’ contents of end-term
achievement tests.
Chapter 3 is the main part of the study. It presents the research design, comprising the
research questions, data description, informants and analytical framework. Next, the data
analysis of construct validity and content validity is discussed. Finally, the findings
about content validity and construct validity of the test samples are laid out.
Chapter 4 offers the conclusions, which answer the research questions. Some implications
are suggested for improving end-term achievement tests in terms of their construct
validity and content validity. The limitations and directions for further research are
also mentioned in this final chapter.

CHAPTER 2: LITERATURE REVIEW

This chapter provides an overview of the theoretical background of the research. First, it
discusses the relationships of language testing with the teaching and learning process.
Then achievement tests, test specification and the testing of language components are
discussed. Next, the most important theoretical part, validity in terms of content validity
and construct validity, is considered in depth. The last parts are devoted to the
objectives and syllabus contents of English grade 12, the recommended test specification of
end-term achievement tests on English grade 12 and the components’ contents of end-term
achievement tests.

2.1. The relationships of language testing with teaching and learning

Teaching, learning and testing are so closely interrelated that the existence of, and
changes in, one factor may have considerable effects on the others. Among these three
factors, language testing perhaps has the strongest and clearest effects on the teaching
and learning process. Heaton (1988:5) expressed the same idea: “Both testing and
teaching are so closely interrelated that it is virtually impossible to work in either field
without being constantly concerned with the other”. Heaton (1988:5) also pointed out the
importance of testing to the learning process: “Tests may be constructed primarily as
devices to reinforce learning and motivate the students or as a means of assessing the
students’ performance in the language”.
Davies (1996:5) also described the importance of language testing: “Properly
made English tests can help create positive attitudes toward instruction by giving students
a sense of accomplishment and a feeling that the teacher’s evaluation of them matches
what he has taught them. Good English tests also help students learn the language by
requiring them to study hard, emphasizing course objectives, and showing them where
they need to improve”.
In terms of teaching, testing helps teachers evaluate how far learners have acquired the
target language knowledge and skills. Bachman (1990:55) shared this point of view when he
stated that the fundamental use of testing in an educational program is to provide
information for making decisions, that is, to evaluate. However, it is not simple for
teachers to obtain exact, reliable and valid test results from different test-takers, who
have different interests, attitudes and background knowledge of the target language. Many
teachers feel disappointed when test-takers’ results do not meet their expectations. One
acceptable solution is to include some easy test items to encourage weaker students and
progressively more difficult items for better students.
For learners, testing helps them find out their weak and strong points, from which they
may develop the most suitable learning strategies for themselves; testing may also
motivate students to maintain their achievements or to gain better ranks in the class.
Nevertheless, testing can have negative effects on students if its level of difficulty is
too high or too low, which makes students lose interest in or become bored with the
learning process. Hughes (1989:1) described the effect of testing on teaching and learning
as backwash, which may be beneficial or harmful, and focused more on the harmful side.
According to him, harmful backwash appears when the test content does not match the
objectives of the course, and it reinforces the impression that teaching and testing are
unrelated to each other. He also gave a vivid example: if writing skill is tested only by
multiple-choice items, learners concentrate on practicing such items rather than
practicing the skill of writing itself.
In summary, testing plays a very important role in teaching and learning, and vice
versa. For teachers, a good test can help them evaluate their teaching procedure as well as
their students’ achievement more effectively, and help to eliminate the harmful backwash
that a test may have.

2.2. Objective testing

There are many types of language tests. Hughes (1989:9) classified them according to
their purposes, namely proficiency tests, achievement tests, diagnostic tests and
placement tests. In addition, on the basis of the manner in which they are scored, tests
are divided into objective and subjective tests.

Of all these types, objective tests are discussed here. The reason is that, in recent
years, most written English tests at high schools in Northern Vietnam have been designed
on the basis of this approach, and the end-term English achievement tests for grade 12 in
the school year 2008-2009 are no exception.
According to Davies et al. (1999:132), an objective test is a test in which all the items
are objectively scored. In an objective test, correct responses are clearly specified, and
markers are not required to make judgments.
Heaton (1998:26) stated that objective tests are frequently criticized on the grounds
that they are simpler to answer than subjective tests. However, items in an objective test
can be made just as easy or difficult as the test designer wishes. Heaton (1998:26) noted
that some scholars criticize multiple-choice objective tests for encouraging guessing.
Nevertheless, Heaton pointed out that four or five alternatives for each item are
sufficient to reduce the possibility of guessing. He added that test-takers rarely make
wild guesses; most base their guesses on partial knowledge.
In the author’s view, objective testing is one of the effective ways of evaluating study
results in English. It tests not only communicative skills but also language knowledge. An
objective test can generally include a larger number of grammar, vocabulary and phonology
items than a subjective test. What is more, objective tests can be scored mechanically
since each item usually has only one correct answer. The fact that objective tests can be
marked by computer is one important reason for using them to test large numbers of
test-takers.
For test-makers, however, mastering the types of objective tests and designing them is a
rather demanding requirement. In order to design a good objective test, test-makers have
to master testing techniques and prepare a large item bank.
In objective testing, multiple-choice questions are among the most popular techniques,
followed by other techniques such as matching items, supply items and true/false
questions.
In short, a good classroom test should contain a balance of subjective and objective
items to enhance its validity and reliability and to ensure that it covers both language
knowledge and language skills.

2.3. Achievement tests

There are four main types of tests, namely achievement tests, proficiency tests,
aptitude tests and diagnostic tests. Because the scope of this research is to evaluate the
end-term achievement tests for English grade 12, only the first type, the achievement
test, is discussed here.

2.3.1. Definitions
Achievement tests, which are widely used in secondary schools and high schools, are a
very important tool for evaluating students nowadays. Achievement tests have been defined
in a number of ways, the most notable of which are given below.
According to Davies et al. (1999:2): “An achievement test is an instrument designed
to measure what a person has learned within or up to a given time. It is based on a clear
and public indication of the instruction that has been given. The content of the
achievement tests is a sample of what has been in the syllabus during the time under
scrutiny and as such they have been called parasitic on the syllabus”.
Hughes (1989:10) claimed that “Achievement tests are directly related to language
courses with the purpose of establishing how successful individual students, groups of
students, or the courses themselves have been in achieving objectives”.
McNamara (2000:6) likewise stated that “Achievement tests accumulate
evidence during, or at the end of a course of study in order to see whether and where
progress has been made in terms of the goals of learning. Achievement tests should
support the teaching to which they relate.”
Heaton (1998:172) agreed with the above views, claiming that these tests are based on
what the students are presumed to have learnt, not necessarily on what they have actually
learnt nor on what has actually been taught.
To sum up, a good achievement test should cover the specific learning and teaching
contents that have previously been covered in the course.
2.3.2. Final achievement tests
Hughes (1989:10) states that final achievement tests are those administered at the
end of a course of study. They may be issued by ministries of education, official examining
boards, or by members of teaching institutions. Clearly the content of these tests must be
related to the courses with which they are concerned, but the nature of this relationship
is a matter of disagreement among some language testers.
Some testing experts hold that the content of a final achievement test should be based
directly on a detailed course syllabus or on the books and other materials used. This has
been referred to as the syllabus-content approach. Since such a test contains only what
the students are thought to have actually encountered, it can be considered a fair test.
However, the disadvantage of this approach is that if the syllabus is badly designed, or
the books and other materials are badly chosen, then the results of the test can be very
misleading: successful performance on the test may not truly reflect the achievement of
course objectives.
The second approach is to base the test content directly on the objectives of the
course, which has a number of advantages. Firstly, it forces course designers to make
course objectives explicit. Secondly, students can show how far they have achieved those
objectives. Thirdly, tests based on course objectives work against the perpetuation of
poor teaching practice, something that course-content-based tests, almost as if part of a
conspiracy, fail to do. Hughes (1995:11) preferred the latter approach, arguing that it
provides more accurate information about individual and group achievement and is likely to
promote a more beneficial backwash effect on teaching.
2.4. Test specification
There is no doubt that test specifications play an essential part in test construction
and evaluation.
Alderson, Clapham and Wall (1995:9) state that test specifications provide the
official statement about what the test tests and how it tests it. These scholars explain
that the specifications are the blueprint to be followed by test and item writers, and
that they are also essential in establishing the test’s construct validity.
Furthermore, Alderson, Clapham and Wall (1995:10) add that test specifications are
needed not by a single individual but by a range of people: (i) test constructors, to
produce the test; (ii) those responsible for editing and moderating the test; (iii) those
responsible for or interested in establishing the test’s validity; and (iv) admissions
officers, to make decisions on the basis of test scores.
McNamara (2000:31) similarly views test specifications as a recipe or blueprint for test
construction, which will include information on such matters as the length and the
structure of each part of the test, the type of materials with which candidates will have
to engage, the source of such materials if authentic, the extent to which authentic
materials may be altered, the response format, the test rubric, and how responses are to
be scored.
Since the users of test specifications may have different needs, writers of
specifications should remember that what is suitable for one audience may be quite
unsuitable for another.
2.5. Testing language components
Heaton (1998:9) and many other linguists divide the language components into three
areas: grammar and usage; vocabulary (concerned with word meanings, word formation and
collocations); and phonology (concerned with phonemes, stress and intonation).
2.5.1. Tests of grammar and usage
According to Heaton (1998:9), “these tests measure students’ ability to recognize
appropriate grammatical forms and to manipulate structures.”
Discussing the role of grammar testing, Hughes (2003:172) explained that there was a
time when control of grammatical structures was seen as the very core of language ability
and it would have been unthinkable not to test it. However, times have changed, with a
shift towards the view that, since it is language skills that are usually of interest, it
is these skills that should be tested directly, not the abilities that seem to underlie
them.
Hughes (2003:173) emphasized that “…it has to be accepted that grammatical
ability, or rather the lack of it, sets limits to what can be achieved in the way of skills
performance. The successful writing of academic assignments, for example, must depend
to some extent on command of more than the most elementary grammatical structures.”
A grammar test commonly contains the following item types: multiple-choice items,
error-recognition items, rearrangement items, completion items, transformation items,
items involving the changing of words, broken sentence items, pairing and matching items,
combination items, and addition items.
2.5.2. Test of vocabulary
It is obvious that vocabulary is an essential part of any language and is closely
connected with the other language components and skills, such as phonetics and phonology,
grammar, reading, speaking, writing and listening. Heaton (1998:9) stated that “A test of
vocabulary measures students’ knowledge of the meaning of certain words as well as the
patterns and collocations in which they occur. Such a test may test their active vocabulary
(the words they should be able to use in speaking and in writing) or their passive
vocabulary (the words they should be able to recognize and understand when they are
listening to someone or when they are reading)”.
Vocabulary tests often use item types such as multiple-choice, associated words,
gap-fill, matching items, word formation, items involving synonyms, rearrangement items,
and completion items.
2.5.3. Test of phonology
Heaton (1998:9) claimed that test items designed to test phonology might attempt to
assess the following sub-skills: the ability to recognize and pronounce the significant
sound contrasts of a language, the ability to recognize and use the stress patterns of a
language, and the ability to hear and produce the melody or patterns of the tunes of a
language (i.e. the rise and fall of the voice).
A phonology test often contains several item types, namely multiple-choice, syllable
stress, word stress, sentence stress, ordering tasks and note-taking.

2.6. Validity of a test

A good test has a number of qualities, such as reliability, validity, practicality,
interactiveness, impact and authenticity. The most important consideration in designing
and evaluating a language test is its usefulness, which depends strongly on its validity.
In this part, definitions of validity and two of its subtypes, construct validity and
content validity, are examined.

2.6.1. Definitions and types of validity

According to Davies et al. (1999:221): “Validity is the quality which most affects
the value of a test, prior to, though dependent on, reliability. A measure is valid if it does
what it is intended to do, which is typically to act as an indicator of an abstract concept
(for example height, weight, time, etc.) which it claims to measure. The validity of a
language test therefore is established by the extent to which it succeeds in providing an
accurate concrete representation of an abstract concept (for example proficiency,
achievement, aptitude).”
Two well-known scholars, Heaton and Hughes, shared the same idea about validity.
Heaton (1988:159) provided a very concise definition: “the validity of a test is the
extent to which it measures what it is supposed to measure”, while Hughes (1989:22) stated
that “a test is said to be valid if it measures accurately what it is intended to measure”.
It follows that a test may be valid for some purposes but not for others. For instance, if
a test is designed to assess reading comprehension, it is valid if it contains reading
test items such as multiple choice, matching or C-tests. If the test consists of grammar
questions, however, it cannot be considered a valid test of reading.
Validity is classified into subtypes such as construct validity, content validity, face
validity and criterion-related validity. Of these, construct validity and content validity
are discussed in the next two parts.

2.6.2. Content validity of a test

According to Harrison (1983:11): “Content validity is concerned with what goes into
the test. The content of a test should be decided by considering the purpose of the
assessment, and then drawing up a list known as a content specification”.
Henning (1987:94) claimed that content validity is concerned with “whether or not
the content of the test is sufficiently representative and comprehensive for the test to be a
valid measure of what it is supposed to measure”. In his view, a test cannot always be
exhaustive, so its content must be carefully selected.
Shohamy (1985:74) added that “A test is described to have
content validity if it can show the test-takers’ already-learnt knowledge. People normally
compare the test content to the table of specification. Content validity is said to be the most
important validity for classroom tests”.
Content validity is the first form of evidence, and it relates to the content of the test.
A test has content validity only if it includes a proper sample of the relevant
structures, and which structures are relevant depends upon the purpose of the test. For
example, an achievement test for intermediate learners should not contain just the same
set of structures as one for advanced learners. To ensure the content validity of a test,
a specification of the skills or structures that it is meant to cover is needed. Such a
specification should be made at a very early stage in test construction (Hughes, 2003:26).
Hughes (2003:26) noted that not everything in the specification can be expected to
appear in the test, because there may be too many things for all of them to fit in a
single test. However, he claimed that the specification provides testers with the basis
for making a principled selection of elements for inclusion in the test.
Content validity is a non-statistical type of validity that involves “the systematic
examination of the test content to determine whether it covers a representative sample of
the behaviour domain to be measured” (Anastasi & Urbina, 1997, p. 114). A test has content
validity built into it by careful selection of which items to include (Anastasi & Urbina,
1997). Items are chosen so that they comply with the test specification, which is drawn up
through a thorough examination of the subject domain.
Sharing the views of the above researchers, Heaton (1998:160) stated that “This
kind of validity depends on a careful analysis of the language being tested and of the
particular course objectives. The test should be so constructed as to contain a
representative sample of the course, the relationship between the test items and the course
objectives always being apparent”. He also pointed out that when constructing a test, the
tester should first draw up a table of test specifications, describing in very clear and
precise terms the particular language skills and areas to be included in the test. Heaton
(1998:160) gave the example of a grammar test or sub-test, in which each of the
grammatical areas should be given a percentage weighting, for instance the future simple
10 percent, uncountable nouns 15 percent, relative pronouns 10 percent, etc.
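The percentage weighting in Heaton’s example can be turned into item counts once a test
length is fixed. The sketch below is illustrative only: the 40-item test length and the
function name are assumptions, and only the three weightings quoted above are used, since
the remaining grammatical areas are not listed.

```python
def items_per_area(weights_percent, total_items):
    """Convert percentage weightings into whole numbers of test items."""
    return {
        area: round(total_items * pct / 100)
        for area, pct in weights_percent.items()
    }


# The three weightings quoted from Heaton; the rest of the table is omitted
# because it is not given in the text.
weights = {"future simple": 10, "uncountable nouns": 15, "relative pronouns": 10}

# Assumed test length, chosen only for illustration.
allocation = items_per_area(weights, total_items=40)
print(allocation)
```

Because rounding can shift a count by one item, the resulting allocation should be checked
against the intended test length before it is used in a specification.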
Put differently, content validation examines the degree to which the test items represent
the domain or universe of the trait or property being measured. In order to establish the
content validity of a measuring instrument, the researcher must first identify the overall
content to be represented. Items must then be randomly chosen from this content so that
they accurately represent the information in all areas. By using this method, the
researcher should obtain a group of items that is representative of the content of the
trait or property to be measured.
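The selection step can be sketched as stratified random sampling: items are drawn at
random within each content area so that every area is represented. The item bank, area
names and quotas below are hypothetical placeholders, not material from the tests studied.

```python
import random


def select_items(pool, quotas, seed=None):
    """Randomly pick quotas[area] items from each content area of the pool."""
    rng = random.Random(seed)
    selected = {}
    for area, quota in quotas.items():
        candidates = pool[area]
        if quota > len(candidates):
            raise ValueError(f"not enough items in area {area!r}")
        # sample() draws without replacement, so no item is picked twice.
        selected[area] = rng.sample(candidates, quota)
    return selected


# Hypothetical item bank: item identifiers grouped by content area.
item_pool = {
    "grammar": [f"G{i}" for i in range(1, 21)],
    "vocabulary": [f"V{i}" for i in range(1, 16)],
    "phonology": [f"P{i}" for i in range(1, 11)],
}
quotas = {"grammar": 5, "vocabulary": 3, "phonology": 2}

chosen = select_items(item_pool, quotas, seed=0)
for area, items in chosen.items():
    print(area, items)
```

Fixing the seed makes a particular draw repeatable, which may help when a selection has to
be documented; omitting it gives a fresh draw each time.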
To conclude, content validity is, to a certain extent, more specific than construct
validity, and identifying the universe of content is not an easy task. It is, therefore, usually
suggested that a panel of experts in the field under study be used to identify a content
area. For example, in the case of researching the knowledge of teachers about a new
curriculum, a group of curriculum and teacher education experts might be asked to identify
the content of the test to be developed.

2.6.3. Construct validity of a test

Davies et al. (1999:33) defined construct validity of a test as follows: “the construct
validity of a language test is an indication of how representative it is of an underlying
theory of language learning. Construct validation involves an investigation of the qualities
that a test measures, thus providing a basis for the rationale of a test”.
These authors also suggested that there are two aspects of construct
validation, theoretical and empirical, both of which are concerned with the production of
evidence and arguments to support the inferences that are made about candidates on the
basis of their test performance. Construct validity is traditionally examined by determining
the relationship between the empirical (patterns of scores on the test) and the theoretical
(proposed explanatory concepts). For example, factor analysis may be undertaken to
identify the number of factors (or constructs) in the test data and their relationship with one
another.
It seems that construct validity is the most difficult concept and is considered to be a
superordinate form to which internal and external validity contribute. Henning
(1987:98) argued that “While construct validity is empirical in nature because it involves
the gathering of data and the testing of hypotheses, unlike concurrent and predictive
validity, it does not have any one particular validity coefficient associated with it”. He
added that the purpose of construct validation is to make sure that the underlying
theoretical constructs being measured are themselves valid. According to him, construct
validation usually begins with a psychological construct that is part of a formal theory.
The theory enables certain predictions about how the construct variable will behave or be
influenced under specified conditions, and the construct is then tested under those conditions.
Hughes (1995:26) stated that “A test, part of a test, or a testing
technique is said to have construct validity if it can be demonstrated that it measures just
the ability which it is supposed to measure”. He explained that the word “construct”
refers to any underlying ability (or trait) which is hypothesized in a theory of language
ability. Hughes also gave the example that the ability to read involves a number of
sub-abilities, such as the ability to guess the meaning of unknown words from the context in
which they are met.

As suggested by Alderson, Clapham & Wall (2000, pp.183-185), one way of
assessing the construct validity of a test is to correlate its various test components with
each other. They added that in a well-designed test, the correlations
between each subtest and the whole test can be expected to be higher, since the overall
score is taken to be a more general measure of language ability than each individual
component score.
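The correlational check described above can be sketched as follows. The score lists are invented for illustration, and the helper `pearson` is a plain implementation of the Pearson product-moment coefficient rather than anything prescribed by the cited authors.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical subtest scores for six test takers (not data from this study).
scores = {
    "phonetics": [3, 4, 2, 5, 4, 1],
    "grammar": [12, 15, 9, 14, 16, 8],
    "reading": [6, 5, 7, 8, 9, 4],
}
# Whole-test score of each test taker = sum of that person's subtest scores.
total = [sum(person) for person in zip(*scores.values())]

# Correlations of the subtests with each other.
between = {(a, b): pearson(scores[a], scores[b]) for a, b in combinations(scores, 2)}
# Correlation of each subtest with the whole test.
with_total = {name: pearson(vals, total) for name, vals in scores.items()}

print(between)
print(with_total)
```

The two dictionaries can then be compared to see whether the subtest-total correlations are indeed higher than the correlations between subtests.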
To sum up, the construct validity of a test can be evaluated by examining the testing
techniques used in the test to consider whether those techniques are able to
measure the testees’ ability to understand and use language components such as
phonetics, structures and vocabulary.
2.7. Objectives and Syllabus contents of English grade 12
2.7.1. Objectives of English grade 12
The English textbook for grade 12, “Tiếng Anh 12”, follows on from the English
textbooks for grades 10 and 11, which are theme-based and topic-based. It has been
officially used in high schools since the school year 2008-2009. The system of themes and
topics is the basis for shaping and developing language competence. Language components
such as phonetics, vocabulary and grammar are introduced to shape and develop the
students’ language competence. For teachers,
this design model helps them actively carry out communicative activities according to
themes in order to form and develop communicative skills and suitable language
knowledge that meet the students’ needs, tastes, and different proficiency
levels.
According to BỘ GIÁO DỤC VÀ ĐÀO TẠO (Ministry of Education and Training)
(2006:25), the English textbook for grade 12 aims at providing students with language
knowledge, especially basic, modern and systematic English, which is the background for
shaping communicative skills appropriate to the students’ ages. At the end of this grade,
students are able to use the English knowledge they have learnt to practice the four skills:
- Listening: students are able to (i) understand the main contents as
well as the detailed contents of monologues/dialogues of about 180 to 200 words
within the content domains learnt in the textbook; (ii) understand the text at
normal reading/speaking speed.

- Speaking: students are able to (i) ask, answer, and speak about topic-related
contents in the textbook; (ii) carry out basic communicative functions, such
as showing personal attitudes, talking about demands and hobbies,
explaining reasons, etc.
- Reading: students are able to (i) understand the main contents as
well as the detailed contents of texts/passages of about 180 to 200 words within the
content domains learnt in the textbook; (ii) distinguish the main ideas from the
supplementary ones; and (iii) use the main ideas to summarize the texts/passages.
- Writing: students are able to write a 130 to 150-word text, based on samples
and/or suggestions, about topic-related contents to serve simple
personal communicative demands and social relations.

2.7.2. Syllabus contents of English grade 12

There are six main themes, sixteen units and six revision units in the English textbook
for grade 12. Each unit corresponds to a specific topic. Each theme, which involves several
topics, requires certain communicative competences. At the end of each unit, a
language focus section summarizes the pronunciation and grammar points of that unit.
According to BỘ GIÁO DỤC VÀ ĐÀO TẠO (2006:62) and Hoang, Van Van et al.
(2008:10), the syllabus design of the English textbook for grade 12 is specified as follows:

Each theme below is listed with its three columns: Themes/Topics, Attainment targets, Language focus.

Theme 1: YOU AND ME
Topics: Home life; Cultural diversity; Ways of socializing
Attainment targets
Speaking: Students will be able to
- Talk about home life, lifestyles; household responsibilities and chores
- Talk about cultural diversity such as etiquettes, ways of socializing, giving compliments, etc.
- Understand different points of view.
- Ask for and give information
Listening: Students will be able to
- Listen to a monologue/dialogue of 180-200 words for general or specific information
Reading: Students will be able to
- Read a passage of 280-320 words for general or specific information
Writing: Students will be able to
- Write an informal letter of 130-150 words about college life using suggested word cues or idea prompts
- Write about family rules of 130-150 words using suggested word cues or idea prompts
- Write about a typical product of a culture of 130-150 words using suggested word cues or idea prompts
Language focus
Pronunciation:
- The pronunciation of the ending ‘s’ and ‘ed’
- Stress in two-syllable words
Grammar:
- Tenses: past simple, past progressive, past perfect, present simple, present progressive, present perfect, present perfect progressive
- Reported speech: statements, questions
Vocabulary:
- Words to talk about home life: family, family relationship, daily routine, leisure activities
- Words to talk about cultural diversity: attitudes toward love and marriage, wedding ceremony, typical features of a culture
- Words to express ways of socializing: how to communicate in different cultures, how to use the telephone, how to apologize and express regret

Theme 2: EDUCATION
Topics: School education system; Higher education; Future jobs
Attainment targets
Speaking: Students will be able to
- Talk about school education system
- Talk about the application process to a tertiary institution in Viet Nam
- Express opinions on a part-time/future job
- Talk about job application
Listening: Students will be able to
- Listen to a monologue/dialogue within 200-250 words for general or specific information
Reading: Students will be able to
- Read a passage within 250-300 words for general or detailed information
- Read a passage within 250-300 words and scan for specific information
Writing: Students will be able to
- Write a formal letter of request
- Write about school education system based on word cues or idea prompts
Language focus
Pronunciation:
- Stress in three-syllable words
- Stress in more than three-syllable words
- Weak/strong forms of some conjunction & prepositions
Grammar:
- Passive voice
- Conditional sentences
- Relative clauses
Vocabulary:
- Words to talk about education system from primary to higher education: levels of education, subjects, ways of learning, types of school, examinations
- Words to talk about tertiary application procedure: filling in an application form, requirements for university entrance, types of higher education, certificates
- Words to talk about types of jobs, job interview and job application

Theme 3: COMMUNITY
Topics: Economic reforms; Life in the future
Attainment targets
Speaking: Students will be able to
- Talk about economic changes
- Talk about life in the future
Listening: Students will be able to
- Listen to a monologue/dialogue within 200-250 words for general or specific information
Reading: Students will be able to
- Read a passage within 250-300 words for general or detailed information
- Read a passage within 250-300 words and scan for specific information
Writing: Students will be able to
- Write a report based on given information
- Write about life in the future using word cues or idea prompts
Language focus
Pronunciation:
- Strong & weak forms of auxiliaries
- Contracted forms of auxiliaries (continued)
Grammar:
- Preposition of time, places
- Articles (definite and indefinite)
- Adverbial clauses of concession
Vocabulary:
- Words to talk about economic reforms: policies, changes, renovations, measures, effects
- Words to describe statistics in education, health care, industry, agriculture
- Words to predict about life in the future: living conditions, technology, means of transportation, life expectancy

Theme 4: NATURE AND ENVIRONMENT
Topics: Deserts; Endangered species
Attainment targets
Speaking: Students will be able to
- Talk about natural features of deserts and desert life
- Explain reasons (why some kinds of trees and animals can exist in deserts)
- Talk about endangered animals and how to protect and save endangered species
Listening: Students will be able to
- Listen to a monologue/dialogue within 200-250 words for general or specific information
Reading: Students will be able to
- Read a passage within 250-300 words for general or detailed information
- Read a passage within 250-300 words and scan for specific information
Writing: Students will be able to
- Write a description of the main features of a desert
- Write about measures to protect endangered species and possible results based on word cues or a guideline
Language focus
Pronunciation:
- Full and contracted forms of auxiliaries
- Rhythm
Grammar:
- Modal verbs: may, might, must, mustn’t, needn’t
- So, but, however, therefore
Vocabulary:
- Words to talk about deserts: typical features, how they are formed, how animals and plants live
- Words to talk about endangered species: types, living conditions, conservation, disappearance, extinction, measures to save endangered species from extinction

Theme 5: RECREATION
Topics: Books; Water sports; The 22nd SEA GAMES
Attainment targets
Speaking: Students will be able to
- Express opinions and preferences
- Talk about a book
- Talk about kinds of water sports and how some types of water sports are played
- Talk about sport events and results of SEA Games
- Talk about reading habits
Listening: Students will be able to
- Listen to a monologue/dialogue within 200-250 words for general or detailed information
Reading: Students will be able to
- Read a passage within 250-300 words for general or detailed information
- Read a passage within 250-300 words and scan for specific information
Writing: Students will be able to
- Write a book report
- Write a description of a game or sport using word cues or idea prompts
Language focus
Pronunciation:
- Rhythm
- Elision
- Linking
Grammar:
- Modals in passive voice
- Transitive and intransitive verbs
- Comparative + and + comparative
- The comparative + the + comparative
Vocabulary:
- Words to talk about books: kinds of books, characters, writers, reading habits
- Words to talk about water sports: types and history of water sport, and how they are developed and played
- Words to talk about how SEA Games are prepared

Theme 6: PEOPLES & PLACES
Topics: International organizations; Women in society; Associations of South East Asian Nations (ASEAN)
Attainment targets
Speaking: Students will be able to
- Talk about international organizations and their activities
- Talk about the roles of women in society
- Express agreement and disagreement
Listening: Students will be able to
- Listen to a monologue/dialogue within 200-250 words for general or detailed information
Reading: Students will be able to
- Read a passage within 250-300 words for general or detailed information, scan for specific information
Writing: Students will be able to
- Describe information from a chart
- Write a letter of recommendation within 130-150 words using word cues or idea prompts
Language focus
Pronunciation:
- The falling tune
- The rising tune
- The rising-falling tune
Grammar:
- Phrasal verbs: 2-or-3 word verbs
- Adverbial clause of time: when, while, as soon as, since, before, after
Vocabulary:
- Words to talk about international organizations: types, aims, functions, activities
- Words to describe women’s roles at home and in the society

Table 1. Syllabus design of English grade 12

2.8. Recommended test specification of the final achievement tests, English grade 12

There is no official regulation from educational authorities on designing the final
achievement tests; so far, the Ministry of Education and Training has issued only a
recommended test specification for those tests. The high schools rely on this
recommended test specification to design their own tests so that their particular teaching
and learning conditions are taken into account.
Basically, the test specifications of the end-term achievement tests and the 45-minute
tests are the same. However, the contents of the end-term achievement tests are more
complex and their requirements are more synthetic than those of the 45-minute tests. The
end-term achievement tests aim at evaluating the general knowledge of the different themes
that students have learned in a term.
There are two end-term achievement tests for English grade 12, namely, the end-1st
term achievement test and the end-2nd term achievement test. Normally, the former covers
the knowledge of themes 1, 2 and 3 (from unit 1 to unit 8) while the latter
covers themes 4, 5 and 6 (from unit 9 to unit 16). The contents of those tests
are closely related to the Language Focus part of the syllabus design (see Table 1).
Nevertheless, from the author’s experience as a high school teacher of English
and from the investigation of his colleagues’ end-term achievement tests, the end-2nd
term achievement test is often far more complex and synthetic than the one in the 1st term,
and it may also cover what students have studied in the 1st term.
The recommended specification of the end-term achievement tests is designed as
follows:
Each part below is listed with its three columns: Main skill focus, Input, Item type.

Question 1: Phonetics
Main skill focus: Pronunciation; Stressed syllable
Input: Stress patterns; Different pronunciation
Item type:
- Multiple choice
- Putting word into the right column
- Matching

Question 2: Grammar, Vocabulary and Language functions
Main skill focus: Vocabulary, Grammar and Language functions
Input: Gapped sentences; Short paragraphs; Short prompts
Item type:
- Multiple choice
- Gap-fill
- Matching
- Dialogue completion

Question 3: Reading
Main skill focus: Reading for gist and details
Input: Two short texts: about 100-120 words
Item type:
- Comprehension question
- Gap fill
- Matching
- True/false
- Multiple choice

Question 4: Writing
Main skill focus: Controlled writing
Input: Prompts: questions, word(s), guidelines
Item type:
- Guided writing
- Sentence building
- Sentence transformation

Table 2. The recommended test specification of the end-term achievement tests, English grade 12

2.9. Components’ contents of end-term achievement tests, English grade 12


2.9.1. Components’ contents of the 1st term achievement tests
Each language component below is followed by its contents.

Phonetics
- The pronunciation of the ending ‘s’ and ‘ed’
- Stress in two-syllable words; Stress in three-syllable words; Stress in more than three-syllable words; Weak/strong forms of some conjunction & prepositions
- Strong & weak forms of auxiliaries
- Contracted forms of auxiliaries (continued)

Grammar
- Tenses: past simple, past progressive, past perfect, present simple, present progressive, present perfect, present perfect progressive
- Reported speech: statements, questions
- Passive voice; Conditional sentences
- Relative clauses; Preposition of time, places; Articles (definite and indefinite); Adverbial clauses of concession

Vocabulary and language functions
- Words relating to: Home life; Cultural diversity; Ways of socializing; School education system; Higher education; Future jobs; Economic reforms; Life in the future

Table 3. Components’ contents of the 1st term achievement test



2.9.2. Contents of components of the 2nd term achievement tests


Each language component below is followed by its contents.

Phonetics
- Full and contracted forms of auxiliaries; Rhythm; Elision; Linking; The falling tune; The rising tune; The rising-falling tune

Grammar
- Modal verbs: may, might, must, mustn’t, needn’t
- So, but, however, therefore; Modals in passive voice
- Transitive and intransitive verbs; Comparative + and + comparative; The comparative + the + comparative; Phrasal verbs: 2-or-3 word verbs; Adverbial clause of time: when, while, as soon as, since, before, after

Vocabulary and language functions
- Words relating to: Deserts; Endangered species; Books; Water sports; The 22nd SEA GAMES; International organizations; Women in society; Associations of South East Asian Nations (ASEAN)

Table 4. Components’ contents of the 2nd term achievement tests



CHAPTER 3: THE STUDY

This chapter is the main part of the study. First, the research design, which
covers the research questions, informants, data description and analytical framework, is
briefly presented. Secondly, the data are analyzed. Lastly,
the author presents the findings of the study.

3.1. Research design

3.1.1. Research questions

The study is carried out to answer the following research questions:
1. Do the end-term achievement tests on English grade 12 at high schools in
some Northern Vietnamese provinces possess content validity?
2. Do those tests possess construct validity?

3.1.2. Informants

The study was carried out at high schools in several provinces in northern Vietnam
which have similar social and economic conditions. In these places, social and economic
conditions are at a medium level and the living
standard is not very high. The demand for English is not as high as in big cities
like Hanoi or Ho Chi Minh City. Therefore, most students in those places are not fully
aware of the important role of English in their future life. This is why they seem
to lack both motivation to learn English and English proficiency. In particular, their
communicative skills are very weak. Their grammatical competence, however, is better than
their other language skills.
Moreover, the English teaching staff in high schools are of uneven quality. Some
of them were originally teachers of Russian, and most of those teachers are deeply familiar
with traditional teaching methods, such as grammar-translation and the P.P.P approach
(Presentation - Practice - Production). Few teachers continue to higher studies. Besides, the
inadequate textbook system and the rather low level of application of information technology
in teaching have contributed considerably to the disadvantages in teaching English in the light
of the communicative approach - the most popular teaching approach nowadays. All of these
things have certain impacts on testing in high schools.

3.1.3. Data description

The end-term achievement tests on English grade 12 of high schools in several
Northern provinces of Vietnam were chosen as the data sources of the research. With the
help of the author’s colleagues from different high schools, the 1st term and the 2nd term
test samples were sent by email or by post. Five 1st term test samples and five
2nd term test samples were chosen for investigation. To ensure confidentiality, objectivity and
security, the five 1st term test samples are numbered from Test 1 to Test 5 and the five 2nd term
test samples are numbered from Test 6 to Test 10. Most of the sample tests were made by
teachers of English themselves, but some, for instance Test 4 and Test 5, were designed by
local departments of education and training. The time allowance of a test is
usually from forty-five to sixty minutes, depending upon the number of test items and
the students’ ability. Finally, the test content of the 2nd term test samples
is far more complex than that of the 1st term ones.
3.2. Analytical framework
In order to attain the research aims and answer the research questions, this study used
both quantitative and qualitative methods.
First, the author draws on the theoretical discussion of content validity in the
previous chapter to analyze the content validity of the test samples. This part is mainly
based on Brian K. Lynch’s statements on content validity (2003:150). For him, content
validity is examined through judgments about the degree of match between the test
items and tasks and the ability to be tested. In other words, the match found between test
specifications and the items produced from those specifications is evidence of content
validity. The content validity of the data is examined by comparing the syllabus design of
English grade 12 with the test content to see whether the test samples cover the syllabus
components such as phonetics, grammar and vocabulary.
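The comparison between specification and test content can be pictured as a set comparison. In the sketch below, the grammar points are those listed for the 2nd term in the components' contents, while the tagging of one test's items (`item_tags`) and the function `coverage` are hypothetical and serve only as an illustration.

```python
# Grammar points of the 2nd term, as listed in the components' contents.
second_term_grammar = {
    "modal verbs",
    "so, but, however, therefore",
    "modals in passive voice",
    "transitive and intransitive verbs",
    "comparative + and + comparative",
    "the comparative + the + comparative",
    "phrasal verbs",
    "adverbial clause of time",
}

def coverage(spec_points, item_tags):
    """Compare the points a test actually examines with the specification."""
    tested = set(item_tags)
    covered = spec_points & tested    # specified and tested
    missing = spec_points - tested    # specified but never tested
    off_spec = tested - spec_points   # tested but not in the specification
    return covered, missing, off_spec

# Hypothetical tagging: the grammar point targeted by each item of one test.
item_tags = ["modal verbs", "reported speech", "phrasal verbs",
             "passive voice", "modal verbs"]

covered, missing, off_spec = coverage(second_term_grammar, item_tags)
print(sorted(covered))
print(sorted(off_spec))
```

Points that end up in `off_spec` correspond to what the discussion calls old knowledge from an earlier term, while `missing` lists specified points that the test never examines.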
To assess the construct validity of the tests, we examine the components of the test
items (phonetics, grammar and vocabulary) to see whether the testing techniques employed
can check students’ ability to understand and use those language
components; if so, they are said to have construct validity, and if not, they are not. In practice,
there are often four subtests in each test: Phonetics, Grammar and Vocabulary, Reading
comprehension and Writing. The reading comprehension subtest is analyzed under vocabulary
to show whether the techniques used are valid; and the writing questions are
assessed in terms of their grammar structures to find out whether their testing techniques
are valid.
Then the quantitative method was employed to collect data and calculate the percentage of
test items that achieved content validity and construct validity, as well as the percentage of
test items that did not. Finally, the qualitative
method was the major instrument for interpreting the meaning of these percentages
as data results.
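The quantitative step amounts to simple proportions. A minimal sketch, assuming each item has already been judged valid or not valid; the judgements below are invented and the function `percent_valid` is hypothetical:

```python
def percent_valid(judgements):
    """Return (percent valid, percent not valid) for a list of True/False judgements."""
    valid = sum(judgements)  # True counts as 1, False as 0
    pct_valid = 100 * valid / len(judgements)
    return pct_valid, 100 - pct_valid

# Hypothetical judgements for the ten grammar items of one test sample.
grammar_judgements = [True] * 9 + [False]

pct_valid, pct_invalid = percent_valid(grammar_judgements)
print(pct_valid, pct_invalid)
```

Nine valid items out of ten give 90 percent valid and 10 percent not valid; the same calculation applies to each component of each test sample.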

3.3. Findings and discussions

3.3.1. Content validity of test samples’ components


From the comparison between the components’ contents of the 1st term
and 2nd term tests and the contents of the test samples, the percentage of each of the 10 test
samples’ components that achieved content validity is shown in the following table:
Test samples Phonetics items Grammar items Vocabulary items
Test 1 50% 90% 99%
Test 2 50% 100% 100%
Test 3 50% 100% 100%
Test 4 100% 80% 100%
Test 5 100% 70% 100%
Test 6 15% 100% 100%
Test 7 30% 85% 100%
Test 8 30% 80% 100%
Test 9 15% 98% 100%
Test 10 30% 80% 100%

Table 5: Content validity of test samples’ components

3.3.1.1. Content validity of phonetic items

As can be seen from the above table, the proportions of valid phonetic content in the 1st
term test samples are higher than those in the 2nd term test samples. While Test 4 and
Test 5 cover the 2 main phonetic points (pronunciation of letters and stress patterns) in the
components’ contents of the 1st term, Tests 1, 2 and 3 cover only one main phonetic point,
the pronunciation of letters.

The phonetic test items of the 2nd term test samples from Test 6 to Test 10 have very
low content validity. Test 7 and Test 8 cover old knowledge from the 1st term (pronunciation
of letters and stress patterns) while Test 6, Test 9 and Test 10 examine pronunciation only.
None of them tests any of the 2nd term components’ contents.
This may be because it is not easy to design phonetic questions about full and
contracted forms of auxiliaries, rhythm or elision, which are among the 2nd term components’
contents. Another reason, which may be more persuasive, is that the test makers followed the
format of the High School Graduation Examination and College/University Entrance Exams
in English, which mainly test the pronunciation of letters and stress patterns.

To conclude, 20% of the tests’ phonetic items achieved content validity.
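One reading that reproduces the 20% figure from the phonetics column of Table 5 is to count the test samples whose phonetic items are all valid. This counting rule is an assumption made for illustration, not a procedure stated in the text:

```python
# Phonetics column of Table 5 (percentage of phonetic items with content validity).
phonetics_content_validity = {
    "Test 1": 50, "Test 2": 50, "Test 3": 50, "Test 4": 100, "Test 5": 100,
    "Test 6": 15, "Test 7": 30, "Test 8": 30, "Test 9": 15, "Test 10": 30,
}

# Assumed rule: a sample "passes" only if all of its phonetic items are valid.
fully_valid = [name for name, pct in phonetics_content_validity.items() if pct == 100]
share = 100 * len(fully_valid) / len(phonetics_content_validity)

print(fully_valid, share)
```

Under this rule only Test 4 and Test 5 pass, that is, 2 of the 10 samples.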

3.3.1.2. Content validity of grammar test items

Generally, the percentages for grammar items show that the content validity
of the grammar items of the 1st term tests is much higher than that of the 2nd term tests.
This is because most of the major grammar points of the components’ contents of
the 1st term tests appear in Tests 1, 2, 3, 4 and 5. In particular, Test 4 and Test 5 cover all
major grammar points of the components’ contents of the 1st term tests.

On the other hand, when comparing the components’ contents of the 2nd term tests
with the grammar test items of the 2nd term test samples 6, 7, 8, 9 and 10, the author concludes
that their content validity is low. Test 6 has the highest proportion, with
45% of its grammar items achieving content validity. This suggests that most test makers did
not pay much attention to the test specification while designing the tests. This led to the fact that
main grammar points in the textbook did not appear in the tests, for instance, modals in the
passive voice, transitive and intransitive verbs, or phrasal verbs. There is too much old
grammar knowledge in those tests: approximately 55% in Test 6, 70% in Test 7, 80% in
Test 8, 85% in Test 9, and 60% in Test 10. Moreover, a number of typing errors appeared:
question 14 in Test 7 (His → He), question 37 in Test 9 (lack of question mark), question
18 in Test 10 (lack of ‘a’ after ‘such’).

To sum up, the grammar items of 5 of the 10 test samples (50%) achieved content validity.

3.3.1.3. Content validity of vocabulary items

The statistics of vocabulary items show that 70% of the tests’ vocabulary items
possess content validity because the topics of the vocabulary items in the test samples are
similar to the components’ contents of the 1st and 2nd term tests. However, we noticed that the
topics of the reading passages of Test 2 and Test 3 do not match the topics of the test
specifications. The reading passage of Test 2 is about a famous writer (Jack
London), and the reading passage of Test 3 is about robots. The reading passages of Test 8,
the topics of which were the history of movies and reading methods, did not follow the topic
list of the 2nd term test components’ contents, and we can say that 50% of its vocabulary items
in the reading passages did not have content validity.

In conclusion, the vocabulary items of 7 of the 10 test samples (70%) achieved content
validity. The rest should be corrected to possess content validity.

3.3.2. Construct validity of the test samples

The construct validity of the test samples was investigated by analyzing the
components of the test samples (phonetics, grammar and vocabulary) to see how valid the
techniques used to test those components are. The following table shows the
shortcomings of some test samples’ components with regard to their construct validity.

Test samples Phonetics items Grammar items Vocabulary items


Test 1 100% 90% 100%
Test 2 100% 95% 50%
Test 3 100% 80% 50%
Test 4 100% 100% 100%
Test 5 100% 100% 100%
Test 6 100% 45% 100%
Test 7 100% 30% 100%
Test 8 100% 25% 50%
Test 9 100% 25% 100%
Test 10 100% 40% 100%

Table 6. Construct validity of the test samples



3.3.2.1. Construct validity of phonetic test items

In terms of construct validity, the phonetic test items of those
tests appear to be valid. That is because the multiple choice questions can check students’
ability to pronounce vowels and recognize stress patterns.

The following example is from Test 1:

Choose the word whose underlined part is pronounced differently from the rest

1. A. summer B. include C. instruction D. compulsory
2. A. increased B. asked C. wanted D. impressed
3. A. smile B. maximum C. limit D. deliver
4. A. movie B. lose C. women D. prove
5. A. cracker B. spacious C. classic D. gladder
The technique employed to test this language component
makes it possible to evaluate the students’ ability to recognize the pronunciation
of the ending “ed” and of the vowels they have learnt.

Another example, from Test 5, tests the students’ ability to distinguish the stress
patterns of multi-syllable words:

Choose one word which has different stress pattern from the others. Identify your
answer by circling the corresponding letter A, B, C or D
Question 4: A. recently B. probably C. immediately D. usually
Question 5: A. musician B. politician C. violinist D. librarian
It can be seen from Test 6 to Test 10 that the phonetic test items of those test
samples are valid in terms of construct validity. Like the 1st term test samples, the 2nd term
ones are able to test the students’ ability to recognize the pronunciation of vowel sounds
and to distinguish the stress patterns of multi-syllable words (except for Test 6, which
tests students’ pronunciation ability only). This may be because the test makers
followed the format of the High School Graduation Examination and College/University
Entrance Exams in English, which test only these abilities.

In conclusion, 100% of phonetics items of all the test samples possess construct
validity.

3.3.2.2. Construct validity of grammar test items

The author noticed that, in terms of construct validity, most of the grammar
test items from Test 1 to Test 5 were valid, as most of the tests employed more than 2
techniques that could measure students’ ability to understand and use tenses,
structures, articles, etc. However, several grammar test items failed to evaluate the
students’ ability to remember tenses and structures, as only multiple choice and error
recognition techniques were used. It would be advisable to add transformation or word
formation techniques to improve the construct validity of the grammar items of the 1st term
test samples.

In the following examples, the 2 grammar questions failed to test students’ ability to
use direct and indirect speech correctly. Students may not know how to change direct
speech into indirect speech, but they might still pick the right option at random:
Test 5: Choose the correct sentence with the same meaning as the one in italics
Question 37: "I've arranged to meet them after lunch tomorrow," Mathew said.
A. Mathew said that he arranged to meet them after lunch the next day.
B. Mathew said that he had arranged to meet them after lunch tomorrow.
C. Mathew said that he had arranged to meet them after lunch the next day.
D. Mathew said that he has arranged to meet them after lunch the next day.
Question 38: The teacher asked her class: "Do you remember what you have to do now?"
A. The teacher asked her class if we remembered what we had to do then.
B. The teacher asked her class if they remembered what they had to do then.
C. The teacher asked her class if they remember what they have to do then.
D. The teacher asked her class if they remember what they had to do now.
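
The risk of guessing noted above can be made concrete with a small calculation. The
following is only an illustrative sketch, not part of the original analysis: it assumes a
student who answers purely at random, with each of the four options equally likely and the
two items independent of each other.

```python
from fractions import Fraction

# Assumptions (illustrative only): blind, uniform guessing on independent items.
OPTIONS_PER_ITEM = 4  # options A, B, C and D
ITEMS = 2             # Questions 37 and 38

# Chance of guessing a single four-option item correctly.
p_single = Fraction(1, OPTIONS_PER_ITEM)

# Chance of guessing both items correctly.
p_both = p_single ** ITEMS

# Chance of guessing at least one of the two items correctly.
p_at_least_one = 1 - (1 - p_single) ** ITEMS

print(f"single item: {p_single}, both: {p_both}, at least one: {p_at_least_one}")
```

Under these assumptions, a student with no knowledge of reported speech still answers a
given item correctly one time in four, which is why a production technique such as
sentence transformation gives stronger evidence of the ability being tested.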

The problem in the following grammar item is that either option A or option C could
be changed. If option A is changed, it becomes “had passed”; if option C is changed,
it becomes “be allowed”.
Test 3: Part IV: Writing

Choose the underlined word(s) that must be changed in order to make each of the
following sentences correct by circling the corresponding letter A, B, C or D.
37. If he passed the GCSE examination, he would have been allowed to take the
A B C
entrance examination to the university.
D

In another example, the test designer should have used the symbol Ø to replace
option D, “no article”, as students will need time to work out the meaning of the phrase
“no article”.
Test 3: Part II: Grammar and Vocabulary (3.75 points)

Choose from the four options given one best answer to complete each sentence by
circling the corresponding letter A, B, C or D.
11. If you don’t send the application form on time, you will not be able to take …..
entrance examination.
A. the B. a C. an D. no article
Or, in the following case, the author thinks that this item is too easy for grade 12
students, as it was taught in grade 6, so the item is invalid for checking
students’ language proficiency:
Test 4:
Choose the best answer among A, B, C or D that best complete for each sentence:
13. Tom: “How are you today, Jane?” - Jane: “…….”
A. I am 20 B. good, I like it C. fine, thanks D. No, thank you

The author noticed that the popular technique of multiple-choice questions in Test 7,
Test 9 and Test 10 failed to evaluate the students’ understanding of indirect speech
questions, for example, questions 19 and 46 in Test 10, question 45 in Test 7, and
question 13 in Test 9. In addition, none of those test samples used sentence
transformation; they used multiple-choice questions instead to test students’ usage of
tenses and structures. Moreover, grammar items 46, 47, 48, 49 and 50 of Test 7 cannot
measure students’ ability to write sentences from cue words. Grammar item 25 of Test 10
failed to check students’ ability to understand and use modals in the passive voice
because it also checks students’ cultural knowledge. Grammar items 47, 48, 49 and 50 of
Test 10 cannot examine students’ ability in sentence transformation or their understanding
and use of relative clauses; grammar item 47 of Test 9 failed to examine students’
understanding and use of the passive voice; and questions 6, 16, 22, 23, 26 and 28 of
Test 8 cannot evaluate the students’ ability in sentence transformation and sentence
combination.

All test samples used the error-recognition technique to test grammar structures, and
these items are quite valid, as they can test students’ ability to use and recognize the
grammar structures.
In short, Test 2, Test 3 and Test 6, which make up 30% of all the test samples, are valid
in terms of construct validity, while the rest (70%) should be edited to possess construct
validity.
3.3.2.3. Construct validity of vocabulary test items
It can be said that the vocabulary test items from Test 1 to Test 5 possess construct
validity. However, the contexts of some questions are not clear enough to decide which
option is the best choice. For example:
Test 1: II. Choose the best answer A, B, C or D to complete the sentences
14. She felt……because she couldn’t answer the last question
A. sadness B. embarrassing C. disappointed D. shameful
In this case, B, C or D might each be an acceptable answer to this question. In our view,
the item should be revised as follows:
14. She felt……because she couldn’t answer the last question
A. sadness B. happy C. disappointed D. satisfied
However, vocabulary items 26, 27, 28, 29 and 30 of Test 1 are able to check the
students’ ability to understand and use word formation; vocabulary items 31 to 40
help to check students’ reading comprehension. They can all be said to possess
construct validity.
Vocabulary questions from Test 6 to Test 10 are also valid in terms of construct
validity. Many techniques were used to test the students’ understanding and use of new
words or phrases, among them multiple-choice questions, word formation and gap-fill.
However, the analysis showed that all of those tests relied entirely on multiple-choice
questions and error recognition to test students’ ability to use vocabulary. It would be
better to test students’ ability to understand and use vocabulary with the word-formation
technique. The gap-fill technique for passages, which is used in Test 6 and Test 10, seems
good, but it would be better if its multiple-choice questions were replaced by
matching.

To sum up, 100% of the tests’ vocabulary items possess construct validity.

CHAPTER 4: CONCLUSION

In this part, the author concludes the study. Some implications of the study are also
given. Last but not least, limitations and suggestions for further studies are
presented.
4.1. Conclusion
This study has clarified the basic notions of testing and evaluation in education,
focusing on the construct validity and content validity of the tests. Through the analysis
of the test samples, answers to the two research questions were found.
The 1st research question is answered as follows: the statistics showed that the
content validity of the test samples’ components is very low. Only 20% of phonetic items
(Test 4 and Test 5) and 50% of grammar items (Tests 1, 2, 3, 4 and 5) attained content
validity, while 80% of vocabulary questions attained content validity (all except Test 2
and Test 3). This once more supports the author’s view that the contents of the tests did
not match the syllabus design of the textbook. Therefore, the answer to research
question 1 is: only Test 4 and Test 5, which make up 20% of the test samples, possess
content validity.
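
The percentages above can be cross-checked with a short tally. This is a minimal sketch
that assumes each percentage is counted over the 10 test samples (one unit per test)
rather than over individual items; the variable and function names are illustrative only.

```python
# Assumption: percentages are shares of the 10 collected test samples.
ALL_TESTS = set(range(1, 11))

# Tests reported as attaining content validity, per component.
valid_by_component = {
    "phonetics": {4, 5},
    "grammar": {1, 2, 3, 4, 5},
    "vocabulary": ALL_TESTS - {2, 3},  # all except Test 2 and Test 3
}

def share(tests):
    """Return the percentage of all test samples that `tests` represents."""
    return 100 * len(tests) / len(ALL_TESTS)

# Share of test samples that are valid for each component.
component_shares = {name: share(tests) for name, tests in valid_by_component.items()}

# Tests that are valid on every component at once.
valid_on_all = set.intersection(*valid_by_component.values())

print(component_shares)
print(sorted(valid_on_all), share(valid_on_all))
```

Taking the intersection of the three sets is what yields Test 4 and Test 5 as the only
samples with content validity on every component, in line with the 20% figure stated above.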
Secondly, the answer to the 2nd research question is as follows: from the analysis and
comparison of the syllabus design and test specifications with the 10 collected test
samples, it was found that 100% of the vocabulary test items in all the test samples have
construct validity; 100% of the phonetic items also possess construct validity. Only 30%
of the grammar items did not achieve construct validity, as they failed to check students’
ability to understand and use tenses and difficult structures, while the remaining 70% are
valid. It can be concluded that all test samples (except for Test 7, Test 9 and Test 10)
possess construct validity.
4.2. Implications
From these findings, the following recommendations are made to improve the
construct validity and content validity of the test samples:
- To improve the construct validity of the tests, at least two or three testing techniques
should be employed to measure pupils’ ability to use and produce sounds, grammar
structures and vocabulary. For example, to test grammar structures, multiple-choice
questions should be combined with the word-formation technique or other alternatives such
as sentence transformation and sentence building. Both objective tests and essay tests
have their own advantages and problems. However, combining these two approaches could
give a more accurate and effective assessment of students’ ability.
- To improve the content validity of the tests, there should be common test
specifications (or test formats) that reflect the main points of the syllabus design and
are issued by the educational authorities, because these are final achievement tests.
Besides, some advanced knowledge (about 10-30%) should be included to help students
prepare for later examinations.
- As stated in the objectives of English Grade 12, by the end of this grade students
should be able to use the English they have learnt to practise the four skills: listening,
speaking, reading comprehension and writing. However, the present achievement tests
seem to use the old test format, which tests mainly grammar and vocabulary components.
Moreover, no test sample checks students’ listening and speaking skills. Our
recommendation is that there should be test items for listening and speaking
skills to assess what students have learnt.
- When designing a test in general, test makers should draw up a detailed specification,
based on the objectives and syllabus contents of the course, that sets out clear and
detailed requirements, for example, which contents of phonetics, grammar structures and
vocabulary must be included. In accordance with those requirements, a marking scale should
be built that is clearer and fairer for students, especially for tests of productive
skills such as speaking or writing tests. The author thinks that these detailed
specifications and marking scales are not only necessary for teachers in high schools but
essential at every level of education.
4.3. Limitations and suggestions for further research
In spite of the author’s efforts in carrying out this study, there are unavoidable
limitations due to the scope of a minor thesis and the writer’s own knowledge and ability.
Firstly, only construct validity and content validity were investigated, while many other
important criteria of a test’s quality, for instance reliability, face validity, backwash
effect and practicality, were not evaluated. Secondly, the major instrument used to study
the tests’ construct validity was testing techniques only, while other instruments such as
levels of difficulty and intercorrelation of the tests’ components were not considered.
Lastly, only test samples were collected for this study, so evidence on reliability and
face validity is lacking.

From the above limitations, suggestions for further research are as follows:
- More research instruments should be used to enhance the accuracy of the
assessment of the tests’ components, for example, teachers’ and students’ questionnaires
about the tests, test scores, etc.
- Other essential criteria of these tests, such as reliability, face validity, backwash,
etc., should be investigated.
- The capacity of test designers and the relationship between current teaching practice
and testing should be studied.
Last but not least, this research reflects only the writer’s own ideas, based on his
teaching experience and reading of the theoretical literature, so its results must be
analysed and evaluated carefully by test designers and testing experts nationwide.
Nevertheless, we hope that this study will be a valuable source of reference for those who
are concerned with achievement test design for English grade 12. It is also hoped that
this minor thesis will contribute to the improvement of language testing at high schools
in the North of Vietnam.


APPENDICES

THE TEST SAMPLES
