
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/349485391

How to evaluate language teaching materials?

Presentation · February 2021


DOI: 10.13140/RG.2.2.12223.43685


3 authors, including:

Farangis Shahidzade
Yazd University


All content following this page was uploaded by Farangis Shahidzade on 21 February 2021.



Materials Evaluation

Presented by Mohammad Reza Jafari and


Mohammad Reza Khorshidi

Under the Supervision of Dr. Farangis Shahidzade


An Overview of the Presentation

 What is materials evaluation?

 Why evaluate materials?

 Evaluation vs. analysis

 Principles in materials evaluation

 What is being evaluated?

 Types of materials evaluation

a) External

b) Internal

 When is the evaluation carried out?

a) Pre-use

b) Whilst-use

c) Post-use

 How is the evaluation carried out?

 Who carries out the evaluation?

a) Teachers

b) Publishers

c) Learners

d) Specialists/practitioners

 Developing criteria for materials evaluation


What Is Materials Evaluation?

Materials evaluation is a procedure that involves measuring the (potential) value of a set of learning materials (Tomlinson, 2003c). It involves making judgements about the effect of the materials on the people using them.
Evaluation tries to measure some or all of the following:

 the appeal of the materials to the learners,

 the credibility of the materials to learners, teachers and administrators,

 the validity of the materials (i.e. Is what they teach worth teaching?),

 the reliability of the materials (i.e. Would they have the same effect with different groups of target learners?),

 the ability of the materials to interest the learners and the teachers,

 the ability of the materials to motivate the learners,

 the flexibility of the materials (e.g. the extent to which it is easy for a teacher to adapt the materials to suit a particular context).
Why Evaluate Materials?
According to Sheldon (1988), there are three basic reasons to evaluate coursebooks:

1) By evaluating materials, the teacher/program developer can make decisions on selecting the appropriate coursebook.

2) Moreover, evaluation can familiarize the teacher with the potential weaknesses and strengths of materials.

3) Materials evaluation can be a form of action research, developing our understanding of the functions of materials.

“The primary function of evaluation is to assess the suitability of materials for a given teaching and learning context” (Mishan & Timmis, 2015, p. 66).

It is important to keep in mind that no two evaluations can be the same, because the needs, objectives, backgrounds (e.g., sociocultural/local) and preferred styles of the participants will differ from context to context (Tomlinson, 2003c). (Compare, for example, New York vs. Tehran.)

Therefore, the main point is that it is not the materials which are being evaluated but their effects on the people who are using them (including the evaluators, too).
Evaluation vs. Analysis
An evaluation can include an analysis or follow from one, but the objectives and procedures of the two are different (Tomlinson, 2003c).

An evaluation focuses on the users of the materials and makes judgements about their effects; no matter how structured, criterion-referenced and rigorous an evaluation is, it will inevitably be subjective (Tomlinson, 2003b).

An analysis, on the other hand, focuses on the materials themselves and aims to describe them objectively. It “asks questions about what the materials contain, what they aim to achieve and what they ask learners to do” (Tomlinson, 1999, p. 10).

Byrd (2001) makes a rather different distinction between evaluation and analysis when she talks about “evaluation” for selection and “analysis” for implementation.

Even a review for a publisher or journal, or an evaluation for a ministry of education, is often “fundamentally a subjective, rule of thumb activity” (Sheldon, 1988, p. 245).

How can subjectivity be reduced?

Making an evaluation criterion-referenced can reduce (but not remove) subjectivity and can certainly help to make an evaluation more principled, rigorous, systematic and reliable (Tomlinson, 2003c).
Examples of Analysis Questions

• Does it provide a transcript of the listening texts? (YES/NO)

• What does it ask the learners to do immediately after reading a text?

Example of an Evaluation Question

• Are the listening texts likely to engage the learner? (very unlikely/very likely)
Ideally, analysis is objective, but analysts are often influenced by their own ideology and their questions are biased accordingly (Tomlinson, 2003c).

• For example, in the question “Does it provide a lot of guided practice?”, the phrase “a lot of” implies that it should, and this could interfere with an objective analysis of the materials.

Analysts also often have a hidden agenda when designing their instruments of analysis (Tomlinson, 2003c).

• For example, an analyst might ask the question “Are the dialogues authentic?” in order to provide data to support an argument that intermediate coursebooks do not help to prepare learners for the realities of conversation.
According to Tomlinson (2003c), unfortunately “many publications on materials evaluation mix analysis and evaluation” (p. 23).

For example, Cunningsworth (1984, pp. 74–9) includes both analysis and evaluation questions in his “Checklist of Evaluation Criteria.”

Tomlinson’s preference for separating analysis from evaluation is also shared by Littlejohn (2011), who presents a general framework for analyzing materials (pp. 182–98), as follows:
1) Analysis of the target situation of use,

2) Materials analysis,

3) Match and evaluation (determining the appropriacy of the materials to the target situation of use, i.e. the congruence between the target situation of use and the materials),

4) Action.
Principles in Materials Evaluation

Many evaluations are impressionistic, or at best are aided by an ad hoc and very subjective list of criteria (Tomlinson, 2003c).

In Tomlinson’s (2003) view, it is very important that evaluations are driven by a set of principles and that these principles are articulated by the evaluator(s) prior to the evaluation. In this way greater validity and reliability can be achieved and fewer mistakes are likely to be made.

 In developing a set of principles it is useful to consider the following.

The Evaluator’s Theory of Learning and Teaching

All teachers develop theories of learning and teaching which they apply in
their classrooms.

Many researchers (e.g. Schon, 1983) argue that it is useful for teachers to
try to achieve an articulation of their theories by reflecting on their
practice.

 Edge and Wharton (1998, p. 297) argue that reflective practice can not only lead to “perceived improvements in practice but, more importantly, to deeper understandings of the area investigated.”
In a similar way, Tomlinson (2003b) argues that the starting point of any
evaluation should be reflection on the evaluator’s practice leading to
articulation of the evaluator’s theories of learning and teaching.

In this way, evaluators can

a) make overt their predispositions,

b) make use of them in constructing criteria for evaluation,

c) be careful not to let them weight the evaluation too much towards
their own bias and

d) learn a lot about themselves and about the learning and teaching
process.
WHAT IS BEING EVALUATED?
 According to Mishan and Timmis (2015), the assumption in materials evaluation is that evaluation is applied to coursebooks.

 This is actually the most common form of evaluation.

 “Among the materials which could usefully be evaluated are, for example, in-house materials, tests, graded readers or self-access materials” (Mishan & Timmis, 2015, p. 58).

 Tasks can also be evaluated (Ellis, 2011).


Ellis (2011) informs us about “micro-evaluation,” which, in the words of Mishan and Timmis (2015), “involves researching in detail the effects of particular classroom tasks included in the materials” (p. 58).

Ellis (2011, p. 231) argues that “[micro-evaluation] forces a teacher to examine the assumptions that lie behind the design of a task and the procedures used to implement it. It requires them to go beyond impressionistic evaluation by examining empirically whether a task ‘works’ in the way it intended and how it can be improved for future use.”
Types of Materials Evaluation

Evaluations differ in

• purpose: you might be doing an evaluation to

1) help a publisher to make decisions about publication,

2) help yourself in developing materials for publication,

3) select a textbook, or

4) write a review for a journal or as part of a research project.
• personnel: as an evaluator you might be

1) a learner,

2) a teacher,

3) an editor,

4) a researcher,

5) a Director of Studies or an Inspector of English.


• formality: you might be

1) doing a mental evaluation in a bookshop,

2) filling in a short questionnaire in class or

3) doing a rigorous empirical analysis of data elicited from a large sample of users of the materials.

• timing: you might be doing your evaluation

1) before the materials are used,

2) while they are being used or

3) after they have been used.


WHEN IS THE EVALUATION CARRIED OUT?
In terms of the period in which the evaluation is carried out, Cunningsworth (1995) proposes pre-use, in-use and post-use evaluations:

1) Pre-use evaluation: intended to predict the potential performance of a material (predictive).

2) In-use evaluation: conducted while using a coursebook, “when a newly introduced coursebook is being monitored or when a well-established but ageing coursebook is being assessed to see whether it should be considered for replacement” (Cunningsworth, 1995, p. 14).

3) Post-use evaluation: provides a retrospective assessment of a material and is also used to decide whether to use the same material on future occasions.
Tomlinson’s View
In terms of when the evaluation takes place, the obvious choice is between pre-use, whilst-use and post-use evaluation (Tomlinson, 2003b).

However, it is important to keep in mind that almost all of these kinds of evaluation will be preceded by a detailed analysis of the context in which the materials will be used (McGrath, 2002).

Therefore, it can be concluded that any evaluation presupposes an already-completed analysis (Tomlinson, 2003c).
Pre-use Evaluation
Pre-use evaluation involves making predictions about the potential value of materials for their users (Tomlinson, 2003c). It is often done impressionistically.

Example: a teacher flicks through a book to gain a quick impression of its potential value.

• Publishers are well aware of this procedure and sometimes place attractive illustrations in the top right-hand corner of the right-hand page in order to influence the flicker in a positive way (Tomlinson, 2003c).
According to Mishan and Timmis (2015), pre-use evaluations are more common than whilst-use and post-use evaluations for two reasons (p. 59):

1) They are usually designed to inform us which materials to adopt,

2) They are easier to carry out than whilst-use or post-use evaluations.
Whilst-use Evaluation

This involves measuring the value of materials while using them or while observing them being used (Tomlinson, 2003c).

It can be more objective and reliable than pre-use evaluation, as it makes use of measurement rather than prediction (Tomlinson, 2003c).

However, it is limited to measuring what is observable (e.g. Are the instructions clear to the learners?) and cannot claim to measure what is happening in the learners’ brains.
We should take into account the following
while conducting a whilst-use evaluation:

• Clarity of instructions

• Clarity of layout

• Comprehensibility of texts

• Credibility of tasks

• Achievability of tasks

• Achievement of performance objectives

• Potential for localization


• Practicality of the materials

• Teachability of the materials

• Flexibility of the materials

• Appeal of the materials

• Motivating power of the materials

• Impact of the materials

• Effectiveness in facilitating short-term learning


Whilst-use evaluation can therefore be very useful but also dangerous, because teachers and observers can be misled by whether the activities seem to work or not (Tomlinson, 2003c).

According to Tomlinson (2003b), an evaluator can easily be deceived by activities which appear to work well.

For example, lessons which generate “student talking time” are often rated highly, but we need to evaluate the quality of the talk, not just the quantity (Mishan & Timmis, 2015).
Tomlinson (2003b) argues that most of the whilst-use evaluation aspects can be assessed impressionistically through observation, though he advises that it is preferable to focus on one aspect per observation.

In other words, greater reliability can be achieved by focusing on one criterion at a time and by using pre-prepared instruments of measurement.

• For example, for “Appeal of the materials”: Are the materials appealing to the learners?
McGrath (2002, p. 120) focuses specifically on the role of the teacher in both whilst-use and post-use evaluations. Teachers can, he argues, ask themselves questions of the following kind as prompts:

• What proportion of the materials was I able to use unchanged?

• Did the unchanged materials appear to work well? What evidence do I have for this?

• What spontaneous changes did I make as I taught with the materials? Did these improvisations work well? If not, what do I need to do differently?
Post-use Evaluation
Post-use evaluation is probably the most valuable/informative type of evaluation because it can measure the actual effects of the materials on the users (Tomlinson, 2003c).

It can measure:

1) short-term effects such as motivation, impact, achievability and instant learning, and

2) long-term effects such as durable learning and application.

According to Mishan and Timmis (2015), while pre-use evaluation has an important role in predicting poor selection of materials (or selection of poor materials), post-use evaluation is potentially the most informative type.

McGrath (2013) also believes that retrospective (post-use) evaluation can lead to the identification of weaknesses in the materials, thereby leading to constructive revision and adaptation.
We should take into account the following while conducting a post-use evaluation:

• What do the learners know which they did not know before starting to use the materials?

• What do the learners still not know despite using the materials?

• What can the learners do which they could not do before starting to use the materials?

• What can the learners still not do despite using the materials?

• To what extent have the materials prepared the learners for their examinations?

• To what extent have the materials prepared the learners for their post-course use of the target language?

• What effect have the materials had on the confidence of the learners?

• What effect have the materials had on the motivation of the learners?

• To what extent have the materials helped the learners to become independent learners? (autonomy)

• Did the teachers find the materials easy to use?

• Did the materials help the teachers to cover the syllabus?

• Did the administrators find that the materials helped them to standardize the teaching in their institution?
In other words, by conducting a post-use evaluation, one can:

1) measure the actual outcomes of the use of the materials, and

2) provide the data needed to make reliable decisions about the use, adaptation or replacement of the materials.

How to measure the post-use effects of materials:

 tests of what has been “taught” by the materials,

 tests of what the students can do (direct testing),

 examinations,

 interviews,

 questionnaires,

 criterion-referenced evaluations by the users,

 post-course diaries,

 post-course “shadowing” of the learners,

 post-course reports on the learners by employers, subject tutors, etc.
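One of the simplest post-use measures listed above is a test of what has been taught, comparing scores before and after the materials were used. The sketch below is a minimal illustration of computing an average learning gain; the function name and the sample scores are hypothetical, not part of any published instrument.

```python
# Sketch: estimating short-term learning effects from pre/post test scores.
# All names and numbers here are hypothetical illustrations.

def mean_gain(pre_scores, post_scores):
    """Average per-learner gain between a pre-use and a post-use test."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each learner needs both a pre and a post score")
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return sum(gains) / len(gains)

# Example: five learners tested before and after a unit of materials.
pre = [40, 55, 62, 48, 70]
post = [52, 60, 75, 50, 78]
print(mean_gain(pre, post))  # 8.0 points of average improvement
```

A real post-use evaluation would of course triangulate such scores with the qualitative instruments above (diaries, interviews, shadowing), since test gains alone cannot capture durable learning or application.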
HOW IS THE EVALUATION CARRIED OUT?
Evaluation criteria:

The most important factor in the design of an evaluation instrument should be the criteria against which the materials are evaluated (Tomlinson, 2003b).

Generating evaluation criteria:

1) The first succinct evaluative framework is the CATALYST test introduced by Grant (1987). It stands for Communicative, Aims, Teachability, Available add-ons, Level, Your impression, Students’ interest and Tried and tested.

2) The second is Tanner and Green’s (1998) practical assessment form, based on Method, Appearance, Teacher-friendliness, Extras, Realism, Interestingness, Affordability, Level and Skills, the initials of which collectively make up the word MATERIALS.
3) The third framework is that of Tomlinson (2003b); he suggests five categories of evaluation criteria, each of which can be used to develop a number of specific criteria:

 universal (driven by SLA theory): e.g. Are the materials motivating?

 local (related to the context): e.g. Are the materials culturally acceptable in the context?

 media-specific (e.g. audio or computer): e.g. Is the sound quality of the audio materials good?

 content-specific (e.g. exam or English for Specific Purposes (ESP)): e.g. Do the materials replicate the types of real-world tasks the target group will need to do? (content validity)

 age-specific: e.g. Are the visuals likely to appeal to children?
4) The fourth is that of Rubdy (2003), who argues that evaluation criteria can be generated from three key notions:

a) psychological validity: learners’ needs, goals and pedagogical requirements (such as independence, autonomy, self-development and creativity),

b) pedagogical validity: teachers’ skills, abilities, theories and beliefs (such as guidance, choice and reflection),

c) process validity (and content validity): the thinking underlying the materials and the writer’s presentation of the content and approach to teaching and learning, respectively (methodology, content, layout and graphics).
5) The fifth is the framework proposed by Riazi (2003), which consists of:

a) surveying the teaching/learning situation,

b) conducting a neutral analysis and

c) carrying out a belief-driven evaluation.

6) The sixth is that of Mukundan (2006), who describes the use of a composite framework to evaluate ELT textbooks in Malaysia, combining:

a) checklists,

b) reflective journals and

c) computer software.
7) The seventh framework has been proposed by McDonough, Shaw and Masuhara (2013), who focus on developing criteria for evaluating the suitability of materials in relation to:

a) usability,

b) generalizability,

c) adaptability and

d) flexibility.

8) The eighth framework is that of McGrath (2002), who suggests a procedure involving:

a) materials analysis,

b) first-glance evaluation,

c) user feedback and

d) (final) evaluation using context-specific checklists.


McGrath (2002) notes the following areas, which are common to most of the frameworks:

• design: includes both the layout of material on the page and the overall clarity of organization

• language content: coverage of linguistic items and language skills

• subject matter: topics

• practical considerations: e.g. availability, durability and price

Using a checklist of criteria has become popular in materials evaluation, and certain checklists from the literature have been used frequently (Tomlinson, 2003c).

For example, the well-known checklist by Demir & Ertas (2014) consists of four main sections (56 items overall):

 Subjects & Contents (10 items),

 Skills & Sub-skills (25 items),

 Layout & Physical Make-up (7 items) and

 Practical Considerations (14 items).
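A sectioned checklist of this kind lends itself to a simple criterion-referenced score per section. The sketch below uses the four section names and item counts of the Demir & Ertas (2014) checklist described above, but the rating scale and the scoring formula are hypothetical illustrations; the published checklist defines its own procedure.

```python
# Sketch: a per-section score for a sectioned evaluation checklist.
# Section names and item counts follow Demir & Ertas (2014) as described
# in the text; the 1-5 rating scale and percentage formula are assumptions.

SECTIONS = {
    "Subjects & Contents": 10,
    "Skills & Sub-skills": 25,
    "Layout & Physical Make-up": 7,
    "Practical Considerations": 14,
}

def section_score(ratings, max_rating=5):
    """Percentage score for one section from per-item ratings (1..max_rating)."""
    return 100 * sum(ratings) / (len(ratings) * max_rating)

# Example: an evaluator rates every item in one section as 4 out of 5.
ratings = [4] * SECTIONS["Layout & Physical Make-up"]
print(section_score(ratings))  # 80.0
```

Reporting a score per section rather than a single global number preserves the discrete verdicts (content, skills, layout, practicalities) that the checklist's structure is designed to support.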


Problems of evaluation criteria/checklists:

• No set of criteria can be applicable to all situations, and it is also important that there be congruence between the materials and the curriculum, students and teachers (Byrd, 2001).

• Mathews (1985), Cunningsworth (1995) and Tomlinson (2012) have also stressed the importance of relating evaluation criteria to what is known about the context of learning; that is, the criteria should be consonant with the context of learning.

Mukundan and Ahour (2010), in their review of 48 evaluation checklists, were critical of most checklists for being too context-bound to be generalizable; that is, the criteria were too context-specific.

Instead, Mukundan and Ahour (2010) proposed that a framework for generating flexible criteria would be more useful than detailed and inflexible checklists. Moreover, more attention should be given to retrospective evaluation than to predictive evaluation.

In other words, instead of using ready-made checklists, each practitioner can develop their own set of evaluation criteria.
Tomlinson and Masuhara (2004, p. 7) proposed the following criteria for evaluating, monitoring and revising the criteria evaluators generate:

a) Is each question an evaluation question?

b) Does each question ask only one question?

c) Is each question answerable?

d) Is each question free of dogma?

e) Is each question reliable, in the sense that other evaluators would interpret it in the same way?
Tomlinson (2003b) also suggests a set of questions which could be used more generally to monitor evaluation criteria in any evaluation framework:

• Is the list based on a coherent set of principles of language learning?

• Are all the criteria actually evaluation criteria?

• Are the criteria sufficient to help the evaluator to reach useful conclusions?

• Are the criteria organized systematically (for example, into categories and subcategories which facilitate discrete as well as global verdicts and decisions)?

• Are the criteria sufficiently neutral to allow evaluators with different ideologies to make use of them?

• Is the list sufficiently flexible to allow it to be used by different evaluators in different circumstances?
Other Ways to Evaluate Materials

Regarding different methods of evaluating coursebooks, Abdelwahab (2013) suggests three basic methods:

a) The impressionistic method: involves analyzing a coursebook based on a general impression. This method is not adequate in itself.

b) The checklist method: needs to be integrated with the impressionistic method to compensate for the latter’s inadequacy.

c) The in-depth method: involves a profound scrutiny of representative features, such as the design of one particular unit or exercise, or how particular language elements have been treated (internal evaluation).
McDonough and Shaw (2003, p. 61) suggest that evaluators should first conduct an external evaluation “that offers a brief overview from the outside” and then carry out “a closer and more detailed internal evaluation”:

1) a brief external evaluation, conducted to gain an overview of the organizational foundation of the material,

2) a detailed internal evaluation, “to see how far the materials in question match up to what the author claims as well as to the aims and objectives of a given teaching program” (McDonough & Shaw, 1993, p. 64).
The External Evaluation

In this model, the organization of the materials as stated explicitly by the author/publisher should be examined by looking at:

• the “blurb,” or the claims made on the cover of the teacher’s/students’ book,

• the introduction and table of contents.

This is actually what Tomlinson (2003c, p. 16) calls analysis, in that “it asks questions about what the materials contain, what they aim to achieve and what they ask learners to do.”

At this stage, an evaluator should consider why the materials have been produced. In other words, it should be made clear what the purposes of the materials are.
From the “blurb” and the introduction, we can normally expect comments on some or all of the following (McDonough, Shaw & Masuhara, 2013, pp. 55–56):

 the intended audience (who the materials are targeted at),

 the proficiency level (false beginner, low intermediate, etc.),

 the context in which the materials are to be used (EFL, ESL, ESP, EAP),

 how the language has been presented and organized into teachable units/lessons (units/lessons/lengths),

 the author’s views on language and methodology and the relationship between the language, the learning process and the learner.
Other factors to take into account at this external stage are as follows:

 Are the materials to be used as the main “core” course or as a supplement to it?

 Is a teacher’s book in print and locally available?

 Is a vocabulary list/index included? (This is useful where the learner might be doing a lot of individualized and/or out-of-class work.)

 What visual material does the book contain (photographs, charts, diagrams), and is it there for cosmetic value only or is it integrated into the text?

 Is the layout and presentation clear or cluttered?

 The potential durability of the materials, paper quality and binding need to be assessed.

 Is the material too culturally biased or specific?

 Do the materials represent minority groups and/or women in a negative way? Do they present a “balanced” picture of a particular country/society?

 What is the cost of the inclusion of digital materials (e.g. CD, DVD, interactive games, quizzes and downloadable materials from the web)? How essential are they to ensure language acquisition and development?

 The inclusion of tests in the teaching materials (diagnostic, progress, achievement): would they be useful for your particular learners?
What Next?

If our external evaluation shows the materials to be potentially appropriate and worthy of a more detailed inspection, then we can continue with our internal, more detailed evaluation. If not, then we can “exit” at this stage and start evaluating other materials if we wish.
1) Macro-evaluation (external) → 2) inappropriate (exit) / appropriate → 3) Micro-evaluation (internal) → 4) inappropriate (exit) / appropriate → 5) adopt/select

An overview of the materials evaluation process (McDonough, Shaw & Masuhara, 2013, p. 58).
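The two-stage process above is essentially a decision flow with two exit points. The sketch below renders that flow as a small function; the predicate parameters stand in for the evaluator's external and internal judgements and are hypothetical, not part of McDonough, Shaw & Masuhara's framework.

```python
# Sketch of the two-stage evaluation flow: macro (external) then micro
# (internal), with an exit after either stage. The predicates are
# hypothetical stand-ins for the evaluator's judgement.

def evaluate(materials, external_ok, internal_ok):
    """Return 'exit' if either stage fails, else 'adopt/select'."""
    if not external_ok(materials):   # macro-evaluation (external)
        return "exit"
    if not internal_ok(materials):   # micro-evaluation (internal)
        return "exit"
    return "adopt/select"

# Example: a coursebook whose blurb looks promising but whose units
# do not live up to the claims fails at the internal stage.
result = evaluate("coursebook X", lambda m: True, lambda m: False)
print(result)  # exit
```

The point of the ordering is economy: the cheap external check filters out clearly unsuitable materials before the costly unit-by-unit internal inspection is undertaken.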
The Internal Evaluation
Now we can continue to the next stage of our evaluation procedure by performing an in-depth investigation into the materials.

What is important at this stage is that we have to analyze the extent to which the factors stated at the external evaluation stage match up with the internal consistency and organization of the materials (McDonough et al., 2013, p. 58).

In other words, there should be congruence between the claims of the author/publisher (at the external evaluation stage) and what the materials really include (at the internal evaluation stage).
In order to perform an effective internal evaluation of the materials, we need to examine at least two units (preferably more) of a book or set of materials to investigate the following factors (McDonough et al., 2013, pp. 59–60):

 the presentation of the skills in the materials (what skills are covered, the proportion given to each skill, and whether the skills are treated in isolation (discretely) or integratively),

 the grading and sequencing of the materials,

 where reading/discourse skills are involved, is there much in the way of appropriate text beyond the sentence?

 where listening skills are involved, are recordings “authentic” or artificial?

 do speaking materials incorporate what we know about the nature of real interaction, or are artificial dialogues offered instead?

 the relationship of tests and exercises to (a) learner needs and (b) what is taught by the course material,

 do you feel that the material is suitable for different learning styles? Is a claim and provision made for self-study, and is such a claim justified?

 are the materials sufficiently engaging to motivate students and teachers alike, or would you foresee a student/teacher mismatch?

At this stage, it is also useful to consider how the materials may guide and frame “teacher–learner interaction” and “the teacher–learner relationship.”

The framework proposed by McDonough et al. (2013) focuses on evaluating the suitability of materials in relation to:

1) usability,

2) generalizability,

3) adaptability and

4) flexibility.
1) Usability Factor

• How far could the materials be integrated into a particular syllabus as “core” or supplementary?

• For example, we may need to select materials that suit a particular syllabus or set of objectives that we have to work to. The materials may or may not be able to do this.

2) Generalizability Factor

• Is there a restricted use of “core” features that makes the materials more generally useful?

• Perhaps not all the material will be useful for a given individual or group, but some parts might be.

• This factor can in turn lead us to consider the next point.


3) Adaptability Factor

• Can parts be added, extracted, used in another context or modified for local circumstances?

• There may be some very good qualities in the materials but, for example, we may judge the listening material or the reading passages to be unsuitable and in need of modification. If we think that adaptation is feasible, we may choose to do this.

4) Flexibility Factor

• How rigid are the sequencing and grading? Can the materials be entered at different points or used in different ways?

• In some cases, materials that are not so steeply graded offer a measure of flexibility that permits them to be integrated easily into various types of syllabus.
WHO CARRIES OUT THE EVALUATION?
Principled evaluations based on explicit criteria can give the impression that evaluation is, or should be, the exclusive domain of specialists, which may not always be the case (Mishan & Timmis, 2015, pp. 64–65).

We need to consider the role in evaluation of stakeholders who are not specialists in this specific field:

a) teachers,

b) learners and

c) publishers.
Teachers as Evaluators
• Masuhara (2011) says meetings could be held where new materials are presented to the teachers, leading to discussions of which activities the teachers preferred and why they preferred these activities to others.

• McGrath (2002, p. 120) proposes a number of questions teachers can ask themselves to systematize whilst-use evaluation.

• McGrath (2013) also suggests that teachers might keep records of use, noting sections of the materials they used or omitted, which sections went well, and so on.
Learners as Evaluators
According to McGrath (2013, p. 151), learners also have an important role to play in evaluation: “learners are capable of evaluation. They do not always opt for the same point on a scale. They discriminate. Given the opportunity, they can make judgements which may sometimes surprise their teachers.”

Examples of learners’ involvement in evaluation:

a) learner diaries,

b) rating of tasks,

c) pyramid discussions and

d) metaphor study.
Publishers as Evaluators

Amrani (2011) notes that publishers can use either (a) piloting or
(b) reviewing of materials to determine their suitability.

However, she points out that reviewing (comments on materials made by stakeholders) is now more common among publishers than piloting.
Standard Approaches to
Materials Evaluation
A useful exercise for anybody writing or evaluating language
teaching materials would be to evaluate the checklists and
criteria lists from a sample of the publications above against the
following criteria (Tomlinson, 2003c):

• Is the list based on a coherent set of principles of language learning?

• Are all the criteria actually evaluation criteria, or are they criteria for analysis?

• Are the criteria sufficient to help the evaluator to reach useful conclusions?

• Are the criteria organized systematically (e.g. into categories and subcategories which facilitate discrete as well as global verdicts and decisions)?

• Are the criteria sufficiently neutral to allow evaluators with different ideologies to make use of them?

• Is the list sufficiently flexible to allow it to be made use of by different evaluators in different circumstances?
Developing Criteria for
Materials Evaluation
Tomlinson (2003c) stresses that evaluators need to develop their
own principled criteria which take into account the context of the
evaluation and their own beliefs.

He also claims that evaluation criteria should be developed before materials are produced.

1) Brainstorm a list of universal criteria:

Universal criteria: criteria which would apply to any language learning materials anywhere for any learners.

They derive from principles of language learning and the results of classroom observation, and provide the fundamental basis for any materials evaluation (Tomlinson, 2003c).
Examples of universal criteria would be:

• Do the materials provide useful opportunities for the learners to think for themselves?

• Are the target learners likely to be able to follow the instructions?

• Are the materials likely to cater for different preferred learning styles?

• Are the materials likely to achieve affective engagement?


2) Subdivide some of the criteria:

It is best to subdivide some of the criteria into more specific questions if:

• the evaluation is the basis for subsequent revision or adaptation of materials, or

• it is a formal evaluation and important decisions are to be made based on the results of the evaluation.
For example:

Are the instructions:

• succinct? (quantity)

• sufficient? (quantity)

• self-standing? (independence)

• standardized? (quality)

• separated? (quality)

• sequenced? (from simple to complex)

• staged? (systematicity)
3) Monitor and revise the list of universal
criteria:

Is each question an evaluation question?

If the question is an analysis question then you can only give the
answer a 1 or a 5 on the 5-point scale which is recommended
later in this suggested procedure.

For example: (Does each unit include a test?)


However, if it is an evaluation question then it can be graded at
any point on the scale.

For example: (To what extent are the tests likely to provide
useful learning experiences?)

Analysis (objective) vs. Evaluation (subjective)
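This distinction has a practical consequence for scoring. A minimal sketch, assuming the 5-point scale recommended in the procedure; the function and its validation rule are illustrative, not part of the source:

```python
# Hypothetical sketch: scoring criteria on a 5-point scale.
# Analysis questions are factual yes/no checks, so only the scale's
# extremes make sense; evaluation questions can take any point on it.

def score(criterion_type: str, value: int) -> int:
    """Validate a 1-5 score against the criterion type."""
    if not 1 <= value <= 5:
        raise ValueError("scores run from 1 to 5")
    if criterion_type == "analysis" and value not in (1, 5):
        # e.g. "Does each unit include a test?" is either true (5) or false (1)
        raise ValueError("analysis questions admit only 1 or 5")
    return value

score("analysis", 5)    # the units do include tests: fully true
score("evaluation", 3)  # "To what extent are the tests useful?" can be mid-scale
```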


Does each question only ask one question?

Many criteria in published lists ask two or more questions and therefore cannot be used in any numerical grading of the materials.

For example, Grant (1987) includes the following question, which could be answered ‘Yes; No’ or ‘No; Yes’:

1) ‘Is it attractive? Given the average age of your students, would they enjoy using it?’ (p. 122).
This question could be usefully rewritten as:

1) Is the book likely to be attractive to your students?

2) Is it suitable for the age of your students?

3) Are your students likely to enjoy using it?

Double-barreled questions: a double-barreled question asks about more than one issue while allowing only one answer.

For example, “Do you think that students should have more classes about history and culture?” contains two different issues; one is about history and the other concerns culture.
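As a rough illustration, a naive screening function could flag checklist items that may be double-barreled. The heuristic (multiple question marks, or a conjunction inside a single question) is invented for this sketch and would over- and under-flag real items:

```python
# Naive, illustrative check for double-barreled checklist items: an item
# containing more than one question mark, or joining clauses with "and"/"or",
# may be asking two things at once and should be split (as with Grant's
# example). This is a crude heuristic, not a linguistic analysis.
import re

def looks_double_barreled(item: str) -> bool:
    if item.count("?") > 1:
        return True
    return item.endswith("?") and bool(re.search(r"\b(and|or)\b", item))

looks_double_barreled(
    "Is it attractive? Given the average age of your students, "
    "would they enjoy using it?"
)  # flagged: two question marks
```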
Is each question answerable?

Some questions are so broad and so vague that they cannot usefully be answered; others cannot be answered without reference to other criteria, or require expert knowledge which the evaluator may not have.

For example: “Is it culturally acceptable?”

We need to be aware of the culture of the target context before we can answer this question.
Is each question free of dogma?

The questions should reflect the evaluators’ principles of language learning but should not impose a rigid methodology as a requirement of the materials.

• Are the various stages in a teaching unit adequately developed? (presupposition: PPP)

• Do the sentences gradually increase in complexity to suit the growing reading ability of the students? (presupposition: a fixed sequence of materials)
Is each question reliable in the sense that other
evaluators would interpret it in the same way?

There are some terms and concepts in applied linguistics which can be interpreted differently by linguists. Therefore, it is best to avoid them when attempting to measure the effects of materials.

• Are the materials sufficiently authentic?

• Is there an acceptable balance of skills?

• Do the activities work?

• Is each unit coherent?


Are the materials sufficiently authentic?

• Do the materials help the learners to use the language in situations they are likely to find themselves in after the course?

Is there an acceptable balance of skills?

• Is the proportion of the materials devoted to the development of reading skills suitable for your learners?

Do the activities work?

• Are the communicative tasks useful in providing learning opportunities for the learners?

Is each unit coherent?

• Are the activities in each unit linked to each other in ways which help the learners?
4) Categorize the list:

It is possible to rearrange the random list of universal criteria into categories.

This sharpens the focus of the evaluation and increases the possibility of making generalizations.

Learning Principles, Cultural Perspective, Topic Content, Teaching Points, Texts, Activities, Methodology, Instructions, Design and Layout
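A categorized list can be held in a simple nested structure so that both discrete (per-category) and global verdicts are possible. The sketch below is one possible arrangement, not a prescribed format; the category names follow the slide and the criteria are examples from earlier slides:

```python
# Illustrative container for categorized universal criteria. A discrete
# verdict looks at one category; a global verdict iterates over all of them.
criteria = {
    "Learning Principles": [
        "Do the materials provide useful opportunities for the learners "
        "to think for themselves?",
    ],
    "Activities": [
        "Are the materials likely to achieve affective engagement?",
    ],
    "Instructions": [
        "Are the target learners likely to be able to follow the instructions?",
    ],
}

# Global view: how many criteria the evaluation covers in total.
total_criteria = sum(len(items) for items in criteria.values())
```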


5) Develop media-specific criteria:

These are criteria which ask questions of particular relevance to the medium used by the materials being evaluated (e.g. criteria for books, for audio cassettes, for videos, etc.).

• Is it clear which sections the visuals refer to? (illustrations)

• Is the sequence of activities clearly signaled? (layout)

• Are the different voices easily distinguished? (audibility)

• Do the gestures of the actors help to make the language meaningful in realistic ways? (movement)
6) Develop content-specific criteria:

These are criteria which relate to the topics and/or teaching points of the materials being evaluated.

(For example, a grammar book may never include rhetorical conventions of English writing.)

• Do the examples of business texts (e.g. letters, invoices, etc.) replicate features of real-life business practice?

• Do the reading texts represent a wide and typical sample of genres?
7) Develop age-specific criteria:

These are criteria which relate to the age of the target learners: whether the materials are suitable for 5-year-olds, for 10-year-olds, for teenagers, for young adults or for mature adults.

These criteria would relate to cognitive and affective development, to previous experience, to interests and to wants and needs.

• Are there short, varied activities which are likely to match the attention span of the learners?

• Is the content likely to provide an achievable challenge in relation to the maturity level of the learners?
8) Develop local criteria:

These are criteria which relate to the actual or potential environment of use.

They measure the value of the materials for particular learners in particular circumstances.

It is this set of criteria which is unique to the specific evaluation being undertaken, and which is ultimately responsible for most of the decisions made in relation to the adoption, revision or adaptation of the materials.
Typical features of the environment which would determine this set of criteria are:

• the type(s) of institution(s),

• class size,

• the background, needs and wants of the learners/teachers,

• the language policies of a particular region,

• the objectives of the courses,

• the intensity and extent of the teaching time available,

• the amount of exposure to the target language outside the classroom.
Examples of local criteria would be:

• To what extent are the stories likely to interest 15-year-old boys in Turkey?

• To what extent are the reading activities likely to prepare the students for the reading questions in the Primary School Leaving Examination in Singapore?

• To what extent are the topics likely to be acceptable to parents of students in Iran?
9) Develop other criteria:

• teacher-specific,

• administrator-specific,

• gender-specific,

• culture-specific,

• L1-specific criteria and,

• criteria assessing the match between the materials and the claims made by the publishers for them (internal vs. external evaluation).
10) Trial the criteria:

It is always important to trial the criteria to ensure that they are sufficient, answerable, reliable and useful.

Revisions, if needed, can be made before the actual evaluation begins.
11) Conducting the evaluation:

According to Tomlinson (2003c), the most effective way of conducting an evaluation is to:

• make sure there is more than one evaluator (reliability issues),

• discuss the criteria to make sure there is equivalence of interpretation,

• answer the criteria independently and in isolation from the other evaluator(s),

• focus, in a large evaluation, on a typical unit for each level (and then check its typicality by reference to other units),

• give a score for each criterion (with some sets of criteria weighted more heavily than others),

• write comments at the end of each category,

• at the end of the evaluation, aggregate each evaluator’s scores for each criterion, category of criteria and set of criteria, and then average the scores,

• record the comments shared by the evaluators,

• write a joint report.
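The scoring and aggregation steps above can be sketched as a short function. This is a hypothetical illustration of weighted averaging across evaluators, not Tomlinson's procedure verbatim; the category names, weights and scores are invented for the example:

```python
# Illustrative aggregation step: each evaluator scores each criterion
# independently; category weights and criterion scores are then combined
# and averaged across evaluators. All names and numbers are invented.

def aggregate(evaluator_scores, weights):
    """Average weighted category scores across evaluators.

    evaluator_scores: list of {category: [scores]} dicts, one per evaluator.
    weights: {category: weight} giving some sets of criteria more influence.
    """
    totals = {}
    for scores in evaluator_scores:
        for category, values in scores.items():
            # Mean score for this category, scaled by its weight
            weighted = weights[category] * sum(values) / len(values)
            totals.setdefault(category, []).append(weighted)
    # Average each category's weighted score over all evaluators
    return {cat: sum(vals) / len(vals) for cat, vals in totals.items()}

evaluators = [
    {"texts": [4, 5, 3], "activities": [2, 3]},  # evaluator 1
    {"texts": [3, 4, 4], "activities": [3, 4]},  # evaluator 2
]
result = aggregate(evaluators, {"texts": 1.0, "activities": 2.0})
```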


As Tomlinson (2003c) says:

What is recommended above is a very rigorous, systematic but time-consuming approach to materials evaluation which I think is necessary for major evaluations from which important decisions are going to be made. However for more informal evaluations (or when very little time is available) I would recommend the following procedure:
Procedures for Conducting
Informal Evaluation

1) Brainstorm beliefs,

2) Decide on shared beliefs,

3) Convert the shared beliefs into universal criteria,

4) Write a profile of the target learning context for the materials,

5) Develop local criteria from the profile,

6) Evaluate and revise the universal and the local criteria,

7) Conduct the evaluation.


Conclusion
Materials evaluation is initially a time-consuming and difficult
undertaking.

Approaching it in the principled, systematic and rigorous ways suggested above can:

1) help to make and record vital discoveries about the materials being evaluated,

2) help the evaluators to learn a lot about materials, about learning and teaching, and about themselves.
Doing evaluations formally and rigorously can also eventually
contribute to the development of an ability to conduct principled
informal evaluations quickly and effectively when the occasion
demands:

• when asked for an opinion of a new book,

• when deciding which materials to buy in a bookshop,

• when editing other people’s materials,

• and a lot of other occasions.


List of Key Words

 Evaluation
 Analysis
 Reliability
 Validity
 Credibility
 Action research
 Adaptation
 Adoption
 Objective
 Subjective
 Selection
 Implementation
 Criterion-referenced
 Feedback
 Checklists
 Usability
 Generalizability
 Adaptability
 Flexibility
 Context-specific
 Dogma
 In-depth
 External/internal evaluation
 Blurb
 False beginner
 Localization
 Achievability
 Practicality
 Teachability
 Whole person approach
 Ideology
 Hidden agenda
 Authenticity
 Micro-evaluation
 Macro-evaluation
 Impressionistic
 Empirical
 Pre-use/whilst-use/post-use
 Predictive/retrospective
 Media/content/age-specific
 Universal/local criteria
 Process/content validity
 Pedagogical/psychological validity
 Kinesthetic/dependent/independent learner
 Double-barreled questions
 Revision
 Language policy
 Ad hoc
 Self-investment
 Attitude
 Aptitude
 Intake
 Self-esteem
 Style
 Experiential learning
 Input
 Strategic competence
 Awareness
 Sensitivity
 Inner voice
 Personalization
 Output
References

Abdelwahab, M. M. (2013). Developing an English Language Textbook Evaluative Checklist. IOSR Journal of Research & Method in Education, 1(3), 55–70.

Amrani, F. (2011). The process of evaluation: a publisher’s view. In B. Tomlinson (ed.), Materials Development in Language Teaching, 2nd edn. (pp. 267–95). Cambridge: Cambridge University Press.

Byrd, P. (2001). Textbooks: evaluation for selection and analysis for implementation. In M. Celce-Murcia (ed.), Teaching English as a Second or Foreign Language, 3rd edn. (pp. 415–27). Boston, MA: Heinle & Heinle.

Cunningsworth, A. (1984). Evaluating and Selecting EFL Teaching Material. London: Heinemann.

Cunningsworth, A. (1995). Choosing Your Coursebook. London: Longman.

Demir, Y. and Ertas, A. (2014). A suggested eclectic checklist for ELT coursebook evaluation. The Reading Matrix, 14(2), 243–52.

Ellis, R. (2011). Macro- and micro-evaluations of task-based teaching. In B. Tomlinson (ed.), Materials Development in Language Teaching, 2nd edn. (pp. 212–36). Cambridge: Cambridge University Press.

Grant, N. (1987). Making the Most of Your Textbook. Harlow: Longman.

Littlejohn, A. P. (2011). The analysis of language teaching materials: inside the Trojan horse. In B. Tomlinson (ed.), Materials Development in Language Teaching, 2nd edn. (pp. 179–212). Cambridge: Cambridge University Press.

Masuhara, H. (2011). What do teachers really want from coursebooks? In B. Tomlinson (ed.), Materials Development in Language Teaching, 2nd edn. (pp. 236–67). Cambridge: Cambridge University Press.

Mathews, A. (1985). Choosing the best available textbook. In A. Mathews, M. Spratt and L. Dangerfield (eds.), At the Chalkface (pp. 202–6). London: Edward Arnold.

McDonough, J. and Shaw, C. (1993). Materials and Methods in ELT. Oxford: Blackwell.

McDonough, J. and Shaw, C. (2003). Materials and Methods in ELT, 2nd edn. Oxford: Blackwell.

McDonough, J., Shaw, C. and Masuhara, H. (2013). Materials and Methods in ELT, 3rd edn. Malden, MA: John Wiley and Sons.

McGrath, I. (2002). Materials Evaluation and Design for Language Teaching. Edinburgh: Edinburgh University Press.

McGrath, I. (2013). Teaching Materials and the Roles of EFL/ESL Teachers. London: Bloomsbury.

Mishan, F. and Timmis, I. (2015). Materials Development for TESOL. Edinburgh: Edinburgh University Press.

Mukundan, J. (2006). Are there new ways of evaluating ELT coursebooks? In J. Mukundan (ed.), Readings on ELT Material II (pp. 170–9). Petaling Jaya: Pearson Malaysia.

Mukundan, J. and Ahour, T. (2010). A review of textbook evaluation checklists across four decades (1970–2008). In B. Tomlinson and H. Masuhara (eds.), Research for Materials Development in Language Learning: Evidence for Best Practice (pp. 336–52). London: Continuum.

Riazi, A. M. (2003). What do textbook evaluation schemes tell us? A study of the textbook evaluation schemes of three decades. In W. Renandya (ed.), Methodology and Materials Design in Language Teaching: Current Perceptions and Practices and their Implications (pp. 52–69). Singapore: SEAMEO.

Rubdy, R. (2003). Selection of materials. In B. Tomlinson (ed.), Developing Materials for Language Teaching (pp. 37–58). London: Continuum.

Schön, D. (1983). The Reflective Practitioner. London: Temple Smith.

Sheldon, L. (1988). Evaluating ELT textbooks and materials. ELT Journal, 42(4), 237–46.

Tanner, R. and Green, C. (1998). Tasks for Teacher Education. Harlow: Longman.

Tomlinson, B. (1999). Developing criteria for evaluating L2 materials. IATEFL Issues, 47, March.

Tomlinson, B. (2003b). Developing principled frameworks for materials development. In B. Tomlinson (ed.), Developing Materials for Language Teaching (pp. 107–29). London: Continuum.

Tomlinson, B. (2003c). Materials evaluation. In B. Tomlinson (ed.), Developing Materials for Language Teaching (pp. 15–36). London: Continuum.

Tomlinson, B. (2012a). Materials development for language learning and teaching. Language Teaching, 45(2), 1–37.

Tomlinson, B. (2012b). State of the art review: materials development for language learning and teaching. Language Teaching, 45(2), 143–79.

Tomlinson, B. (ed.) (2013a). Applied Linguistics and Materials Development. London: Bloomsbury.

Tomlinson, B. (2013b). Developing Materials for Language Teaching, 2nd edn. London: Bloomsbury.

Tomlinson, B. and Masuhara, H. (2004). Developing Language Course Materials. Singapore: SEAMEO.
Thank You
