
MEASUREMENT, EVALUATION & RESEARCH

There are a variety of ways of knowing whether or not something is true. Science (see Payla,
2000) is one of those ways; scientists have established a set of rules and methodology by which
truth is verified (Kuhn, 1962). The process of science generally follows a paradigm that defines
the rules and describes procedures, instrumentation and methods of interpretation of data
(Wilber, 1998). The results of science are formulated into a hierarchy of increasing complexity
of knowledge: facts, concepts, principles, theories, and laws. When engaged in the process of
science, scientists formulate hypotheses or educated guesses about the relationships between or
among different facets of knowledge.
Assessment, measurement, research, and evaluation are part of the processes of science and
issues related to each topic often overlap. Assessment refers to the collection of data to describe
or better understand an issue, measurement is the process of quantifying assessment data,
research refers to the use of data for the purpose of describing, predicting, and controlling as a
means toward better understanding the phenomena under consideration, and evaluation refers to
the comparison of data to a standard for the purpose of judging worth or quality. Assessment
and/or measurement are done with respect to variables (phenomena that can take on more than
one value or level). For example, the variable "gender" has the values or levels of male and
female and data could be collected relative to this variable. Data on variables are normally
collected by one or more of four methods: paper/pencil, systematic observation, participant
observation, and clinical. Three types of research studies are normally performed: descriptive,
correlational, and experimental.
Collecting data (assessment), quantifying that data (measurement), making judgments
(evaluation), and developing understanding about the data (research) always raise issues of
reliability and validity. Reliability attempts to answer concerns about the consistency of the
information (data) collected, while validity focuses on accuracy or truth. The relationship
between reliability and validity can be confusing because measurements (e.g., scores on tests,
recorded statements about classroom behavior) can be reliable (consistent) without being valid (accurate or true). However, the reverse is not true: measurements cannot be valid without being reliable.
The same statement applies to findings from research studies. Findings may be reliable (consistent across studies) without being valid (accurate or true statements about relationships among variables), but findings cannot be valid if they are not reliable. At a minimum, for an
instrument to be reliable a consistent set of data must be produced each time it is used; for a
research study to be reliable it should produce consistent results each time it is performed.
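The distinction can be made concrete with a test-retest design, where the same instrument is administered twice and the consistency of scores is estimated by correlation. The sketch below uses hypothetical scores; a high correlation demonstrates reliability (consistency) but by itself says nothing about validity (whether the test measures what it claims to).

```python
# Test-retest reliability estimated as the Pearson correlation between
# two administrations of the same instrument. Scores are hypothetical.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

first_administration  = [12, 15, 9, 20, 17, 11]   # hypothetical test scores
second_administration = [13, 14, 10, 19, 18, 11]  # same students, retested later

r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

A thermometer that is consistently two degrees off illustrates the same point: its readings would correlate almost perfectly across occasions (reliable) while still being wrong (not valid).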
Classification of Scientific Knowledge
The scientific method is used to generate a database of scientific knowledge. A generally
accepted hierarchy of scientific knowledge includes:
1. facts -- an idea or action that can be verified -- Example: names and dates of important activities; population of the United States in the latest census;
2. concepts -- rules that allow for categorization of events, places, people, ideas, etc. -- Example: a DESK is a piece of furniture (also a concept) designed with a flat top for writing; a CHAIR is a piece of furniture designed for sitting; a CHAIR with a flat surface attached to it that is designed for writing is also called a DESK;
3. principles -- relationship(s) between/among facts and/or concepts; used to generate if-then statements -- Example: the number of children in the family is related to the average scores on nationally standardized achievement tests for those children;
4. laws -- firmly established, thoroughly tested principles or if-then statements -- Example: a fixed-interval schedule for delivering reinforcement produces a scalloping effect on behavior.
Two other important terms relate to how scientists think about and organize this knowledge:
1. hypotheses -- educated guesses about what will be found in a scientific study, especially in terms of correlational relationships (if-then statements of principles) and causal relationships (if-then statements of laws) -- Example: for lower-division undergraduate students, study habits are a better predictor of success in a college course than is a measure of intelligence or reading comprehension;
2. theories -- sets of facts, concepts, and principles that organize multiple findings and allow for description and explanation -- Example: Piaget's theory of cognitive development, Erikson's theory of socioemotional development, and Skinner's theory of operant conditioning.
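A correlational hypothesis like the study-habits example can be checked against data by comparing how strongly each predictor correlates with the outcome. The sketch below uses entirely hypothetical numbers, chosen only to illustrate the comparison, not real findings.

```python
# Comparing two hypothetical predictors of course success, in the
# spirit of the study-habits hypothesis above. All data are invented.
from statistics import mean

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

study_habits = [3, 7, 5, 9, 4, 8, 6, 2]                 # hypothetical ratings
iq_scores    = [110, 102, 118, 105, 99, 112, 108, 121]  # hypothetical IQ scores
course_grade = [68, 85, 74, 93, 70, 88, 80, 60]         # hypothetical final grades

print(f"study habits vs grade: r = {corr(study_habits, course_grade):.2f}")
print(f"intelligence vs grade: r = {corr(iq_scores, course_grade):.2f}")
```

In this invented data set the study-habits correlation is the stronger one, which is what the hypothesis predicts; a real test of the hypothesis would of course require actual student data.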
The human mind does not think or reason in terms of discrete elements or "facts." Rather, the
mind seeks patterns among discrete elements and processes information in terms of concepts or
the rules for categorizing facts. When people build relationships among facts and concepts (i.e.,
develop principles), they are able to remember, understand, and access an astonishing amount of
information. People are also able to make predictions from present to future circumstances
based on these understandings. However, it is when theories are developed (adding organization and explanation to facts, concepts, and principles) based on laws (empirically validated causal
principles) that scientists accomplish the highest goal of science--to control the variables they are
studying.
Overview
The aim of theory and practice in educational measurement is typically to measure abilities and
levels of attainment by students in areas such as reading, writing, mathematics, science and so
forth. Traditionally, attention focuses on whether assessments are reliable and valid. In practice,
educational measurement is largely concerned with the analysis of data from educational
assessments or tests. Typically, this means using total scores on assessments, whether they are
multiple choice or open-ended and marked using marking rubrics or guides.
In technical terms, the pattern of scores by individual students to individual items is used to infer
so-called scale locations of students, the "measurements". This process is one form of scaling.
Essentially, higher total scores give higher scale locations, consistent with the traditional and
everyday use of total scores.[1] If certain theory is used, though, there is not a strict
correspondence between the ordering of total scores and the ordering of scale locations. The

Rasch model provides a strict correspondence provided all students attempt the same test items,
or their performances are marked using the same marking rubrics.
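Under the Rasch model, the probability that a student of ability theta answers an item of difficulty b correctly is exp(theta - b) / (1 + exp(theta - b)), with both parameters on a common logit scale. The sketch below uses illustrative difficulties to show why higher scale locations go with higher expected total scores when everyone attempts the same items.

```python
# Minimal sketch of the Rasch model. Item difficulties are illustrative.
import math

def p_correct(theta, b):
    """Rasch probability of a correct response: logistic in (theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

item_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical logit difficulties

def expected_total(theta):
    """Expected total score on the test for a student at ability theta."""
    return sum(p_correct(theta, b) for b in item_difficulties)

for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f}  expected total = {expected_total(theta):.2f}")
```

Because the expected total rises monotonically with theta, the ordering of total scores matches the ordering of inferred scale locations, which is the strict correspondence noted above.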
In terms of the broad body of purely mathematical theory drawn on, there is substantial overlap
between educational measurement and psychometrics. However, certain approaches considered
to be a part of psychometrics, including classical test theory, item response theory, and the
Rasch model, were originally developed more specifically for the analysis of data from
educational assessments.
One of the aims of applying theory and techniques in educational measurement is to try to place
the results of different tests administered to different groups of students on a single or common
scale through processes known as test equating. The rationale is that because different
assessments usually have different difficulties, the total scores cannot be directly compared. The
aim of trying to place results on a common scale is to allow comparison of the scale locations
inferred from the totals via scaling processes.
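One simple form of equating is linear equating, which places a score from one form onto the scale of another by matching the means and standard deviations of the two score distributions: x_A = mu_A + (sd_A / sd_B)(x_B - mu_B). The sketch below uses invented score distributions; operational equating designs (common items, common persons) are considerably more involved.

```python
# Linear equating sketch: map a Form B total onto the Form A scale by
# matching means and standard deviations. All scores are hypothetical.
from statistics import mean, pstdev

form_a_scores = [55, 60, 62, 70, 73, 80]  # hypothetical totals, easier form
form_b_scores = [40, 47, 50, 58, 61, 68]  # hypothetical totals, harder form

mu_a, sd_a = mean(form_a_scores), pstdev(form_a_scores)
mu_b, sd_b = mean(form_b_scores), pstdev(form_b_scores)

def equate_b_to_a(x_b):
    """Place a Form B total score on the Form A scale."""
    return mu_a + (sd_a / sd_b) * (x_b - mu_b)

print(f"Form B score 50 on the Form A scale: {equate_b_to_a(50):.1f}")
```

A score at Form B's mean maps exactly onto Form A's mean, so students who performed equally well relative to their own form receive comparable equated scores.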
Evaluation is perhaps the most complex and least understood of the terms. Inherent in the idea
of evaluation is "value." When we evaluate, what we are doing is engaging in some process that
is designed to provide information that will help us make a judgment about a given situation.
Generally, any evaluation process requires information about the situation in question. A
situation is an umbrella term that takes into account such ideas as objectives, goals, standards,
procedures, and so on. When we evaluate, we are saying that the process will yield information
regarding the worthiness, appropriateness, goodness, validity, legality, etc., of something for
which a reliable measurement or assessment has been made. For example, I often tell my students that if they wanted to determine the temperature of the classroom, they would need to get a thermometer, take several readings at different spots, and perhaps average the readings. That
is simple measuring. The average temperature tells us nothing about whether or not it is
appropriate for learning. In order to do that, students would have to be polled in some reliable
and valid way. That polling process is what evaluation is all about. A classroom average
temperature of 75 degrees is simply information. It is the context of the temperature for a
particular purpose that provides the criteria for evaluation. A temperature of 75 degrees may not
be very good for some students, while for others, it is ideal for learning. We evaluate every day.

Teachers, in particular, are constantly evaluating students, and such evaluations are usually done
in the context of comparisons between what was intended (learning, progress, behavior) and
what was obtained. When used in a learning objective, the definition provided on the ADPRIMA site for the behavioral verb evaluate is: "To classify objects, situations, people, conditions, etc., according to defined criteria of quality. Indication of quality must be given in the defined criteria of each class category." Evaluation differs from general classification only in this respect.
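The classroom-temperature example above can be sketched as code: averaging the readings is measurement, while comparing the average against a criterion is evaluation. The readings and the comfort band used as the criterion are illustrative assumptions.

```python
# Measurement vs. evaluation with the classroom-temperature example.
# Readings and the 68-76 degree comfort criterion are hypothetical.
from statistics import mean

readings_f = [74, 76, 75, 73, 77]  # hypothetical readings at different spots

def measure(readings):
    """Measurement: quantify the situation (average temperature)."""
    return mean(readings)

def evaluate(avg_temp, low=68, high=76):
    """Evaluation: judge the measurement against a criterion for learning."""
    return "appropriate for learning" if low <= avg_temp <= high else "not appropriate"

avg = measure(readings_f)
print(f"average temperature: {avg:.1f} F -> {evaluate(avg)}")
```

The same average temperature yields different judgments under different criteria, which is exactly the point: the number is mere information until a standard supplies the basis for a value judgment.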
To sum up, we measure distance, we assess learning, and we evaluate results in terms of some set of criteria. These three terms certainly share some common attributes, but it is useful to think of them as separate but connected ideas and processes.

Here is a great link that offers different ideas about these three terms, with well-written
explanations. Unfortunately, most information on the Internet concerning this topic amounts to
little more than advertisements for services.
Assessment is a process by which information is obtained relative to some known objective or
goal. Assessment is a broad term that includes testing. A test is a special form of assessment.
Tests are assessments made under contrived circumstances especially so that they may be
administered. In other words, all tests are assessments, but not all assessments are tests. We test
at the end of a lesson or unit. We assess progress at the end of a school year through testing, and
we assess verbal and quantitative skills through such instruments as the SAT and GRE. Whether
implicit or explicit, assessment is most usefully connected to some goal or objective for which
the assessment is designed. A test or assessment yields information relative to an objective or
goal. In that sense, we test or assess to determine whether or not an objective or goal has been
obtained. Assessment of skill attainment is rather straightforward. Either the skill exists at some acceptable level or it doesn't. Skills are readily demonstrable. Assessment of understanding is
much more difficult and complex. Skills can be practiced; understandings cannot. We can assess
a person's knowledge in a variety of ways, but there is always a leap, an inference that we make
about what a person does in relation to what it signifies about what he knows. In the section on
this site on behavioral verbs, to assess means "To stipulate the conditions by which the behavior specified in an objective may be ascertained." Such stipulations are usually in the form of written descriptions.
