Reg NO:
COURSE CODE: 6507
SEMESTER: Spring 2022
Assignment No.1
QUESTION NO 1
ANSWER:
Measurement is a systematic process of determining the attributes of an object. It
ascertains how fast, tall, dense, heavy, or broad something is. However, measurement
applies only to physical attributes; when one has to appraise attributes that cannot
be measured with the help of tools, the need for evaluation arises. Evaluation helps
in passing value judgements about the policies, performances, methods, techniques,
strategies, effectiveness, etc. of teaching.
Measurement provides a solid base for evaluation, as it gives something concrete
with which to compare objects. Further, evaluation has a crucial role to play in
reforming the learning and teaching process and in suggesting changes in the
curriculum.
When a set of numerals is assigned to a set of objects, whether persons or
commodities, according to accepted rules or standards, and is described in standard
words, units and symbols so as to characterize the status of those objects, it is
called measurement. In education, measurement implies the quantitative assessment of
a student’s performance in an exam.
It is a mechanical process, which involves the systematic study of the attributes with the
help of appropriate assessment tools. It transforms the variable into variate, which is
effective in making deductions. For instance, intelligence is measured in terms of
an IQ score, and achievement is measured in terms of test scores.
Further, it is helpful in comparing the performance of various students as well as in
highlighting their positive and negative points.
QUESTION NO 2
What factors can influence the test administration process, and how can
scoring problems be addressed at the secondary level?
ANSWER:
Test administration guidelines are a set of policies and procedures that outline how
standardized assessments should be distributed and administered. These guidelines
exist to increase consistency, ensure test security, and safeguard fair and reliable
exam scores.
How do we account for an individual who does not get exactly the same test score every
time he or she takes the test? Some possible reasons are the following:
Test taker's temporary psychological or physical state. Test performance can be
influenced by a person's psychological or physical state at the time of testing. For
example, differing levels of anxiety, fatigue, or motivation may affect the
applicant's test results.
Environmental factors. Differences in the testing environment, such as room
temperature, lighting, noise, or even the test administrator, can influence an
individual's test performance.
Test form. Many tests have more than one version or form. Items differ on each
form, but each form is supposed to measure the same thing. Different forms of a
test are known as parallel forms or alternate forms. These forms are designed to
have similar measurement characteristics, but they contain different items.
Because the forms are not exactly the same, a test taker might do better on one
form than on another.
Multiple raters. In certain tests, scoring is determined by a rater's judgments of
the test taker's performance or responses. Differences in training, experience,
and frame of reference among raters can produce different test scores for the
test taker.
Principle of Assessment: Use only reliable assessment instruments and procedures. In
other words, use only assessment tools that provide dependable and consistent
information.
These factors are sources of chance or random measurement error in the assessment
process. If there were no random errors of measurement, the individual would get the
same test score, the individual's "true" score, each time. The degree to which test scores
are unaffected by measurement errors is an indication of the reliability of the test.
Reliable assessment tools produce dependable, repeatable, and consistent information
about people. In order to meaningfully interpret test scores and make useful
employment or career-related decisions, you need reliable tools. This brings us to the
next principle of assessment.
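The true-score idea above can be made concrete with a small numerical sketch. The following Python snippet (all numbers invented for illustration; standard library only) simulates each observed score as a stable true score plus random error, and estimates reliability as the share of observed-score variance that comes from true scores.

```python
import random
import statistics

# Hypothetical simulation: observed score = true score + random error.
# Reliability is the proportion of observed-score variance attributable
# to true-score variance.
random.seed(42)
true_scores = [random.gauss(50, 10) for _ in range(1000)]  # stable abilities
errors = [random.gauss(0, 5) for _ in range(1000)]         # random measurement error
observed = [t + e for t, e in zip(true_scores, errors)]

reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))  # roughly 100 / (100 + 25) = 0.8
```

With no random error the ratio would be 1.0: every individual would obtain his or her "true" score on each administration.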
Tests that measure multiple characteristics are usually divided into distinct
components. Manuals for such tests typically report a separate internal
consistency reliability coefficient for each component in addition to one for the
whole test. Test manuals and reviews report several kinds of internal consistency
reliability estimates. Each type of estimate is appropriate under certain
circumstances. The test manual should explain why a particular estimate is
reported.
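As a sketch of one such internal consistency estimate, the snippet below computes Cronbach's alpha for a hypothetical five-item component (the examinee scores are invented, not drawn from any real test manual). Alpha compares the sum of the item variances with the variance of the total scores.

```python
import statistics

# Invented scores: 5 examinees (rows) on a 5-item component (columns).
scores = [
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3],
    [1, 2, 1, 2, 1],
]

k = len(scores[0])                                              # number of items
item_vars = sum(statistics.variance(col) for col in zip(*scores))
total_var = statistics.variance([sum(row) for row in scores])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 2))  # → 0.97
```

A value this high indicates the items vary together, i.e. they appear to measure the same component consistently.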
QUESTION NO 3
ANSWER:
After the overall content of the test has been established through a job analysis, the next
step in test development is to create the detailed test specifications. Test specifications
usually include a test description component and a test blueprint component. The test
description specifies aspects of the planned test such as the test purpose, the target
examinee population, the overall test length, and more. The test blueprint, sometimes also
called the table of specifications, provides a listing of the major content areas and
cognitive levels intended to be included on each test form. It also includes the number of
items each test form should include within each of these content and cognitive areas.
The test description component of an exam program's test specifications is a written
document that provides essential background information about the planned exam
program. This information is then used to focus and guide the remaining steps in the test
development process. At a minimum, the test description may simply indicate who will
be tested and what the purpose of the exam program is. More often, it also includes
elements such as the overall test length, the test administration time limit, and
the item types that are expected to be used (e.g., multiple choice, essay).
In some cases the test description may also specify a test administration mode (e.g.,
paper-and-pencil, performance-based, computer-based). And, if the test will include any
items or tasks that will need to be scored by human raters, the test description may also
include plans for the scoring procedures and scoring rubrics. The content areas listed in
the test blueprint, or table of specifications, are frequently drawn directly from the results
of a job analysis. These content areas comprise the knowledge, skills, and abilities that
have been determined to be the essential elements of competency for the job or
occupation being assessed. In addition to the listing of content areas, the test blueprint
specifies the number or proportion of items that are planned to be included on each test
form for each content area. These proportions reflect the relative importance of each
content area to competency in the occupation.
Most test blueprints also indicate the levels of cognitive processing that the examinees
will be expected to use in responding to specific items (e.g., Knowledge, Application). It
is critical that your test blueprint and test items include a substantial proportion of items
targeted above the Knowledge-level of cognition. A typical test blueprint is presented in a
two-way matrix with the content areas listed in the table rows and the cognitive processes
in the table columns. The total number of items specified for each column indicates the
proportional plan for each cognitive level on the overall test, just as the total number of
items for each row indicates the proportional emphasis of each content area.
The test blueprint is used to guide and target item writing as well as for test form
assembly. Use of a test blueprint improves consistency across test forms as well as
helping ensure that the goals and plans for the test are met in each operational test. An
example of a test blueprint is provided next. In the (artificial) test blueprint for a Real
Estate licensure exam given below, the overall test length is specified as 80 items. This
relatively small test blueprint includes four major content areas for the exam (e.g., Real
Estate Law). Three levels of cognitive processing are specified. These are Knowledge,
Comprehension, and Application.
Each test form written to this table of specifications will include 40% of the total test (or
32 items) in the content area of Real Estate Law. In addressing cognitive levels, 35% of
the overall test (or 28 items) will be included at the Knowledge-level. The interior cells of
the table indicate the number of items that are intended to be on the test from each
content and cognitive area combination. For example, the test form will include 16 items
at the Knowledge-level in the content area of Real Estate Law.
Content Area                 Knowledge   Comprehension   Application   Total     %
Real Estate Law                  16            8              8          32     40%
Real Estate Practices             4           12              -          16     20%
Financing/Mortgage Markets        8            8              8          24     30%
Real Estate Math                  -            -              8           8     10%
Total                            28           28             24          80    100%
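The blueprint is easy to encode and check programmatically. The sketch below stores the example table as a Python dict (the area names and zero cells follow the example above; the dict structure itself is just one plausible representation) and verifies the row and column totals against the planned 80-item length.

```python
# The example Real Estate blueprint as nested dicts:
# content area -> items planned per cognitive level.
blueprint = {
    "Real Estate Law":            {"Knowledge": 16, "Comprehension": 8,  "Application": 8},
    "Real Estate Practices":      {"Knowledge": 4,  "Comprehension": 12, "Application": 0},
    "Financing/Mortgage Markets": {"Knowledge": 8,  "Comprehension": 8,  "Application": 8},
    "Real Estate Math":           {"Knowledge": 0,  "Comprehension": 0,  "Application": 8},
}

test_length = sum(sum(row.values()) for row in blueprint.values())
print(test_length)  # → 80

# Proportional emphasis of each content area (row totals).
for area, row in blueprint.items():
    print(area, f"{100 * sum(row.values()) / test_length:.0f}%")

# Column totals give the cognitive-level plan for the whole test.
knowledge_total = sum(row["Knowledge"] for row in blueprint.values())
print(knowledge_total)  # → 28, i.e. 35% of the test
```

A check like this catches blueprint arithmetic errors before item writing or form assembly begins.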
The test specifications for an exam program provide essential planning materials for the
test development process. Thorough, thoughtful test specifications can guide the
remainder of the test development process, especially item writing efforts and test
assembly. An initial test form can be developed according to these specifications to
appropriately reflect the content and cognitive emphases intended. The specifications can
also be used to guide the development of later, additional test forms. Careful linking
between the job analysis, test specifications, and test items will go a long way to
providing strong content validity and legal defensibility for the exam program.
QUESTION NO 4
Why are essay-type items considered easy to administer but difficult
to score? Explain with practical examples.
ANSWER:
An essay test may give full freedom to the students to write any number of pages.
The required response may vary in length. An essay type question requires the pupil to
plan his own answer and to explain it in his own words. The pupil exercises considerable
freedom to select, organise and present his ideas. Essay type tests provide a better
indication of pupil’s real achievement in learning. The answers provide a clue to nature
and quality of the pupil’s thought process.
That is, we can assess how the pupil presents his ideas (whether his manner of
presentation is coherent, logical and systematic) and how he concludes. In other words,
the answer of the pupil reveals the structure, dynamics and functioning of pupil’s mental
life.
The essay questions are generally thought to be the traditional type of questions which
demand lengthy answers. They are not amenable to objective scoring as they give scope
for halo-effect, inter-examiner variability and intra-examiner variability in scoring.
1. One of the serious limitations of the essay tests is that these tests do not give scope for
larger sampling of the content. You cannot sample the course content so well with six
lengthy essay questions as you can with 60 multiple-choice test items.
2. Such tests encourage selective reading and emphasise cramming.
3. Moreover, scoring may be affected by spelling, good handwriting, coloured ink,
neatness, grammar, length of the answer, etc.
4. The long-answer type questions are less valid and less reliable, and as such they have
little predictive value.
5. Essay tests require excessive time for students to write; for the assessor,
reading essays is very time-consuming and laborious.
6. It can be assessed only by a teacher or competent professionals.
7. Improper and ambiguous wording handicaps both the students and valuers.
8. Mood of the examiner affects the scoring of answer scripts.
9. There is halo effect-biased judgement by previous impressions.
10. The scores may be affected by the examiner’s personal bias or partiality for a
particular point of view, his way of understanding the question, the weightage he
gives to different aspects of the answer, favouritism, nepotism, etc.
Thus, the potential disadvantages of essay type questions are:
(i) Poor predictive validity,
(ii) Limited content sampling,
(iii) Score unreliability, and
(iv) Scoring constraints.
The teacher can sometimes, through essay tests, gain improved insight into a student’s
abilities, difficulties and ways of thinking and thus have a basis for guiding his/her
learning.
(A) While Framing Questions:
1. Give adequate time and thought to the preparation of essay questions, so that they can
be re-examined, revised and edited before they are used. This would increase the validity
of the test.
2. The item should be so written that it will elicit the type of behaviour the teacher wants
to measure. If one is interested in measuring understanding, he should not ask a question
that will elicit an opinion; e.g.,
“What do you think of Buddhism in comparison to Jainism?”
3. Use words which themselves give directions e.g. define, illustrate, outline, select,
classify, summarise, etc., instead of discuss, comment, explain, etc.
4. Give specific directions to students to elicit the desired response.
5. Indicate clearly the value of the question and the time suggested for answering it.
6. Do not provide optional questions in an essay test because—
(i) It is difficult to construct questions of equal difficulty;
(ii) Students do not have the ability to select those questions which they will answer best;
(iii) A good student may be penalised because he is challenged by the more difficult and
complex questions.
7. Prepare and use a relatively large number of questions requiring short answers rather
than just a few questions involving long answers.
8. Do not start essay questions with words such as list, who, what, or whether.
Questions beginning with such words are likely to be short-answer questions rather
than essay questions, as we have defined the term.
9. Adapt the length of the response and complexity of the question and answer to the
maturity level of the students.
10. The wording of the questions should be clear and unambiguous.
11. It should be a power test rather than a speed test. Allow a liberal time limit so that the
essay test does not become a test of speed in writing.
12. Supply the necessary training to the students in writing essay tests.
13. Questions should be graded from simple to complex so that all the testees can answer
at least a few questions.
14. Essay questions should provide value points and marking schemes.
(B) While Scoring Questions:
1. Prepare a marking scheme, suggesting the best possible answer and the weightage
given to the various points of this model answer. Decide in advance which factors will be
considered in evaluating an essay response.
2. While assessing the essay response, one must:
a. Use appropriate methods to minimise bias;
b. Pay attention only to the significant and relevant aspects of the answer;
c. Be careful not to let personal idiosyncrasies affect assessment;
d. Apply a uniform standard to all the papers.
3. The examinee’s identity should be concealed from the scorer. This helps avoid the
halo effect or bias that may affect the scoring.
4. Check your marking scheme against actual responses.
5. Once the assessment has begun, the standard should not be changed, nor should it vary
from paper to paper or reader to reader. Be consistent in your assessment.
6. Grade only one question at a time for all papers. This helps minimise the halo
effect, since you become thoroughly familiar with just one set of scoring criteria
and can concentrate completely on it.
7. The mechanics of expression (legibility, spelling, punctuation, grammar) should be
judged separately from what the student writes, i.e. the subject matter content.
8. If possible, have two independent readings of the test and use the average as the final
score.
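Points 6 and 8 above can be sketched in a few lines. In this hypothetical example (the question labels and marks are invented), each question is scored by two independent readers and the final mark is the average of the two readings.

```python
# Two independent readings of the same answer script, per question.
reader_a = {"Q1": 7, "Q2": 5, "Q3": 8}
reader_b = {"Q1": 6, "Q2": 6, "Q3": 9}

# Final score per question: average of the two independent readings.
final = {q: (reader_a[q] + reader_b[q]) / 2 for q in reader_a}
total = sum(final.values())

print(final)  # → {'Q1': 6.5, 'Q2': 5.5, 'Q3': 8.5}
print(total)  # → 20.5
```

Averaging independent readings dampens any single rater's leniency, severity, or mood on a given script.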
QUESTION NO 5
ANSWER:
One of the major goals of education is to prepare students for the next step in their
future. Teachers have to make sure that their learners have acquired enough knowledge
about the field of study, and only good tests ensure this. A good test is not merely a
score that learners struggle to ace.
It is feedback that a student receives to improve his skills and knowledge, and that a
good teacher returns to regularly to make sure their teaching strategies are on point
and to see whether they need development.
It is also feedback for decision-makers in educational institutions and government
positions who need good data to plan the next step of the institution’s or the State’s
education plan.
It should not be something students spend days of anxiety over, wondering how well
they will do on a given test, how well the test questions are actually written, and
whether the questions are ones they know the answers to.
Teachers used to measure students’ knowledge only by how they score in a given exam.
They give students only one chance to show their competencies without discussions or
classroom projects. Online assessment is a way through which teachers can improve
students’ learning, knowledge, beliefs, and skills. Online assessments can be behavioral,
cognitive, or communicative assessments.
Students may take the online assessment in the classroom or at home and this reduces
their stress. New tools are now introduced for instructors to set different types of
assessments.
There are four general classes of reliability estimates, each of which estimates reliability
in a different way. They are:
Inter-Rater or Inter-Observer Reliability: Used to assess the degree to which
different raters/observers give consistent estimates of the same phenomenon.
Test-Retest Reliability: Used to assess the consistency of a measure from one
time to another.
Parallel-Forms Reliability: Used to assess the consistency of the results of two
tests constructed in the same way from the same content domain.
Internal Consistency Reliability: Used to assess the consistency of results across
items within a test.
Test-Retest Reliability
We estimate test-retest reliability when we administer the same test to the same sample
on two different occasions. This approach assumes that there is no substantial change in
the construct being measured between the two occasions. The amount of time allowed
between measures is critical. We know that if we measure the same thing twice that the
correlation between the two observations will depend in part by how much time elapses
between the two measurement occasions. The shorter the time gap, the higher the
correlation; the longer the time gap, the lower the correlation. This is because the two
observations are related over time – the closer in time we get the more similar the factors
that contribute to error. Since this correlation is the test-retest estimate of reliability, you
can obtain considerably different estimates depending on the interval.
Parallel-Forms Reliability
In parallel forms reliability you first have to create two parallel forms. One way to
accomplish this is to create a large set of questions that address the same construct and
then randomly divide the questions into two sets. You administer both instruments to the
same sample of people. The correlation between the two parallel forms is the estimate of
reliability. One major problem with this approach is that you have to be able to generate
lots of items that reflect the same construct. This is often no easy feat. Furthermore, this
approach makes the assumption that the randomly divided halves are parallel or
equivalent. Even by chance this will sometimes not be the case. The parallel forms
approach is very similar to the split-half reliability described below. The major difference
is that parallel forms are constructed so that the two forms can be used independent of
each other and considered equivalent measures. For instance, we might be concerned
about a testing threat to internal validity. If we use Form A for the pretest and Form B
for the posttest, we minimize that problem. It would be even better to randomly assign
individuals to receive Form A or B on the pretest and then switch them on the posttest.
With split-half reliability we have an instrument that we wish to use as a single
measurement instrument and only develop randomly split halves for purposes of
estimating reliability.