
Course: Educational Measurement and Evaluation (6507) Semester: Spring 2021 Level: MA/M.Ed.

Assignment No. 2

Q.1 What techniques are useful for measuring behaviours? Why were personality inventories
developed? Explain their utility.

Quantitative Analysis:
Measurement in psychology employs two types of data or responses: verbal data and behavioural
data. Verbal data refers to what a person says or writes by using language. For example, when a
psychologist is trying to study a person who is depressed, he may ask the person to narrate what
exactly is happening to him or write in detail about what he is undergoing.

Behavioural data, on the other hand, refers to other forms of bodily responses such as muscular,
glandular, sensory, neural, perceptual etc. The same depressed individual could be studied by the
psychologist by carefully observing his facial expressions, tone, posture, etc. The reader is
already familiar with the distinction between verbal tests and non-verbal tests.

This use of verbal and non-verbal material is not confined only to intelligence testing but also
extends to other aspects of behaviour. For instance, personality traits such as sociability,
assertiveness, tolerance, love of privacy etc., can be measured through various personality
questionnaires and inventories like Cattell’s Sixteen Personality Factor Questionnaire (16 PF),
Eysenck’s Personality Inventory, Minnesota Multiphasic Personality Inventory (MMPI), etc.
These traits can also be measured through behavioural indicators such as bodily changes, cortical
arousal, observation of learning behaviour, etc. In many instances both verbal and non-verbal
measures are used.

We may now examine the various indicators which are employed to measure verbal and non-verbal
behaviour. Some of the important ones are mentioned below:
1. Response Time or Latency:
One factor which is normally employed to measure behaviour is the time taken for an individual
to produce a response. A classical example of this is the reaction-time experiment.

2. Duration of Response:
Another factor which is taken into consideration for measuring behaviour is the duration of time
for which a particular behaviour or response occurs. Measurements of after-images and other
such sensory experiences use this type of index. Suppose you look at a bright green light.

The experienced greenness may remain for a moment even after you cease looking at the light.
Similarly, when you hear a loud sound prolonged for some time or inhale a strong perfume for a
long time, the sound and perfume will remain for some time even after these stimuli are
withdrawn.

3. Time Taken for a Response to be Completed:


This measure is used very widely in measuring learning, intelligence and other abilities. For
example, in Skinner's learning experiments or Thorndike's trial-and-error learning experiments,
one criterion employed to measure whether the rat or cat has learnt the correct path is the
time taken by the animal to reach its goal.

4. Frequency of Response:
The number of times a particular response occurs within a given time or on a particular occasion
is another indicator. An example of this type can be seen in the measurement of fluctuation of
attention. Experiments on fluctuation of attention employ, as an index, the number of times
attention shifts from one aspect of a given stimulus to another within a stipulated time limit.

5. Amount of Response:
In measuring emotional behaviour the amount or intensity of glandular and muscular responses
is employed as an indicator. If a person’s aggression has to be measured, then the experimenter
may try to measure the subject’s blood pressure, rate of respiration, rate of heartbeat, gestures,
tone, facial and other expressions accompanied by certain psychological changes. Only after
analysing and combining a variety of such data can one arrive at a measure indicating the
overall reaction of aggression, or the total amount of aggressive reaction.

6. Number of Trials Required:


Yet another indicator used is the number of trials, practice attempts or presentations of a certain
stimulus. This is very commonly used in experiments on learning. In most learning
experiments, the number of attempts required by an organism to learn a task to a standard or
criterion is used as an index. Similarly, experiments on remembering also employ the
number of presentations or trials required for a person to learn verbal material to the point of
perfect recall.

7. Number of Correct Responses and Wrong Responses:


Learning experiments as well as ability tests use the total number of correct responses or wrong
responses as indicators. For example, intelligence, achievement, memory and other cognitive
factors are measured by various tests such as Binet's intelligence test and Wechsler's
intelligence test.

In all such instances the performance of a subject is finally interpreted in terms of intelligence
quotient or high or low aptitude for a specific skill. This is usually arrived at by giving more
weightage to correct responses and less to wrong ones when computing the score.

8. Response Deviation:
Yet another indicator measured is the degree or extent to which an individual's response differs
or varies from the normal or average response. This is very commonly used in the measurement of
abilities, personality, etc. The most suitable example for understanding this criterion is that IQs
can be interpreted according to the norms of the normal probability curve.

The distribution of various IQs on this curve indicates whether the intelligence of the subjects is
below average, average or above average. This also indicates the direction of the deviation,
whether it is towards the positive or negative side. In other words, it shows whether the subject
is gifted, average or retarded, and if so, to what extent.
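As a concrete illustration, using the common deviation-IQ convention of mean 100 and standard deviation 15 (an assumption here; the text does not name a particular scale), a raw score x is located on the normal curve by its standardised distance from the norm-group mean:

$$z = \frac{x - \mu}{\sigma}, \qquad \mathrm{IQ} = 100 + 15z$$

A subject scoring one standard deviation above his age-group mean has z = 1, i.e. an IQ of 115, a deviation in the positive direction; z = -2 gives an IQ of 70, a large deviation in the negative direction.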
9. Complexity and Difficulty of Response:
The more complex and difficult a particular response, the higher the score. The concept of
mental age is based on this. Some of the items of aptitude tests and intelligence tests are planned
in such a way that the difficulty level is deliberately increased.

For instance, in Binet’s intelligence test the items are arranged in such a manner that the
complexity and difficulty level is increased gradually as the test advances. Thus, we see that
psychologists employ different kinds of measures depending on the nature of the behaviour and
the purpose of the measurement.

Relative vs. Absolute Scores:


A point to be borne in mind is the distinction between relative and absolute measures. When
somebody is described as five-foot tall, this means his height is five feet. This is an example of
absolute measure. Wherever his height is measured it would be the same and its value will not
change. On the other hand, when the same person is described as having an IQ of 110, this is not
an absolute measure.

This depends on the test administered. If a test with different norms is administered, the person’s
IQ may turn out to be much higher or lower than 110. Here, we see that IQ is not an absolute
measure but a relative measure. The person's IQ is reported as 110 in relation to the performance
of other people belonging to his age group, i.e. it is in comparison with a norm or standard. If the
norm changes the IQ will also change.

To a very large extent, psychological scores or measures are relative measures. Unlike physical
measures, they are always arrived at with reference to certain norms. In view of this,
psychological measures such as IQ should be interpreted very carefully. Similarly, when a person
is described as extroverted, this is again with reference to certain averages or norms.

It is a limitation that psychologists have to depend largely on relative measures and not on
absolute measures. But this difficulty has not prevented them from developing more and more
tools and techniques for measuring different aspects of behaviour.
Such measurements have also been found to be very useful in understanding and predicting
behaviour. Psychological measurement is widely employed today not only in laboratories but for
practical purposes such as selecting people for jobs, diagnosing psychological abnormalities and
so on.

Reliability and Validity:


Whatever the type of measure employed, it is necessary that such measures satisfy
certain criteria. This is very necessary for developing scientific laws and generalisations and also
for using these measures for practical purposes.

Firstly, psychological measures like any other measures should be accurate and precise.

Secondly, they should be sensitive. For example, if a test of intelligence gives the IQ of a person
as 120 then this test should be able to measure accurately any changes in the IQ.

Similarly, other measures such as attitude scales should be sensitive, so that even very small
changes in attitudes are indicated by the scores. Thus, accuracy, precision and sensitivity
are important requirements.

Psychological tools should be carefully constructed so that the scores or measures given by them
can be depended upon. This quality is known as reliability. A particular tool, if it yields one
score today and another tomorrow and a third score on another day then this score or measure is
not dependable. This is why psychological scores are usually given on the basis of the average of
a number of performances and rarely on the basis of a single performance.

Here we can see a difference between physical measurement and psychological measurement.
Reliability is a very important characteristic and psychologists have developed different methods
of ensuring the reliability of measurements.

Yet another important quality is known as validity. This means that when a psychologist claims
to measure a particular psychological attribute or behaviour, he should make sure that he is really
measuring that particular quality, such as extroversion. We must make sure that the test really
measures extroversion and not something else.
For example, a thermometer measures temperature and not atmospheric pressure. Similarly, a
test of intelligence should measure intelligence and not something else. This particular quality of
measuring what one purports to measure is called validity. People employing psychological
measurements or scores should make sure that these requirements are taken care of.

Stimulus Measures and Response Measures:


The different indicators used in psychological measurement, such as intensity of response,
duration of response, latency, etc., are all based on the response or behaviour of the
person and are, therefore, called response measures.

Such measurements, however, were not the first type of measurements used in psychology.
These types of measurements had to wait for the development of proper tools and tests and also
the development of adequate theories and principles of psychological measurement.

Early measurements of behaviour were made in a different manner. Scientists of the nineteenth
century interested in studying the relationship between physical events or stimuli, on the one
hand, and psychological responses, on the other, employed other types of measurement in their
experiments.

Since proper techniques were not available for measuring responses they tried to measure
responses indirectly, with reference to measurements of stimuli or physical events. These scientists,
known as psychophysicists, developed a branch of psychology known as psychophysics.

Reference:

https://www.psychologydiscussion.net/behaviour/human-behaviour/how-to-measure-behaviour-of-an-individual-human-behaviour-psychology/3257

Q.2 How can the length and item difficulty of a test influence the appropriate assessment of
students? Explain other considerations that help to develop appropriate test items.

Item analysis is an important (probably the most important) tool to increase test
effectiveness. Each item's contribution is analyzed and assessed.
To write effective items, it is necessary to examine whether they are measuring the fact, idea, or
concept for which they were intended. This is done by studying students' responses to each
item. When formalized, the procedure is called “item analysis”. It is a scientific way of
improving the quality of tests and test items in an item bank.

An item analysis provides three kinds of important information about the quality of test items.

 Item difficulty: A measure of whether an item was too easy or too hard.

 Item discrimination: A measure of whether an item discriminated between students who
knew the material well and students who did not.

 Effectiveness of alternatives: Determination of whether distractors (incorrect but plausible
answers) tend to be marked by the less able students and not by the more able students.

Item difficulty, item discrimination and the effectiveness of distractors on a multiple-choice test
are automatically available with ParScore’s item analysis. An illustration of ParScore’s
“Standard Item Analysis Report” printout is attached.
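To make the two main statistics concrete, here is a minimal sketch of how item difficulty and item discrimination are commonly computed. This is not ParScore's actual algorithm; the 0/1 response matrix and the 27% upper/lower grouping convention are assumptions for illustration.

```python
# Classical item analysis on a small 0/1 response matrix
# (rows = students, columns = items; 1 = correct, 0 = incorrect).

def item_difficulty(responses, item):
    """Proportion of students answering the item correctly (the p-value)."""
    return sum(r[item] for r in responses) / len(responses)

def item_discrimination(responses, item, fraction=0.27):
    """Difference in p-value between the top- and bottom-scoring groups."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / n

# Example: 6 students, 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for item in range(3):
    print(f"Item {item}: difficulty={item_difficulty(responses, item):.2f}, "
          f"discrimination={item_discrimination(responses, item):+.2f}")
```

An item with a difficulty near 0.5 and a clearly positive discrimination is usually doing useful work; a negative discrimination flags an item that stronger students get wrong more often than weaker ones, which typically signals a flawed or mis-keyed item.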

The Test Development Process

In the discussion of the test development process that follows, we refer to the most widely
accepted set of guidelines, the Standards for Educational and Psychological Testing, which
is a joint publication of the American Educational Research Association, the American
Psychological Association, and the National Council on Measurement in Education (1999),
referred to as “the Standards” from here on. The Standards were developed by a joint
committee of 15 leading testing experts and professionals appointed by the above three
sponsoring organizations. While the Standards are a product of the three sponsoring
organizations, more than 50 different groups provided comment and input over the
multiyear development process. Committee members have observed that the Standards are
cited in the technical reports of many state assessment programs. They have been adopted
by several federal agencies, including the Office of Educational Research and Improvement
of the U.S. Department of Education, the U.S. Department of Defense, and the Office of
Personnel Management. They are also cited in policy guidance issued by the Equal
Employment Opportunity Commission and cited as the authoritative standards in
numerous education and employment legal cases.
TEST DEVELOPMENT

Among the Standards’ guiding principles are that test development should have a sound
scientific basis and that evidence of the scientific approach should be documented.
Although the exact sequence of events for developing a test varies from program to
program, the Standards lay out a series of general procedures that should take place when
developing most kinds of test (Chapter 3):

Specify the purpose of the test and the inferences to be drawn.

Develop frameworks describing the knowledge and skills to be tested.

Build test specifications.

Create potential test items and scoring rubrics.

Review and pilot test items.

Evaluate the quality of items.

Assemble test forms.

If needed, set cutscores defining pass/fail or proficiency categories.

The introduction to the Standards describes the types of tests to which they apply: “. . . the
Standards applies most directly to standardized measures generally recognized as ‘tests’
such as measures of ability, aptitude, achievement, attitudes, interests, personality,
cognitive functioning, and mental health, it may also be usefully applied in varying degrees
to a broad range of less formal assessment techniques” (p. 3). The document includes
general chapters about test construction, evaluation, documentation, and fairness, and also
more specific chapters about psychological testing, educational testing, testing in
employment and credentialing, and testing in program evaluation and public policy.
Whether one considers the naturalization tests most like achievement or like certification
tests, they clearly fall under the umbrella of the Standards.

Reference:

https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-
analysis-1

https://www.nap.edu/read/11168/chapter/5#15

Q.3 Explain the principles of appropriate marking. Highlight the challenges for a teacher
in using these principles in classroom testing.

General Marking Principles for National 5 History assignment


This information is provided to help you understand the general principles that will be applied
when marking candidate responses in this Assignment. These principles are reflected in the
detailed marking instructions that will be used to mark the assignment.
 Marks for each candidate response will always be assigned in line with these general marking
principles and the detailed Marking Instructions.

 Principal Assessors will provide guidance on marking specific candidate responses which are
not covered by either the principles or detailed Marking Instructions.
 Marking will always be positive, i.e. marks will be awarded for what is correct and not
deducted for errors or omissions.

 The purpose of the History Resource Sheet is to help candidates use their evidence and
references, collected during the research stage, to address their chosen question or issue. The
Resource Sheet should be no more than one single side of A4. No marks will be awarded for
directly copying extended pieces of text/narrative from the Resource Sheet. It must not be used
by candidates to pre-write their assignment.

 The Resource Sheet should not be marked. However, it may enable clarification of points
which the candidate has made in the presentation of their evidence, and may allow markers to
gain an insight into what the candidate intended.

 In presenting their findings, candidates will show the following skills, knowledge and
understanding:

A. Explaining different factors contributing to the impact or causes of an event or
development

Candidates can be credited in a number of ways up to a maximum of 2 marks. Candidates
should identify the factors contributing to the causes or impact of an event or development and
show the connection between each factor and the event or development.
B. Using information from sources referred to, in order to support factors

Candidates can be credited in a number of ways up to a maximum of 2 marks. They may
reference their sources in a number of ways.

C. Using other information to support a factor

Candidates can be credited in a number of ways up to a maximum of 4 marks.
D. Evaluating different factors contributing to the impact or causes of an event or
development
Candidates can be credited in a number of ways up to a maximum of 3 marks. Candidates
should make evaluative comments on the factors they have identified, making clear their
importance given the context of the event or development.
E. Organising the information to address the question or issue

Candidates can be credited in a number of ways up to a maximum of 3 marks. They may take
different approaches to organising their findings.
Candidates should structure their answers to show a degree of balance in their assessment of the
factors.
F. Coming to a conclusion about the question

Candidates can be credited in a number of ways up to a maximum of 3 marks. Candidates may
come to a conclusion at the end of their assignment or may provide a series of conclusions.
G. Supporting a conclusion with reasons and/or evidence

Candidates can be credited in a number of ways up to a maximum of 3 marks. Reasons given
should relate to the evidence presented.

Reference:

https://www.understandingstandards.org.uk/National5_images/History/Assignment/General_Marking_Instructions_Assign.pdf

Q.4 Discuss the latest trends in classroom testing in global and local contexts. How can
instruction play an important role in testing the weaknesses among students?

For many, the word “assessment” translates into multiple choice questions or writing for hours in
a crowded exam hall – it is something very defined and has a certain place in our education or
career. The huge advancements in computer-based testing are now redefining the possibilities of
assessment, particularly in terms of what can be tested, how and when. These advancements
mean that there are many more applications for both summative and formative testing,
applications that even a couple of years ago would not have been possible.

Based on working with a wide and varied client base, here are the top five trends we've identified
that are changing how assessment is delivered:

 1. Movement away from traditional assessment delivery methods.

 2. The end of the road for pen and paper.

 3. Much more engaging and effective assessment.

 4. Increasing levels of automation.

 5. Assessments are much more candidate centric.

These trends have a wide-ranging impact on many different organisations, including
corporations, professional membership bodies, educational institutes, training companies and
government departments.

1. Movement away from traditional assessment delivery methods

The use of professional remote invigilation, which recreates the exam hall experience in an
online environment, means there is a move away from the use of traditional assessment delivery
methods, such as running exams in a test centre. Remote invigilation (also known as online
proctoring) means that a secure exam can be run from any location as long as there is an internet
connection. This gives a great deal of flexibility to candidates, who can sit their exam at a time
and place that suits them, rather than spend time and incur costs associated with taking time off
and travelling to a test centre.

Live remote invigilation happens in real-time. This means that for the duration of an exam, an
invigilator watches the candidate using video, audio and remote screenshare. The session is
recorded and can be reviewed at a later stage if required. Any infringements can be raised as they
happen e.g. if the candidate keeps looking away from the screen, the candidate will be advised to
stop this behaviour. If infringements are severe e.g. the candidate takes a phone call or someone
else comes into the room, the exam may be immediately stopped.

For organisations, the benefits of remote invigilation are numerous, such as a significantly
reduced administration overhead, greater security and the ability to cater for candidates in any
country worldwide. Exams can also be offered with greater frequency, so instead of one long test
available once or twice a year, there may be multiple shorter tests run closer to the period of
tuition.

The use of remote invigilation really is a game changer. To give an example, at TestReach there
was a candidate who was stuck in traffic and unable to get home in time to log in for his exam.
With the permission of the examining body, he pulled his car over to the side of the road and sat
his invigilated exam from his car, using the hotspot on his phone to connect to the internet.
Suffice to say, he passed his exam and gained his diploma! It’s a long way from a cold exam hall
and writing with pen and paper for 3 hours…

2. The end of the road for pen and paper

That brings us to another big change in the world of assessment, and that is the move away from
using pen and paper as an exam delivery method.

Using pen and paper gives rise to many issues around administration and security, some of which
are outlined below:

 There is a huge administration burden with printing, transporting, marking and storing
papers.

 There are security issues with the transportation of papers and managing who has access to
them.

 There is a lack of real-time visibility and reporting.


 It is challenging to manage the paper flow – printing papers, collecting finished scripts,
sending them to markers, storage etc.

 It is easier for errors to be made in data entry and reporting.

 There is a much longer lead time to results.

 People are much less used to writing nowadays, as everyone works on keyboards and
screens – most professional candidates find it very difficult to write for hours.

 Writing with a pen requires a completely different approach to the way most people actually
work. We are now used to jotting down initial thoughts and then editing them until we are
happy with the final result. This kind of editing is not possible on paper without lots of
crossings out; you have to think about and plan what you are going to write before putting
pen to paper, which is a very different approach and can put candidates at a disadvantage.

 Very often working with pen and paper is not in line with how we actually carry out our
work in a day-to-day environment. No accountant prepares a set of accounts on paper; it is
all done via spreadsheets, so why should they be penalised during an exam by having to use
pen and paper?

For organisations who are running exams on paper, it is possible to move to online delivery in
phases, so it doesn't have to be a "big bang", high risk approach. For example, you may create
and manage the question bank online, but print paper-based exam papers. The paper scripts can
then be scanned and marked automatically or sent to relevant examiners where manual marking
is required. On-screen marking has developed dramatically in recent years, with a wide range of
online options available.

3. Much more engaging and effective assessment

Another key trend has been the move towards the creation of a much more engaging and
effective assessment. Organisations no longer have to use only simple, one-dimensional multiple
choice and essay questions. With the move to online there is now a huge range of question types
available, which help to make assessments much more immersive.
Using a variety of question types gives greater insight into what people know and how they
apply that knowledge in practice. Multi-media options allow the use of videos, photographs,
audio playback, graphs, labelling, drag & drop and many others.

A flat pen-and-paper test allows very little flexibility; it's typically a one-size-fits-all
approach. Candidates can be offered some choices, such as answer 3 questions out of 5, or if you
are taking the advanced paper, move on to section 5, but these choices are typically extremely
limited. Moving to computer-based assessment enables testing to be much more adaptive, meeting
the specific needs of the individual. At its simplest level this could be branching logic, so if a
candidate selects an option indicating that they have specialised in topic A, then they are asked
questions about topic A; candidates selecting topic B are asked about topic B, etc. There are also
more comprehensive levels of adaptive testing, where question sets are tailored based on how
candidates have answered previous questions, or based on which questions they have answered
incorrectly, etc. This can be of use in situations where perhaps a candidate is performing very
poorly in questions on a particular area. In this case the candidate might be asked simpler
questions on that area, or alternative questions about other areas to at least give them a chance to
show the level of knowledge they have.
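As an illustration of the simplest branching case described above, here is a minimal sketch. The question bank and topic names are hypothetical, and real adaptive engines tailor question sets far more finely than a single up-front choice.

```python
# A minimal sketch of branching logic in an exam flow. The bank and
# topic names are illustrative; real adaptive testing selects questions
# dynamically based on prior answers, not just one declared specialism.

QUESTION_BANK = {
    "topic_a": ["A1: question on topic A", "A2: ...", "A3: ..."],
    "topic_b": ["B1: question on topic B", "B2: ...", "B3: ..."],
}

def branch(chosen_specialism):
    """Route the candidate to the question set matching their stated specialism."""
    return QUESTION_BANK.get(chosen_specialism, [])

# A candidate who indicates they specialised in topic A
# is only ever shown topic A questions.
print(branch("topic_a"))
```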

4. Increasing levels of automation

Not surprisingly there is a significant trend to increase the levels of assessment automation and
there are a number of ways that this is being done.

Many organisations are moving to a “LOFT” model, which stands for linear-on-the-fly testing.
With this model, organisations can set up question banks where each question can be associated
with different tags – these tags might be learning outcomes, syllabus topics, difficulty ratings or
job roles – and then automatic picking rules can be defined for each exam paper. This means that
randomised papers can be generated for each candidate. Each test uses different questions, but
the tests are equivalent in terms of what they are assessing. The big positive is that papers are
generated automatically, so once the exam bank has been set up, it is just a case of maintaining
it, as opposed to having to continually generate new exam papers.
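A minimal sketch of how such picking rules might drive paper generation follows. The tag names and the (tag, count) rule format are assumptions for illustration, not any specific vendor's implementation.

```python
# LOFT-style paper generation from a tagged question bank.
import random

question_bank = [
    {"id": 1, "tag": "syllabus_1", "difficulty": "easy"},
    {"id": 2, "tag": "syllabus_1", "difficulty": "hard"},
    {"id": 3, "tag": "syllabus_1", "difficulty": "easy"},
    {"id": 4, "tag": "syllabus_2", "difficulty": "easy"},
    {"id": 5, "tag": "syllabus_2", "difficulty": "hard"},
    {"id": 6, "tag": "syllabus_2", "difficulty": "hard"},
]

# Picking rules: draw 2 questions per syllabus topic.
picking_rules = [("syllabus_1", 2), ("syllabus_2", 2)]

def generate_paper(bank, rules, seed=None):
    """Draw a randomised but structurally equivalent paper for one candidate."""
    rng = random.Random(seed)
    paper = []
    for tag, count in rules:
        pool = [q for q in bank if q["tag"] == tag]
        paper.extend(rng.sample(pool, count))
    return paper

# Each candidate gets different questions covering the same tags.
print([q["id"] for q in generate_paper(question_bank, picking_rules, seed=42)])
```

Because each call draws a fresh random sample per tag, two candidates sitting the “same” exam receive different question sets that nonetheless cover the same topics in the same proportions.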

The LOFT model also reduces candidates' ability to collaborate on questions or
share exam content online, as everyone gets a different set of questions.

Because many questions can be auto-scored, results can be issued very quickly with clear
feedback on learning outcomes and areas for improvement.

Another area where there are increasing levels of automation is systems integration. Typically
there are integration points so that candidate details can automatically flow from a Learning
Management System (LMS), a Customer Relationship Management (CRM) system or an HR
system directly into the assessment solution. Candidates then get automatically enrolled in the
correct exam. There may be various other integration points, such as to push results from the
assessment system into a system of record after the exam.

There are also a lot of advancements being made in the area of automated marking, where it’s
now possible to auto-score a wide range of question types that can include text answers.

5. Assessments are much more candidate centric

At TestReach we have a wide range of customers from many different backgrounds. We work
with educational institutes, professional bodies, governments, corporations – and one thing they
all have in common is placing a huge emphasis on the quality of the candidate experience.

It is essential that information and examinations are presented in a user-friendly way. Things like
interactive canvases are useful in this context, where case studies and questions can be
presented together on-screen, with the option to make notes, highlight and annotate. There are
also options for the candidate to configure the view in line with their own preferences. This type
of innovation means that all the features you’d expect from a paper-based exam are readily and
easily available in the online format.

Other ways in which online assessment makes things more candidate-friendly are the speed of
turnaround of results and the ease of providing detailed feedback. Candidates do not just want
to know their grade but also to understand the areas in which they were strong and those
where they may need to improve. This is becoming a very important area of focus for many
organisations, as this kind of feedback to candidates provides a lot of benefits, particularly for
students who fail.

Formative assessments are also becoming more flexible, as assignments and continuous
assessments can all be uploaded to the same online assessment platform. This means all of a
candidate's data and results are securely stored in one location, whether formative or summative.

Reference:

https://www.testreach.com/blog-post/the-five-latest-trends-in-assessment.html

Q.5 Explain some situations in which the arithmetic mean of scores is not recommended for
interpreting performance or results.

Arithmetic average, or arithmetic mean, or just mean, is probably the simplest tool in statistics,
designed to measure central tendency in a data set (which can be a group of stocks or returns of a
stock in particular years). Using arithmetic average has advantages and disadvantages, and in
some cases you may find other measures (like geometric average or median) more suitable.

When to Use It and When Not

Arithmetic Average Has Strengths and Weaknesses

Arithmetic average is the most popular measure of central tendency, largely because it is the
easiest one to calculate. However, like every other statistical measure,
arithmetic average has strengths and weaknesses and it is more suitable in some situations than
in others. The following is a summary of situations when using arithmetic average is appropriate
and those when arithmetic average should be replaced or complemented by other measures.

When to Use Arithmetic Average

When you work with independent data, for example performance of multiple stocks or
investments in a single period of time (otherwise geometric average may be better).

When all items in the data set are equally important (otherwise use weighted average).

When you need quick, easy, and rough information about the overall level. Arithmetic average
is the easiest one to calculate.

When you don’t have a computer at hand. Arithmetic average is much easier to calculate in your
head or on paper compared to geometric, harmonic, or weighted average.

When Not to Use Arithmetic Average (Alone)

When the data set contains extreme values. Extreme values can bias the arithmetic average
towards a value which is not representative of the real central tendency in the data set. The
median addresses this issue better.
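A quick sketch of the point, using made-up scores rather than real data:

```python
# One aberrant score distorts the mean while the median stays representative.
from statistics import mean, median

scores = [62, 65, 67, 70, 72, 74, 5]   # one extreme low score of 5
print(mean(scores))    # ~59.3, dragged down by the outlier
print(median(scores))  # 67, closer to the typical performance
```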

When errors can be expected in the input data. Sometimes you can have errors or missing data in
your data set. For example when you download stock market data from a trading platform, some
pieces of data can be missing. Some programs automatically set price or performance of a
missing item to zero, which could taint your arithmetic average calculation. Before using
arithmetic average (or any statistical tool), make sure your inputs are correct.

When the data set is very volatile or dispersed. There is nothing wrong with using arithmetic average
for dispersed data sets. In fact, replacing arithmetic average with some other measure like
geometric average will not solve all your problems here. It is always a good idea to add other
statistical measures to your analysis to check the volatility or skewness in the data set (arithmetic
average and other measures of central tendency can’t identify and describe such characteristics).
When you work with percentage returns over multiple time periods, especially when the returns
are volatile. In this case, the basis for the percentages is very likely to differ significantly from
period to period, and arithmetic average is quite useless. Geometric average is better here; see
“Why arithmetic average fails to measure average percentage returns over time” for reasons
and an example.
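A short worked example of this failure, with illustrative numbers: suppose a stock gains 50% in one period and loses 50% in the next.

$$\text{Arithmetic mean} = \frac{+50\% + (-50\%)}{2} = 0\%, \qquad \text{actual growth factor} = 1.5 \times 0.5 = 0.75$$

$$\text{Geometric mean} = \sqrt{1.5 \times 0.5} - 1 \approx -13.4\% \text{ per period}$$

The arithmetic mean of 0% suggests the investor broke even, when in fact a quarter of the investment is gone; the geometric mean recovers the true per-period rate of change.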

When individual items have different weights or different importance in the data set. Arithmetic
average assigns equal weight to all items. When you need to reflect different weights (for
example in a portfolio of stocks or a stock index), weighted average is more useful.
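For reference, the weighted average generalises the arithmetic mean by replacing equal weights with weights w_i (the portfolio numbers below are illustrative):

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

For a portfolio that is 80% in a stock returning 10% and 20% in a stock returning 50%, the weighted average return is 0.8 × 10% + 0.2 × 50% = 18%, whereas the plain arithmetic average of the two returns, 30%, would badly overstate the portfolio's performance.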

Reference:

https://www.macroption.com/arithmetic-average-advantages-disadvantages/

https://www.macroption.com/arithmetic-average-when-use/
