
Item Analysis

What is Item Analysis?

-Item analysis is a method used in education to evaluate test items. It helps ensure that questions meet an appropriate standard and measures the effectiveness of individual test items. Item analysis aims to improve test items and to identify unfair or biased items.

Types of item analysis:

Quantitative Item analysis

- Quantitative item analysis examines the statistical properties of the test items concerned and is based on three numerical indicators: the item difficulty index, the item discrimination index, and distractor power.

Difficulty Index

- Refers to the proportion of students in the upper and lower groups who answered an item correctly; it ranges from 0.0 to 1.0. (It determines how many answered correctly and, from that, how difficult the item is.)
- Formula in solving the Difficulty Index

Ru – Number of students in the upper group who answered the item correctly.

Rl – Number of students in the lower group who answered the item correctly.

N – Total number of students who attempted to answer the item.
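
Given these definitions, the difficulty index is usually computed as DF = (Ru + Rl) / N. A minimal sketch in Python (the function and variable names are illustrative, not from the source):

```python
def difficulty_index(ru, rl, n):
    """Proportion of the upper and lower groups combined who answered the item correctly.

    ru -- number of upper-group students who answered correctly
    rl -- number of lower-group students who answered correctly
    n  -- total number of students who attempted the item
    Returns a value from 0.0 (no one correct) to 1.0 (everyone correct).
    """
    return (ru + rl) / n


# Example: 18 upper-group and 12 lower-group students out of 40 examinees got the item right.
print(difficulty_index(18, 12, 40))  # 0.75 -> a relatively easy item
```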

Discrimination Index

- Refers to the power of the item to discriminate between students who scored high and those who scored low in the overall test. (It determines not only who answered correctly but also whether the item was effective.)
- Types of discrimination Index: Positive Discrimination, Negative Discrimination, Zero
Discrimination
- Positive Discrimination - happens when more students in the upper group got the item correct than students in the lower group.
- Negative Discrimination - occurs when more students in the lower group got the item correct than students in the upper group.
- Zero Discrimination - occurs when the number of students in the upper group and the lower group who answered the item correctly is equal.
- Formula in solving the Discrimination Index

Ru – Number of students in the upper group who answered the item correctly.

Rl – Number of students in the lower group who answered the item correctly.

N – Total number of students who attempted to answer the item.
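
A common form of the formula, assuming the upper and lower groups are of equal size, is DI = (Ru - Rl) / (N/2), which ranges from -1.0 to +1.0. A minimal Python sketch under that assumption (names are illustrative):

```python
def discrimination_index(ru, rl, n):
    """Index of how well an item separates high scorers from low scorers.

    ru -- number of upper-group students who answered correctly
    rl -- number of lower-group students who answered correctly
    n  -- total number of students in the upper and lower groups combined
    Assumes both groups have n/2 members; the result ranges from -1.0 to +1.0.
    """
    return (ru - rl) / (n / 2)


# Positive value: more upper-group students got the item right (positive discrimination).
# Negative value: more lower-group students got it right (negative discrimination).
# Zero: equal numbers in both groups (zero discrimination).
print(discrimination_index(18, 12, 40))  # 0.3
```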

Distractor analysis

-A distractor is the term used for an incorrect option in a multiple-choice test, while the correct answer is called the key.

-Distractor analysis is an important component of item analysis, as it shows the relationship between the total test score and the distractor chosen by the student. Distractor efficiency is one such tool: it tells whether an item was well constructed or failed to serve its purpose. It allows us to examine how many students in the upper group and the lower group selected each option on a multiple-choice item.

- Take note: any distractor selected by fewer than 5% of the students is considered a non-functional distractor (NF-D).
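
As a rough illustration of the 5% rule, here is a minimal Python sketch (the function name and option labels are hypothetical, not from the source):

```python
def non_functional_distractors(option_counts, key, threshold=0.05):
    """Return the distractors chosen by fewer than `threshold` of all examinees.

    option_counts -- mapping of each option (e.g. 'A'..'D') to the number of students who chose it
    key           -- the letter of the correct answer
    """
    total = sum(option_counts.values())
    return [option for option, count in option_counts.items()
            if option != key and count / total < threshold]


# Example: option D was chosen by only 1 of 50 examinees (2%), so it is flagged as NF-D.
counts = {"A": 30, "B": 10, "C": 9, "D": 1}
print(non_functional_distractors(counts, key="A"))  # ['D']
```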

Qualitative Item analysis


-It is a process in which the teacher or an expert carefully proofreads the test before it is administered to check for typographical errors, to avoid grammatical clues that may give away the correct answer, and to ensure that the reading level of the material is appropriate (Zurawski, R. M.).

- According to Cohen, Swerdlik, and Smith (1992), as cited by Zurawski, in qualitative item analysis the students who took the examination are asked to verbally express their experience in answering each item.

How to Improve the Test Items

Consider the following examples in analyzing a test item, together with some notes on how to improve the item based on the results of item analysis.

Factors why students failed to get the correct answer in the given question:

• It is not taught in the class properly.


• It is ambiguous.
• The correct answer is not given in the options.
• It has more than one correct answer.
• It contains grammatical clues that mislead the students.
• The student is not aware of the content.
• The students were confused by the logic of the question because it has double negatives.
• The student failed to study the lesson.
Miskeyed Item

- The test item is a potential miskey if more students from the upper group choose an incorrect option than the keyed answer.

Guessing item

- Students from the upper group have an equal spread of choices among the given alternatives. Students from the upper group guess their answers for the following reasons:
• The content of the item was not discussed in class.
• The item is very difficult.
• The question is trivial.

Ambiguous item

- This happens when students from the upper group choose an incorrect option and the keyed answer in roughly equal numbers. (A sketch for flagging miskeyed, guessing, and ambiguous items follows below.)
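
A minimal Python sketch of these three checks, using only upper-group option counts (the function, tolerance value, and option labels are illustrative assumptions, not from the source):

```python
def flag_item(upper_counts, key, tolerance=1):
    """Flag an item as a potential miskey, guessing item, or ambiguous item.

    upper_counts -- mapping of each option to the number of UPPER-group students who chose it
    key          -- the keyed (correct) answer
    tolerance    -- how close two counts must be to count as "roughly equal"
    """
    key_count = upper_counts[key]
    distractor_counts = {opt: c for opt, c in upper_counts.items() if opt != key}

    # Potential miskey: some incorrect option attracts more upper-group students than the key.
    if any(c > key_count for c in distractor_counts.values()):
        return "potential miskey"
    # Guessing item: upper-group choices are spread almost evenly across all options.
    if max(upper_counts.values()) - min(upper_counts.values()) <= tolerance:
        return "guessing item"
    # Ambiguous item: an incorrect option and the key are chosen in roughly equal numbers.
    if any(abs(c - key_count) <= tolerance for c in distractor_counts.values()):
        return "ambiguous item"
    return "no flag"


print(flag_item({"A": 6, "B": 14, "C": 5, "D": 5}, key="A"))  # potential miskey
```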

Reliability

-Reliability is the consistency of responses to a measure under three conditions:

• When retested on the same person.
• When retested on the same measure.
• Similarity of responses across items that measure the same characteristic.

There are different factors that affect the reliability of a measure. The reliability of a measure can be
high or low, depending on the following factors:

1. The number of items in a test

-The more items a test has, the higher the likelihood of reliability. The probability of obtaining consistent scores is higher because of the larger pool of items.
2. Individual differences of participants

-Every participant possesses characteristics that affect their performance in a test, such as fatigue, concentration, innate ability, perseverance, and motivation. These individual factors change over time and affect the consistency of answers in the test.

3. External environment

- It includes room temperature, noise level, depth of instruction, exposure to materials, and quality of instruction, all of which could affect changes in the responses of examinees in a test.

What are the different ways to establish test reliability?

There are different ways of determining the reliability of a test. The specific kind of reliability will depend on (1) the variable you are measuring, (2) the type of test, and (3) the number of versions of the test.

-Methods of testing reliability and how to execute them:

Each method below is described by how to execute it, the statistics used, and when it is applicable.

1. Test-retest
How to execute: Administer the test at one time to a group of examinees, then administer it again at another time to the same group. There should be a time interval of not more than 6 months between the first and second administrations for tests that measure stable characteristics, such as a standardized aptitude test. The posttest can be given with a minimum time interval of 30 minutes. The responses should be more or less the same across the two points in time.
Statistics used: Correlate the test scores from the first and second administrations. A significant and positive correlation indicates that the test has temporal stability over time. Correlation is a statistical procedure in which a linear relationship is expected between two variables. You may use the Pearson product-moment correlation (Pearson r), because test data are usually on an interval scale (refer to a statistics book for Pearson r).
Applicability: Test-retest is applicable for tests that measure stable variables, such as aptitude and psychomotor measures (e.g., typing tests, task and physical education).
2. Parallel Forms
How to execute: There are two versions of the test, and the items must measure exactly the same skill. Each test version is called a "form." Administer one form at one time and the other form at another time to the same group of participants. The responses on the two forms should be more or less the same.
Statistics used: Correlate the test results for the first form and the second form. A significant and positive correlation coefficient is expected; it indicates that the responses in the two forms are the same or consistent. Pearson r is usually used for this analysis.
Applicability: Parallel forms are applicable when there are two versions of the test. This is usually done when the test is repeatedly used for different groups, such as entrance examinations and licensure examinations, where different versions of the test are given to different groups of examinees.
3. Split-Half
How to execute: Administer the test to a group of examinees. Split the items into halves, usually using the odd-even technique: get the sum of the points on the odd-numbered items and the sum of the points on the even-numbered items. Each examinee will then have two scores, one for each half, and the two scores should be close or consistent.
Statistics used: Correlate the two sets of scores using Pearson r. After the correlation, apply another formula, the Spearman-Brown coefficient, to the obtained Pearson r. Both the Pearson r and the Spearman-Brown coefficient should be significant and positive, which means that the test has internal consistency reliability.
Applicability: Split-half is applicable when the test has a large number of items.
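
As a rough sketch of the split-half computation (assuming the Pearson r between the two halves has already been obtained), the standard Spearman-Brown step-up formula in Python:

```python
def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between the two half-tests.

    r_half -- Pearson r between the odd-item scores and the even-item scores
    """
    return (2 * r_half) / (1 + r_half)


# Example: a half-test correlation of 0.70 steps up to about 0.82 for the full test.
print(round(spearman_brown(0.70), 2))  # 0.82
```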
4. Test of Internal Consistency using Kuder-Richardson and Cronbach's Alpha
How to execute: This procedure involves determining whether the scores for each item are consistently answered by the examinees. After administering the test to a group of examinees, determine and record the score for each item. The idea is to see whether the responses per item are consistent with one another.
Statistics used: A statistical analysis called Cronbach's alpha or Kuder-Richardson is used to determine the internal consistency of the items. A Cronbach's alpha value of 0.60 and above indicates that the test items have internal consistency.
Applicability: This technique works well when the assessment tool has a large number of items. It is also applicable for scales and inventories (e.g., a Likert scale from "strongly agree" to "strongly disagree").
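
A minimal sketch of Cronbach's alpha computed from per-item scores, using only the Python standard library (the data layout and example scores are assumptions for illustration):

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items.

    item_scores -- list of per-item score lists; each inner list holds one item's scores,
                   with examinees in the same order across items.
    """
    k = len(item_scores)                                  # number of items
    item_variances = [pvariance(scores) for scores in item_scores]
    totals = [sum(per_examinee) for per_examinee in zip(*item_scores)]
    total_variance = pvariance(totals)                    # variance of examinees' total scores
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Three dichotomously scored items (1 = correct, 0 = wrong) answered by five examinees.
items = [[1, 1, 0, 1, 0], [1, 1, 0, 1, 1], [1, 0, 0, 1, 0]]
print(round(cronbach_alpha(items), 2))  # about 0.79
```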
5. Inter-rater Reliability
How to execute: This procedure is used to determine the consistency of multiple raters when using rating scales and rubrics to judge a performance. Reliability here refers to the similar or consistent ratings provided by more than one rater or judge when they use the same assessment tool.
Statistics used: A statistical analysis called Kendall's coefficient of concordance (Kendall's W) is used to determine whether the ratings provided by multiple raters agree with each other. A significant Kendall's W value indicates that the raters concur or agree with each other in their ratings.
Applicability: Inter-rater reliability is applicable when the assessment requires the use of multiple raters.
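
A minimal Python sketch of Kendall's W for raters who each rank the same set of performances (assuming ranks with no ties; the function name and example ranks are illustrative):

```python
def kendalls_w(rank_lists):
    """Kendall's coefficient of concordance for m raters ranking the same n performances.

    rank_lists -- list of m lists; each inner list holds one rater's ranks (1..n, no ties).
    Returns a value from 0 (no agreement) to 1 (perfect agreement).
    """
    m = len(rank_lists)                     # number of raters
    n = len(rank_lists[0])                  # number of performances rated
    rank_sums = [sum(r[i] for r in rank_lists) for i in range(n)]
    mean_rank_sum = sum(rank_sums) / n
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters ranking five demonstrations (1 = best, 5 = worst).
ratings = [[1, 2, 3, 4, 5], [2, 1, 3, 5, 4], [1, 3, 2, 4, 5]]
print(round(kendalls_w(ratings), 2))  # about 0.84 -> high agreement
```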

1. Linear Regression
- It is demonstrated when you have two measured variables, such as two sets of scores taken at two different times by the same participants. When the two sets of scores are plotted on a graph (with an x-axis and a y-axis), they tend to form a straight line. The straight line formed from the sets of scores is the linear regression. When a straight line is formed, we can tell that there is a correlation between the two sets of scores. The graph is called a scatterplot, and each point in the scatterplot corresponds to two scores (one for each test).

2. Computation of the Pearson r correlation

- The index of the linear regression is called the correlation coefficient. When the points in a scatterplot tend to fall along the line, the correlation is said to be strong. When the direction of the scatterplot is directly proportional, the correlation coefficient has a positive value; if the trend is inverse, the correlation coefficient has a negative value. The statistical analysis used to determine the correlation coefficient is called the Pearson r.
- Formula in solving Pearson r
Example:
Suppose a teacher gave a 20-item spelling test of two-syllable words on Monday and again on Tuesday. The teacher wanted to determine the reliability of the two sets of scores by computing the Pearson r.
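
Pearson r can be computed as the covariance of the two score sets divided by the product of their standard deviations. A minimal Python sketch of the example; the Monday and Tuesday scores below are hypothetical and serve only to illustrate the calculation:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores (out of 20) for the same five pupils on Monday and Tuesday.
monday = [18, 15, 12, 20, 10]
tuesday = [17, 16, 11, 19, 12]
print(round(pearson_r(monday, tuesday), 2))  # about 0.95 -> strong positive correlation
```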

3. Difference between positive and negative correlation

-When the value of the correlation coefficient is positive, it means that the higher the scores in X, the higher the scores in Y. In the case of the two spelling scores, a positive correlation is obtained. When the value of the correlation coefficient is negative, it means that the higher the scores on the x-axis, the lower the scores on the y-axis, and vice versa; this is called a negative correlation. When the same test is administered to the same group of participants, a positive correlation usually indicates reliability or consistency of the scores.

4. Determine the strength of a correlation

-The strength of the correlation also indicates the strength of the reliability of the test. It is indicated by the value of the correlation coefficient: the closer the value is to 1.00 or -1.00, the stronger the correlation.

Value          Meaning
0.80 - 1.00    Very strong relationship
0.60 - 0.79    Strong relationship
0.40 - 0.59    Substantial/marked relationship
0.20 - 0.39    Weak relationship
0.00 - 0.19    Negligible relationship
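
A small helper that maps a computed coefficient to the verbal labels in the table above (a sketch; the function name is illustrative):

```python
def describe_strength(r):
    """Return the verbal label for the absolute value of a correlation coefficient."""
    value = abs(r)
    if value >= 0.80:
        return "Very strong relationship"
    if value >= 0.60:
        return "Strong relationship"
    if value >= 0.40:
        return "Substantial/marked relationship"
    if value >= 0.20:
        return "Weak relationship"
    return "Negligible relationship"


print(describe_strength(0.95))  # Very strong relationship
```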

5. Determining the significance of the correlation

The correlation obtained between two variables could be due to chance. In order to determine whether the correlation is free of certain errors, it is tested for significance. When a correlation is significant, it means that the probability of the two variables being related is free of certain errors.

In order to determine whether a correlation coefficient value is significant, it is compared with an expected probability value called the critical value. When the computed value is greater than the critical value, it means that the correlation obtained has more than a 95% chance of being real and is significant.
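
In practice, statistical packages report a p-value instead of requiring a critical-value table; a p-value below 0.05 corresponds to the "more than 95% chance" criterion above. A sketch using SciPy (assuming the scipy package is installed; the scores reuse the hypothetical Monday/Tuesday data from the earlier example):

```python
from scipy.stats import pearsonr

# Hypothetical Monday/Tuesday spelling scores from the earlier example.
monday = [18, 15, 12, 20, 10]
tuesday = [17, 16, 11, 19, 12]

r, p_value = pearsonr(monday, tuesday)
if p_value < 0.05:
    print(f"r = {r:.2f} is significant (p = {p_value:.3f})")
else:
    print(f"r = {r:.2f} is not significant (p = {p_value:.3f})")
```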

Example:
A Kendall's W coefficient value of 0.38 indicates the degree of agreement of the raters across the 5 demonstrations. There is only moderate concordance among the three raters because the value is far from 1.00.

-fin-
Item analysis: https://thejoyoflanguageassessment.wordpress.com/2012/12/19/item-analysis-3/

Quantitative item analysis: https://www.slideshare.net/amjadfrance/quantitative-item-analysis

Qualitative Item analysis: https://www.slideshare.net/benluc34/qualitative-item-analysis



Distractor Analysis: https://youtu.be/87-OKiMhp3s

Item Analysis video: https://www.youtube.com/watch?app=desktop&v=jzSsMacat-k

Item Analysis: https://www.slideshare.net/iamnotangelica/item-analysis-30373947

How to compute Pearson r: https://www.wallstreetmojo.com/pearson-correlation-coefficient/

Methods in testing reliability: https://youtu.be/U2sM3QQwFec

Reliability test videos/guides: https://youtube.com/playlist?list=PLL3KEsFFItmRXhcv8sVTxUk1ecXiLN5TJ
