| Method | Type of Reliability | Procedure | Statistical Measure |
| --- | --- | --- | --- |
| Test-Retest | Measure of Stability | Give a test twice to the same group, with any time interval between administrations, from several minutes to several years. | Pearson r |
| Equivalent Forms | Measure of Equivalence | Give parallel forms of the test at the same time. | Pearson r |
| Test-Retest with Equivalent Forms | Measure of Stability and Equivalence | Give parallel forms of the test with an increased time interval between forms. | Pearson r |
| Split-Half | Measure of Internal Consistency | Give a test once. Score equivalent halves of the test (e.g., odd- and even-numbered items). | Pearson r and the Spearman-Brown formula |
| Kuder-Richardson | Measure of Internal Consistency | Give the test once, then correlate the proportion/percentage of students passing and not passing a given item. | Kuder-Richardson Formulas 20 and 21 |
| Cronbach Coefficient Alpha | Measure of Internal Consistency | Give a test once, then estimate reliability using the standard deviation of each item and the standard deviation of the test scores. | Cronbach's coefficient alpha |
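The Kuder-Richardson row above is not walked through in the instructions below, so here is a minimal sketch of the KR-20 computation in Python, assuming a small hypothetical matrix of dichotomous (pass/fail) item scores and the NumPy library:

```python
import numpy as np

# Hypothetical data: rows = 8 examinees, columns = 5 dichotomous items
# (1 = pass, 0 = fail). Any 0/1 matrix of the same kind works.
scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0],
])

k = scores.shape[1]                          # number of items
p = scores.mean(axis=0)                      # proportion passing each item
q = 1 - p                                    # proportion failing each item
total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of total scores

# KR-20 = (k / (k - 1)) * (1 - sum(p * q) / total score variance)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.3f}")
```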

Reliability Test
Instructions for employing the methods:
1. Test-Retest Reliability: This method assesses the stability of a measure over time.
To administer this method, follow these steps:
a. Select a sample of participants who are representative of the population of
interest.
b. Administer the same test or measurement to the same group of individuals at
two or more different time points, with a time interval in between (e.g., weeks,
months).
c. Collect and record the scores obtained from each participant at each time
point.
d. Calculate the correlation coefficient (e.g., Pearson's correlation coefficient)
between the scores obtained at the two time points. A high correlation coefficient
indicates high test-retest reliability (see the sketch below).
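As a minimal illustration of step (d), here is a sketch in Python using SciPy's pearsonr function; the scores are hypothetical:

```python
from scipy.stats import pearsonr

# Hypothetical total scores for the same 6 participants at two time points.
time1 = [24, 31, 18, 40, 27, 35]
time2 = [26, 30, 20, 38, 25, 36]

r, p_value = pearsonr(time1, time2)
print(f"Test-retest reliability (Pearson r) = {r:.2f}")  # high r => stable measure
```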

2. Equivalent Forms: This method assesses the equivalence of two or more different
forms or versions of the same test or measurement. To administer this method,
follow these steps:
a. Create two or more different forms or versions of the same test or
measurement, ensuring that the content and difficulty level are equivalent.
b. Select a sample of participants who are representative of the population of
interest.
c. Administer two or more different forms of the test or measurement to the
same group of individuals in a counterbalanced manner (i.e., randomly assign
participants to different forms).
d. Collect and record the scores obtained from each participant for each form.
e. Calculate the correlation coefficient (e.g., Pearson's correlation coefficient)
between the scores obtained from different forms. A high correlation coefficient
indicates high equivalent forms reliability.

3. Test-Retest with Equivalent Forms: This method combines the assessment of both
stability and equivalence of a measure. To administer this method, follow these
steps:
a. Create two or more different forms or versions of the same test or
measurement, ensuring that the content and difficulty level are equivalent.
b. Select a sample of participants who are representative of the population of
interest.
c. Administer one form of the test or measurement (e.g., Form A) to the group at
the first time point.
d. After a time interval (e.g., weeks, months), administer the other form (e.g.,
Form B) to the same group of individuals.
e. Collect and record the scores obtained from each participant for each form.
f. Calculate the correlation coefficient (e.g., Pearson's correlation coefficient)
between the scores obtained from the two administrations. Because the
administrations differ in both form and time, a high correlation coefficient
indicates that the measure is both stable over time and equivalent across forms.

4. Split-Half: This method assesses the internal consistency of a measure by splitting
the items or sub-scales of the measure into two halves and comparing the scores
obtained from each half. To administer this method, follow these steps:
a. Select a sample of participants who are representative of the population of
interest.
b. Administer the test or measurement to the participants.
c. Collect and record the scores obtained from each participant for each item or
sub-scale.
d. Randomly divide the items or sub-scales of the measure into two halves (e.g.,
odd- vs. even-numbered items, or randomly selected items).
e. Calculate the total scores for each half of the measure.
f. Calculate the correlation coefficient (e.g., Pearson's correlation coefficient)
between the total scores obtained from the two halves. A high correlation
coefficient indicates high split-half reliability.
g. Because the correlation between two half-length tests underestimates the
reliability of the full-length test, apply the Spearman-Brown formula,
r_full = (2 × r_half) / (1 + r_half), to estimate the reliability of the whole
test (see the sketch below).
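Putting steps (a) through (g) together, here is a minimal split-half sketch in Python, assuming a small hypothetical matrix of item scores and the NumPy and SciPy libraries:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: rows = 6 participants, columns = 6 items scored 1-5.
items = np.array([
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [3, 4, 3, 3, 4, 4],
])

odd_half = items[:, 0::2].sum(axis=1)   # totals over items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)  # totals over items 2, 4, 6

r_half, _ = pearsonr(odd_half, even_half)   # correlation between the two halves
r_full = (2 * r_half) / (1 + r_half)        # Spearman-Brown step-up correction
print(f"Split-half r = {r_half:.2f}; corrected reliability = {r_full:.2f}")
```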
Types of Reliability

1. Measure of Stability: This type of reliability assesses the consistency or stability
of a measure over time. It involves administering the same test or measurement
to the same group of individuals at two or more different time points and
comparing the results. The assumption is that if the measure is stable or
consistent over time, individuals should obtain similar scores when tested at
different time points. Common statistical measures used to assess stability
reliability include Pearson's correlation coefficient and Intraclass Correlation
Coefficient (ICC). A high correlation or ICC between the scores obtained at
different time points indicates high stability reliability.

2. Measure of Equivalence: This type of reliability assesses the consistency or
equivalence of two or more different forms or versions of the same test or
measurement. It involves administering two or more different forms of the test or
measurement to the same group of individuals and comparing the results. The
assumption is that if the different forms are equivalent, individuals should
obtain similar scores regardless of which form they are administered.
Common statistical measures used to assess equivalence reliability include
Pearson's correlation coefficient and Intraclass Correlation Coefficient (ICC). A
high correlation or ICC between the scores obtained from different forms
indicates high equivalence reliability.

3. Measure of Stability and Equivalence: This type of reliability combines the
assessment of both stability and equivalence of a measure. It involves
administering two or more different forms of the same test or measurement to
the same group of individuals at two or more different time points and
comparing the results. The assumption is that if the measure is both stable and
equivalent, individuals should obtain similar scores when tested with different
forms at different time points. Common statistical measures used to assess
stability and equivalence reliability include Pearson's correlation coefficient and
Intraclass Correlation Coefficient (ICC).

4. Measure of Internal Consistency: This type of reliability assesses the consistency
of responses within a single measurement or data collection session. It involves
administering a questionnaire or scale to a single group of individuals and
calculating measures of internal consistency, which reflect the extent to which
the items or sub-scales within the measure are consistent in measuring the
same construct. Common statistical measures used to assess internal
consistency reliability include Cronbach's alpha, split-half reliability, and
Guttman's lambda. A higher value of these measures indicates higher internal
consistency reliability.
Examples:
Measure of Stability

Let's consider a study that aims to assess the test-retest reliability of a self-report
questionnaire measuring depression symptoms. The questionnaire consists of 20 items,
each rated on a 4-point scale ranging from 1 (Not at all depressed) to 4 (Extremely
depressed). Lower scores on the questionnaire indicate lower levels of depression.

Here's how you could administer and assess the test-retest reliability:

1. Select a sample of 50 participants who are representative of the population of
interest, such as adults with a history of depression.
2. Administer the depression questionnaire to the participants at Time 1 and collect
their responses for each of the 20 items.
3. Calculate the total score for each participant at Time 1 by summing the scores
obtained from all 20 items.
4. After a predetermined time interval (e.g., 2 weeks), re-administer the same
depression questionnaire to the participants at Time 2 and collect their responses
again for each of the 20 items.
5. Calculate the total score for each participant at Time 2 by summing the scores
obtained from all 20 items.
6. Compute the test-retest reliability by calculating the correlation coefficient (e.g.,
Pearson's r) between the total scores obtained at Time 1 and Time 2. You can use
software such as SPSS or Excel to compute the correlation.
7. Interpret the correlation coefficient. Let's say the computed correlation coefficient
between Time 1 and Time 2 scores is 0.90, which indicates a high positive
correlation. This suggests that the measure has high stability over time, as the
scores obtained at the two time points are consistent with each other.
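Steps 2 through 6 could be carried out in Python instead of SPSS or Excel. Here is a minimal sketch, assuming the item responses are stored as NumPy arrays; the responses below are hypothetical, and only 4 of the 50 participants are shown for brevity:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical responses: rows = participants, columns = the 20 items (rated 1-4).
time1_items = np.array([
    [2, 3, 1, 2, 2, 3, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 2],
    [4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 4, 3, 4, 4],
    [1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1],
    [3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3],
])
# Simulated Time 2 responses: small per-participant shifts, kept in the 1-4 range.
time2_items = np.clip(time1_items + np.array([[0], [-1], [1], [0]]), 1, 4)

time1_totals = time1_items.sum(axis=1)  # step 3: total score per participant, Time 1
time2_totals = time2_items.sum(axis=1)  # step 5: total score per participant, Time 2

r, _ = pearsonr(time1_totals, time2_totals)  # step 6: test-retest correlation
print(f"Test-retest r = {r:.2f}")
```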

It's important to note that the time interval between the test and retest should be
carefully considered, as factors such as participant characteristics, intervening events,
and measurement error can influence the test-retest reliability. Additionally, test-retest
reliability may not be appropriate for measures that are expected to change over time,
such as measures of attitudes or behaviors that can be influenced by external factors. In
such cases, other methods such as equivalent forms or internal consistency may be
more appropriate.

Measure of Equivalence
Measure of equivalence, also known as parallel forms reliability or equivalent
forms reliability, assesses the consistency of a measure when different but equivalent
versions of the measure are administered to the same group of individuals. It involves
administering two different forms of the same measure to the same group of individuals
and then examining the correlation between the scores obtained from the two forms.
Here's an example:

Let's consider a study that aims to assess the equivalent forms reliability of a language
proficiency test. The test is designed to measure English language proficiency and has
two equivalent forms: Form A and Form B. Each form consists of 50 items, with total
scores ranging from 0 to 50, where higher scores indicate higher levels of English
proficiency.

Here's how you could administer and assess the equivalent forms reliability:

1. Select a sample of 100 participants who are representative of the population of
interest, such as non-native English speakers.
2. Administer Form A of the language proficiency test to the participants and collect
their responses for each of the 50 items.
3. Calculate the total score for each participant for Form A by summing the scores
obtained from all 50 items.
4. After a short time interval (e.g., 1 day or 1 week), administer Form B of the
language proficiency test to the same participants and collect their responses for
each of the 50 items.
5. Calculate the total score for each participant for Form B by summing the scores
obtained from all 50 items.
6. Compute the equivalent forms reliability by calculating the correlation coefficient
(e.g., Pearson's r) between the total scores obtained from Form A and Form B.
You can use software such as SPSS or Excel to compute the correlation.
7. Interpret the correlation coefficient. Let's say the computed correlation coefficient
between Form A and Form B scores is 0.95, which indicates a high positive
correlation. This suggests that the two forms of the language proficiency test are
equivalent and yield consistent scores when administered to the same group of
individuals.

It's important to note that equivalent forms should be carefully developed and validated
to ensure that they are truly equivalent in terms of content, difficulty, and statistical
properties. Any differences between the two forms, such as changes in item wording,
item ordering, or response format, may affect the equivalent forms reliability.
Additionally, equivalent forms reliability may not be feasible or appropriate for all
measures, as it requires the development of multiple equivalent forms, which can be
time-consuming and resource-intensive. In such cases, other methods such as test-
retest reliability or internal consistency may be more suitable.

Internal Consistency
Let's consider an example of a self-report questionnaire designed to measure the level
of anxiety in a sample of individuals. The questionnaire consists of 10 items, each rated
on a 5-point scale ranging from 1 (Not at all anxious) to 5 (Extremely anxious). Higher
scores on the questionnaire indicate higher levels of anxiety.

Here's how you could administer and assess the internal consistency using Cronbach's
coefficient alpha:

1. Select a sample of 100 participants who are representative of the population of
interest, such as individuals with generalized anxiety disorder.
2. Administer the anxiety questionnaire to the participants and collect their
responses for each of the 10 items.
3. Calculate the total score for each participant by summing the scores obtained
from all 10 items.
4. Calculate the variance of the scores for each of the 10 items. For example, you
can use software such as SPSS or Excel to compute the variances.
5. Sum the variances of all 10 items.
6. Calculate the variance of the total scores obtained from all 10 items.
7. Use the following formula to calculate Cronbach's coefficient alpha:
Cronbach's alpha = (10 / (10 - 1)) * (1 - (Sum of item variances / Variance of
total scores))
8. Interpret the value of Cronbach's coefficient alpha. Let's say the computed value
of Cronbach's alpha is 0.85. This indicates that the items within the anxiety
questionnaire are highly internally consistent, as the coefficient alpha is close to
1.0 (perfect internal consistency). This suggests that the items in the
questionnaire are measuring the same construct (anxiety) consistently and
reliably.
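Here is a minimal sketch of this computation in Python, assuming a small hypothetical matrix of item responses and the NumPy library:

```python
import numpy as np

# Hypothetical responses: rows = participants, columns = the 10 anxiety items (1-5).
items = np.array([
    [2, 3, 2, 2, 3, 2, 3, 2, 2, 3],
    [4, 4, 5, 4, 4, 5, 4, 4, 5, 4],
    [1, 2, 1, 1, 2, 1, 1, 2, 1, 1],
    [3, 3, 3, 4, 3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)       # step 4: variance of each item
total_variance = items.sum(axis=1).var(ddof=1)   # step 6: variance of total scores

# Step 7: alpha = (k / (k - 1)) * (1 - sum of item variances / total score variance)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```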

Measure of Stability and Equivalence

Let's consider a study that aims to assess the stability and equivalence of a self-report
depression scale. The scale has two equivalent forms: Form A and Form B. Each form
consists of 20 items, with total scores ranging from 0 to 60, where higher scores indicate
higher levels of depression.

Here's how you could administer and assess the stability and equivalence:

1. Select a sample of 200 participants who are representative of the population of
interest, such as individuals with a history of depression.
2. Administer Form A of the depression scale to the participants and collect their
responses for each of the 20 items.
3. Calculate the total score for each participant for Form A by summing the scores
obtained from all 20 items.
4. After a time interval (e.g., 2 weeks or 1 month), administer Form B of the
depression scale to the same participants and collect their responses for each of
the 20 items.
5. Calculate the total score for each participant for Form B by summing the scores
obtained from all 20 items.
6. Compute the stability and equivalence coefficient by calculating the correlation
coefficient (e.g., Pearson's r) between the Form A total scores from Time 1 and
the Form B total scores from Time 2. You can use software such as SPSS or Excel
to compute the correlation.
7. Interpret the coefficient. Let's say the computed correlation between the Form A
and Form B scores is 0.85, which indicates a high positive correlation. Because
the two administrations differ in both form and occasion, this single coefficient
reflects both the stability of the measure over time and the equivalence of the
two forms.
8. To separate the two sources of error, you can additionally administer both forms
at the same time point (e.g., give Form B to the participants at Time 1 as well)
and compute the correlation between the Form A and Form B scores from that single
occasion. Let's say this correlation is 0.90, which indicates a high positive
correlation; this suggests that Form A and Form B of the depression scale are
equivalent.
9. Based on both coefficients, you can conclude that the scale has good stability
over time and equivalence between Form A and Form B, indicating that it is
reliable for assessing depression levels in the target population (see the sketch
below).
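Here is a minimal sketch of the two correlations in Python, assuming hypothetical total scores for six participants and the SciPy library:

```python
from scipy.stats import pearsonr

# Hypothetical total scores for the same 6 participants.
form_a_t1 = [30, 12, 25, 41, 18, 35]   # Form A at Time 1
form_b_t2 = [28, 14, 24, 39, 20, 33]   # Form B at Time 2 (after the interval)
form_b_t1 = [31, 11, 26, 40, 17, 36]   # Form B at Time 1 (same-occasion check)

stability_equivalence, _ = pearsonr(form_a_t1, form_b_t2)  # differs in form AND time
equivalence_only, _ = pearsonr(form_a_t1, form_b_t1)       # differs in form only
print(f"Stability-and-equivalence r = {stability_equivalence:.2f}")
print(f"Equivalent-forms r = {equivalence_only:.2f}")
```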

It's important to note that the measure of stability and equivalence requires careful
consideration and validation of both test-retest reliability and equivalent forms
reliability, as any changes in the measure or administration procedures may affect the
results. Additionally, this method may require more resources and time compared to
other reliability assessment methods, as it involves administering multiple forms of the
measure and collecting data at multiple time points.

Split-Half
Let's consider a study that aims to assess the internal consistency of a self-report anxiety
scale. The scale consists of 30 items, with total scores ranging from 0 to 90, where
higher scores indicate higher levels of anxiety.

Here's how you could administer and assess the split-half reliability:

1. Select a sample of 300 participants who are representative of the population of
interest, such as individuals with anxiety disorders.
2. Administer the 30-item anxiety scale to the participants and collect their
responses for each of the 30 items.
3. Randomly divide the 30 items into two halves: Form A and Form B. This can be
done using a random number generator or other randomization methods.
4. Calculate the total score for each participant for Form A by summing the scores
obtained from the items in Form A.
5. Calculate the total score for each participant for Form B by summing the scores
obtained from the items in Form B.
6. Compute the split-half reliability by calculating the correlation coefficient, such as
Pearson's correlation or Spearman's rank correlation, between the total scores
obtained from Form A and Form B. You can use software such as SPSS or Excel to
compute the correlation.
7. Interpret the split-half reliability coefficient. Let's say the computed correlation
coefficient between Form A and Form B scores is 0.80, which indicates a high
positive correlation. This suggests that the anxiety scale has good internal
consistency, as the scores obtained from the two halves of the scale are
consistent with each other.
8. You can also use other statistical measures, such as the Spearman-Brown
prophecy formula or Cronbach's alpha, to estimate the internal consistency of the
scale based on the split-half reliability coefficient.

It's important to note that split-half reliability relies on the assumption that the two
halves of the measure are equivalent and that the items within each half measure the
same construct. Therefore, it's essential to carefully select and randomize the items to
ensure that they are similar in content and difficulty level. Additionally, the sample size
and characteristics of the participants can also influence the reliability coefficient, so it's
important to consider these factors when interpreting the results.
Computation using Excel

Recommended Videos:

https://www.youtube.com/playlist?list=PLL3KEsFFItmRXhcv8sVTxUk1ecXiLN5TJ
