• Reliability can be estimated by comparing different versions of the same measurement.
• Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory.
• Methods of estimating reliability and validity are usually split into different types.

How do you assess reliability in research?

1. Test-retest reliability
• The test-retest reliability method measures the consistency of results when you repeat the same test on the same sample at a different point in time.
• If the results are similar each time you give the test to the sample group, your research method is likely reliable and not influenced by external factors, like the sample group's mood or the day of the week.
• Example: Give a group of college students a survey about their satisfaction with their school's parking lots on Monday and again on Friday, then compare the results to check the test-retest reliability.

2. Parallel forms reliability
• This strategy involves giving the same group of people multiple types of tests to determine whether the results stay the same when using different research methods.
• If they do, the methods are likely reliable; otherwise, the participants in the sample group may be behaving differently and changing the results.
• For this strategy to succeed, it's important that:
• Each research method looks for the same information
• The group of participants behaves similarly for each test
• Example: In marketing, you might interview customers about a new product, observe them using the product, and give them a survey about how easy the product is to use, then compare these results as a parallel forms reliability test.

3. Inter-rater reliability
• Inter-rater reliability testing involves multiple researchers assessing the same sample group and comparing their results.
• If most of the results from the different assessors are similar, the research method is likely reliable and can produce usable research, because the assessors gathered the same data from the group.
• This is useful for research methods where each assessor may apply different criteria but can still end up with similar results, like:
• Observations
• Interviews
• Surveys
• Example: Multiple behavioral specialists may observe a group of children playing to determine their social and emotional development, then compare notes to check for inter-rater reliability.

4. Internal consistency reliability
• Checking for internal consistency involves making sure your research methods, or parts of them, deliver the same results.
• There are two typical ways to make this determination:
• Split-half reliability test: Split a research method, like a survey or test, in half, deliver both halves to a sample group separately, then compare the results. If the results are consistent, the research method is likely reliable.
• Inter-item reliability test: Administer multiple test items to sample groups, as in parallel forms reliability testing, and calculate the correlation between the results for each item. You then average these correlations and use that number to determine whether the results are reliable.
• Example: You may give a company's cleaning department a questionnaire about which cleaning products work best, but split it in half, give each half to the department separately, and calculate the correlation between the halves to test for split-half reliability.

VALIDITY

1. Construct validity
• Construct validity is the adherence of a measure to existing theory and knowledge of the concept being measured.
• There are two main types of construct validity:
• Convergent validity: The extent to which your measure corresponds to measures of related constructs
• Discriminant validity: The extent to which your measure is unrelated or negatively related to measures of distinct constructs
• Example: A self-esteem questionnaire could be assessed by measuring other traits known or assumed to be related to self-esteem (such as social skills and optimism). Strong correlation between the scores for self-esteem and the associated traits would indicate high construct validity.

2. Content validity
• Content validity is the extent to which the measurement covers all aspects of the concept being measured.
• To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure.
• If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.
• Content validity applies to any context where you create a test or questionnaire for a particular construct and want to ensure that the questions actually measure what you intend them to.
• Example: A test that aims to measure a class of students' level of Spanish contains reading, writing, and speaking components, but no listening component. Experts agree that listening comprehension is an essential aspect of language ability, so the test lacks content validity for measuring the overall level of ability in Spanish.

3. Face validity
• Face validity is about whether a test appears, on the surface, to measure what it's supposed to measure.
• This type of validity is concerned with whether a measure seems relevant and appropriate for what it's assessing.
• To assess face validity, you ask other people to review your measurement technique and items and gauge their suitability for measuring your variable of interest.
• Ask them the following questions:
• Are the components of the measure (e.g., questions) relevant to what's being measured?
• Does the measurement method seem useful for measuring the variable?
• Is the measure seemingly appropriate for capturing the variable?
• You can create a short questionnaire to send to your test reviewers, or you can informally ask them whether the test seems to measure what it's supposed to.
• Example: You create a survey to measure the regularity of people's dietary habits. You review the survey items, which ask about every meal of the day and the snacks eaten in between for every day of the week. On its surface, the survey seems like a good representation of what you want to test, so you consider it to have high face validity.

4. Criterion validity
• Criterion validity is the extent to which the result of a measure corresponds to other valid measures of the same concept.
• It evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test.
• To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement.
• If there is a high correlation, this is a good indication that your test is measuring what it intends to measure.
• Example: A survey is conducted to measure the political opinions of voters in a region. If the results accurately predict the later outcome of an election in that region, the survey has high criterion validity.
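
Several of the checks above (test-retest, parallel forms, split-half, inter-item, and criterion validity) come down to the same computation: the correlation between two sets of scores from the same participants. A minimal sketch in Python, using made-up scores purely for illustration (the data and the scale are hypothetical, not from any real study):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: the same 5 participants measured twice
# (e.g. Monday vs. Friday for test-retest, or the two halves
# of a split survey for split-half reliability).
monday = [4, 7, 6, 8, 5]
friday = [5, 7, 6, 9, 5]
print(round(pearson_r(monday, friday), 3))  # → 0.945, near 1: consistent

# Inter-item reliability sketch: average the pairwise correlations
# between several (hypothetical) item-score lists.
items = [[4, 7, 6, 8, 5], [5, 7, 6, 9, 5], [4, 6, 7, 8, 6]]
pairs = [(0, 1), (0, 2), (1, 2)]
avg_r = sum(pearson_r(items[i], items[j]) for i, j in pairs) / len(pairs)
print(round(avg_r, 3))  # average inter-item correlation
```

A correlation near 1 suggests the two administrations (or halves, or items) agree, supporting reliability; for criterion validity, the same calculation would compare your measure against the criterion measure instead.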