
Last session, we discussed the validity and reliability of research instruments.

It is important to consider the validity and reliability of our methods and measurements in order to avoid errors and biases in the results of our research.

Validity indicates the accuracy of our instrument in measuring what we intend to measure. If an instrument measures what it is supposed to measure and the findings match real-world values, it is considered valid. There are four types of validity: construct validity, content validity, face validity, and criterion validity.

Construct validity is about making sure that the method of measurement is appropriate for the construct we want to measure. For example, if we wish to create a questionnaire to assess levels of anxiety and depression, we must ensure that the questionnaire accurately assesses the anxiety and depression constructs rather than some other emotion, sentiment, or attribute of the respondents. To attain construct validity, we must build our measures and indicators on relevant, current knowledge about the construct.

Another type of validity is content validity, which ensures that the scope of the instrument covers all aspects of the construct. An instrument must measure all important aspects of the subject it is measuring; its validity is jeopardized if certain elements are missing or unnecessary components are included. For example, suppose a science teacher develops a questionnaire for the final exam. The exam's scope must include all the topics she tackled from the beginning of the semester. If it leaves out some of the topics she covered, the results might not be representative of the students' comprehension of her course. The same happens if she includes topics she did not discuss with her students: the results are no longer a valid measure of their knowledge of her subject.

Face validity, on the other hand, considers how suitable the content of an instrument appears to be on its surface. It is similar to content validity, but the assessment is more informal and subjective.

The last type of validity is criterion validity. It is established by calculating the correlation between the results of the instrument and the results of a criterion measurement. One way to measure criterion validity is to administer two tests: one an existing, validated instrument, and the other the instrument whose criterion validity is being assessed. If the two sets of results are highly correlated, the instrument has high criterion validity.
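As a minimal sketch of that calculation in Python, assuming hypothetical score arrays for the same respondents on a new instrument and an established one (the variable names and data are illustrative, not from the source):

```python
# Criterion validity sketch: correlate scores from a new instrument with
# scores from an established, already-validated instrument.
# The arrays below are hypothetical example data.
from scipy.stats import pearsonr

new_scores = [14, 18, 22, 9, 27, 16, 31, 12]           # new instrument
established_scores = [15, 17, 24, 11, 25, 18, 30, 10]  # validated criterion

r, p_value = pearsonr(new_scores, established_scores)
print(f"criterion validity coefficient r = {r:.2f} (p = {p_value:.3f})")
# A high positive r suggests the new instrument tracks the validated one.
```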

In short, validity is all about the quality of instruments. It is mainly concerned with the format and content of the questionnaire or other research instrument.

Reliability, on the other hand, is all about the consistency and stability of results. It is mainly concerned with repeatability: the degree to which the results would repeat themselves across multiple trials. There are also four types of reliability: test-retest reliability, parallel forms reliability, interrater reliability, and internal consistency.

Test-retest reliability is the consistency of results over time. It can be measured by administering the same test to the same sample at two different points in time and then correlating the results of the two administrations. The higher the correlation between the two sets of results, the higher the test-retest reliability of the instrument. Test-retest reliability can be used to assess how well the instrument resists factors that might affect test results, such as the respondents' moods and internal states. It is usually used when you measure something that you expect to remain constant over time; IQ tests and colorblindness tests are examples, since those traits are not expected to change.
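A minimal sketch of this computation, assuming hypothetical scores from the same respondents at two time points; the same correlation would be computed for the parallel forms case described next, just with scores from the equivalent version of the test:

```python
# Test-retest reliability sketch: correlate scores from the same sample
# taken at two different points in time. Hypothetical example data.
import numpy as np

time1 = np.array([102, 95, 110, 88, 120, 99])   # first administration
time2 = np.array([100, 97, 108, 90, 118, 101])  # same test, weeks later

r = np.corrcoef(time1, time2)[0, 1]  # Pearson correlation coefficient
print(f"test-retest reliability r = {r:.2f}")
# Values near 1 indicate stable results across the two administrations.
```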

Parallel forms reliability, or alternate forms reliability, is the consistency of equivalent versions of a test. It is mainly used in educational assessments such as pre-tests and post-tests. Like test-retest reliability, it is measured by administering two tests; the difference is that in parallel forms reliability, the second test is a revised but equivalent version of the first. This prevents respondents from answering the test from memory. A high correlation between the two sets of results, computed just as in the sketch above, means high parallel forms reliability.

Interrater reliability, also called interobserver reliability, is the consistency of agreement among people. To determine an instrument's interrater reliability, different raters make the same measurement or observation of the same sample, and the agreement between their responses is then calculated. High similarity among their ratings means high interrater reliability. This is mainly used in observational studies in which the measurement relies on perceptions of a phenomenon or situation.
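For two raters assigning categorical labels, one standard agreement statistic is Cohen's kappa, which corrects raw agreement for chance; note this is a common substitute for the simple correlation the text mentions, and the ratings below are hypothetical:

```python
# Interrater reliability sketch: Cohen's kappa for two raters who assigned
# the same categorical labels to the same subjects. Hypothetical data.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]

labels = sorted(set(rater_a) | set(rater_b))
n = len(rater_a)

# Observed agreement: proportion of subjects both raters labeled the same.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement, from each rater's marginal label frequencies.
p_expected = sum(
    (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in labels
)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed agreement = {p_observed:.2f}, Cohen's kappa = {kappa:.2f}")
```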

The most commonly used type of reliability is internal consistency. It measures the correlation between multiple items in a test that are intended to measure the same construct. The most commonly used statistics for internal consistency are Cronbach's alpha coefficient and the Kuder-Richardson 20 (KR-20). Cronbach's alpha is used when the instrument uses multiple scale questions, such as Likert-scale items, while KR-20 is used on tests where the answers are dichotomous: "Yes" or "No", "Right" or "Wrong", "True" or "False".
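A minimal sketch of Cronbach's alpha computed directly from its textbook formula, assuming a hypothetical respondents-by-items matrix of Likert responses; KR-20 is the same computation applied to items scored 0 or 1:

```python
# Internal consistency sketch: Cronbach's alpha from its textbook formula,
#   alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total score)),
# where k is the number of items. Hypothetical Likert-scale data:
# rows are respondents, columns are items measuring the same construct.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)      # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)  # variance of summed scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
# For dichotomous (0/1) items, the same computation yields KR-20.
```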

To summarize, test-retest reliability is used when measuring something that is expected to remain constant over time. Parallel forms reliability is used when two different but equivalent instruments measure the same concept or subject. Interrater reliability is used when multiple observers rate or assess the same subject. And lastly, internal consistency is used with a multi-item instrument where all items are intended to measure the same variable.
