
TRIPURA INSTITUTE OF PARAMEDICAL

SCIENCES
(NURSING)

SUBJECT:- RESEARCH AND STATISTICS


ASSIGNMENT ON:-
VALIDITY AND RELIABILITY OF RESEARCH TOOL.

Submitted to:- Submitted by:-

Miss PIANKHI SAHA Mr SUJIT KUMAR NATH

Associate Professor M.Sc. Nursing (2nd Semester)

Obstetrics and Gynaecological Nursing Roll No:- 13

TIPS, Nursing TIPS, Nursing

DATE OF SUBMISSION:- 29/04/2024


VALIDITY AND RELIABILITY OF RESEARCH TOOL

 TOOLS OF RESEARCH:- Collection of data is an important research activity; the necessary direction for data collection is provided by the central problem chosen to conduct the research.

 Characteristics of Research Tool

An ideal measuring instrument is one which results in relevant, accurate, objective, sensitive, and
efficient measures. Physiological or physical measurements have a greater chance of having
these characteristics than psychological measurements. Scales developed to measure these
psychological variables are often imperfect and error prone. A number of techniques are
available to evaluate the quality of these measuring tools to minimize error. These techniques
estimate reliability, validity, objectivity, sensitivity, specificity, and appropriateness. Thus, if the measuring tools are reliable, valid, objective, sensitive, specific, and appropriate, they will improve accuracy and enhance the scientific quality of the research.
Fig:- Characteristics of an ideal research tool: reliability, validity, appropriateness, objectivity, sensitivity, and specificity.

 Reliability:- At the heart of all measurements is reliability. It is the degree to which an assessment tool produces stable and consistent results. Instruments are considered reliable if they consistently measure a given trait with precision, i.e. the degree of reproducibility or the generation of consistent values every time an instrument is used.

 The quality and adequacy of quantitative data can only be assessed by establishing the reliability of an instrument. Reliability is the degree of consistency with which the attributes or variables are measured by an instrument. For example, a blood pressure-measuring instrument gave a reading of 120 mmHg systolic blood pressure; after some time, when blood pressure was again measured for the same subject, it gave a reading of 160 mmHg systolic blood pressure. In this situation, the instrument is not considered reliable. However, if a research instrument yields similar or nearly similar results on repeated administration, it is considered a highly reliable research instrument.

Reliability pertains to the consistency of a measure. A test is thought to be reliable if we get the same result in a repeated manner. For example, if a test is designed to scale a trait (e.g. introversion), each time the test is administered to a subject, the result should be similar. Unfortunately, it is not possible to calculate reliability exactly; however, there are various ways to estimate it.
Definitions:-

Reliability is the degree of consistency and accuracy with which an instrument measures the attribute it is designed to measure.

Reliability is defined as the ability of an instrument to create reproducible results. Therefore, reliability is concerned with the consistency of the measurement tool. A tool can only be considered reliable if it measures an attribute with similar results on repeated use.

 Meaning:-

Reliability refers to the consistency with which an instrument or test measures what it is
supposed to measure.

Reliability is the extent to which a measurement or instrument or test yields the same results
on repeated administration.

Methods of Testing Reliability

The more reliable a test or instrument is, the more a researcher can rely on the scores. There are three methods of testing the reliability of a research instrument: stability, equivalence, and internal consistency.

Fig:- Methods of testing reliability: stability (test-retest method, repeated observations), equivalence (alternate form, inter-rater reliability), and internal consistency (split-half method).
o Stability

A stable research instrument is one which when repeated over and over on the same research
subject will produce the same research results. Stability of the instrument can be evaluated by
test-retest method and repeated observations.

Test-retest method: Repeated measurements over time using the same instrument on same
subjects is expected to produce the same results. In this method the researcher administers
the same test twice over a period of time to a group of individuals. The scores from time 1
and time 2 can then be correlated in order to evaluate the stability of the test. If the test or
instrument is reliable, individual scores will be very similar at both tests.

Fig:- Assessing stability of the instrument: the tool is administered to the same set of subjects at time point 1 and time point 2; if the scores from time 1 and time 2 are similar, the tool is considered stable.

For example, a test is developed to measure the knowledge of psychopharmacology among


nursing students. The test is given to a group of nursing students and repeated 2 weeks later.
Assuming that the students have had no additional classes regarding the topic during the 2-week
period between the tests, results from the first testing can be correlated with the second testing.
The obtained correlation coefficient would indicate the stability of the test. Karl Pearson's correlation coefficient is used to estimate reliability. The reliability coefficient ranges from 0.00 to 1.00, with higher coefficients indicating higher levels of reliability. A completely reliable test has a reliability coefficient of 1.00 and a completely unreliable test has a reliability coefficient of 0.00. A score above 0.80 indicates an acceptable level of reliability of the tool.
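To make the computation concrete, here is a minimal sketch in Python of how a test-retest reliability coefficient could be estimated with Karl Pearson's formula. The scores are invented purely for illustration; they are not from the text.

```python
# Minimal sketch: estimating test-retest reliability with Pearson's r.
# The knowledge-test scores below are hypothetical, for illustration only.
import statistics

time1 = [72, 65, 88, 79, 91, 68, 84, 77]  # first administration
time2 = [70, 67, 85, 80, 93, 66, 86, 75]  # same students, 2 weeks later

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(time1, time2)
print(f"Test-retest reliability coefficient: {r:.2f}")
# A coefficient above 0.80 would indicate an acceptable level of reliability.
```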

 A concern with this type of reliability is how to determine the amount of time that should elapse between the two testings. Knapp and Brown say that test-retest reliabilities are generally higher when the time lapse between the testings is short, usually no longer than 4 weeks. If the interval is too short, memory of the responses given during the first session may influence responses during the second session.

 A reliable questionnaire will give consistent results over time. If the results are not consistent, the test is not considered reliable and will need to be revised until it does measure consistently.

 Test-retest is used primarily with questionnaires, but the concept of repeated measurement to establish stability can also be used with tools like thermometers and hemodynamic monitors, and in instances where the variable being measured is not expected to fluctuate.

 The main limitation is that the test for stability can be performed only when the trait being measured remains constant over time. An example of a stable concept is intelligence. It should be possible to measure intelligence repeatedly at regular intervals and obtain the same score. An unstable concept such as pain is changeable and subject to frequent fluctuations, even for a person with chronic pain. Repeated measures of pain in a subject would result in widely different scores. These differences would not mean that the instrument is unstable, but rather that the individual's pain was changing (the variable being measured was changing).

Repeated observations: When using observational methods of data collection, the test of stability of the instrument is called repeated observation. The measurement of the variable is repeated over time and the results at each measurement time are expected to be very similar. For example, if a researcher uses an observational scale to rate a nurse's behavior during the process of counting narcotics 3 days in a row, it should yield similar ratings each day. If the ratings are different each day, questions arise regarding the reliability of the rating: whether or not the trait being measured is stable, and whether or not the observation is done the same way every day, i.e. whether or not the observer is consistent.
Fig:- Assessing stability of the observational scale: the scale is administered to the same set of subjects, measuring the variable at time point 1 and time point 2; if the results at each time yield similar scores, the tool is considered stable.

o Equivalence

When the variable being measured is not a stable one, the reliability of an instrument cannot be tested by repeated measures. A test of equivalence attempts to determine whether similar tests administered at the same time yield the same results, or whether different observers observing the same phenomenon at the same time report similar results.

Equivalence is based on the idea of using alternate forms of measurement of the same trait at the same time and comparing the results. Tests of equivalence are of two categories: alternate form and inter-rater reliability.

a. Alternate form: The test of equivalence using alternate forms of paper and pencil tests
consisting of two sets of similar questions designed to measure the same trait is called alternate
form testing. The two tests are based on similar content but the individual items are different.
When these two tests are administered to subjects at the same time, the results can be compared
just as in test/retest method. Obtaining similar results on the two alternate forms of the
instrument gives support for the reliability of both forms of the instrument. For example, an instructor develops two tests with the same content but different questions and administers both to the same subjects. If a student were to take the two forms of the test, the results should reflect the same level of knowledge if the tests are reliable.

Fig:- Assessing equivalence of the instrument: two alternate forms based on similar content are administered to the same set of subjects at the same time; obtaining similar results on the two forms supports the reliability of both forms of the instrument.
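The equivalence computation mirrors the test-retest method: scores on Form A are correlated with scores on Form B obtained from the same subjects at the same sitting. A minimal sketch with invented scores, using Python 3.10's statistics.correlation (which computes Pearson's r):

```python
# Minimal sketch: alternate-form reliability. Scores are hypothetical.
from statistics import correlation  # Pearson's r; available in Python 3.10+

form_a = [72, 65, 88, 79, 91, 68, 84, 77]  # scores on Form A
form_b = [74, 63, 86, 81, 90, 70, 82, 79]  # scores on Form B, same subjects, same sitting

print(f"Alternate-form reliability: {correlation(form_a, form_b):.2f}")
```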

The major problem with alternate forms is that they tend to be boring for the subjects. When the questionnaire or interview is very long, the addition of another instrument of the same length may be too tiring for the subjects. This may introduce a new source of error through subject fatigue and boredom.

b. Inter-rater reliability (interobserver agreement): If a measurement process involves ratings by observers, a reliable measurement will require consistency between different raters. It is used to determine whether two observers using the same instrument at the same time will obtain similar results. A reliable instrument should produce the same results if both observers are using it the same way. Inter-rater reliability requires completely independent ratings of the same event by more than one rater. No discussion or collaboration can occur when reliability is being tested. Reliability is determined by the correlation of the scores from two or more independent raters. Inter-rater reliability is often assessed using Cronbach's alpha when the judgments are quantitative, or Cohen's kappa when the judgments are categorical. For example, in an observational tool designed
to measure the assertiveness of an individual, two researchers observe the interaction together
and rate the assertiveness of the subjects using the same scale separately. These ratings are then
compared for equivalence. The extent to which they agree serves as a measure of the reliability
of the tool. This method is also useful to test the reliability of interpreting the physiological tools.
For example, to test the inter-rater reliability of a blood pressure reading, a double stethoscope is
used which enables two people to listen and agree on blood pressure reading at the same time.
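For categorical judgments, Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. A minimal sketch, with hypothetical assertiveness ratings ("A" = assertive, "N" = not assertive) invented for illustration:

```python
# Minimal sketch: Cohen's kappa for two independent raters making
# categorical judgments. Ratings are hypothetical.
from collections import Counter

rater1 = ["A", "A", "N", "A", "N", "N", "A", "A", "N", "A"]
rater2 = ["A", "N", "N", "A", "N", "A", "A", "A", "N", "A"]

def cohens_kappa(r1, r2):
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # agreement expected by chance, from each rater's category proportions
    chance = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (observed - chance) / (1 - chance)

print(f"Cohen's kappa: {cohens_kappa(rater1, rater2):.2f}")
```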

Fig:- Assessing inter-rater reliability: observer 1 and observer 2 observe the subjects for the same phenomenon independently; the tool is considered reliable if it yields similar observations.

Internal consistency or scale homogeneity

Internal consistency refers to the extent to which all parts of the measurement technique are
measuring the same concept. It gives an estimate of the equivalence of set of items from the
same test. It is based on the assumption that items measuring the same construct or concept
should correlate. For example, in a questionnaire to measure depression, each question should provide a measure of depression consistent with the overall results of the test. In laboratory tests this concept includes the idea that the results obtained from counting the red blood cells in one drop of blood from a specimen should be the same as those obtained from another drop of blood from the same specimen. To ensure internal consistency, all the questions in the structured
questionnaire should be able to measure a single trait or characteristic or phenomena contributing
to the overall measure of the concept.
 If the variable being measured is a changeable one, the test-retest method cannot be used. If alternate forms are not possible because the length of the questionnaires would prohibit asking the subjects to complete two at the same time, then equivalence is not an option. Only internal consistency will provide a useful measure of reliability in these cases.

❖ Split-half correlation is used to test internal consistency: the items on the instrument are divided into two halves, and the correlation between the scores on the two parts is computed. The halves may be divided by comparing the scores on the first half of the test with the scores on the second half, or by comparing odd-numbered items with even-numbered items. If all the items are consistently measuring the overall concept, then the scores on the two halves of the test should be correlated.

Fig:- Assessing internal consistency: divide the instrument items into two equal halves (odd and even items, or first half and second half); administer to the same set of subjects at the same time; compare the scores on the first half with the scores on the second half (or odd-numbered item scores with even-numbered item scores); if all the items are consistently measuring the overall concept, the scores on the two halves should be correlated.
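A minimal sketch of the split-half computation on hypothetical item scores. The Spearman-Brown step-up formula, a standard correction not named in the text above, is used to adjust the half-test correlation to full-test length:

```python
# Minimal sketch: split-half reliability using odd vs. even items.
# Item scores are hypothetical; rows are subjects, columns are 8 items.
from statistics import correlation  # Pearson's r; available in Python 3.10+

item_scores = [
    [4, 3, 4, 5, 3, 4, 5, 4],
    [2, 2, 3, 2, 1, 2, 3, 2],
    [5, 4, 5, 5, 4, 5, 4, 5],
    [3, 3, 2, 3, 3, 2, 3, 3],
    [1, 2, 1, 2, 2, 1, 2, 1],
]

odd_half  = [sum(row[0::2]) for row in item_scores]  # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction to full length
print(f"Half-test r = {r_half:.2f}; corrected full-test reliability = {r_full:.2f}")
```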

 To get a good measure of reliability, the test is divided into two halves in an unbiased manner and the Cronbach's alpha coefficient is calculated to establish internal consistency (when the items of an instrument are scored on summated scales, like the Likert-type scales used in quality-of-life instruments, depression scales, etc.).
 Another statistical procedure used to assess internal consistency is the Kuder-Richardson 20 (KR-20), applied when the items of an instrument are scored dichotomously (example: 1 = yes, 0 = no).
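A minimal sketch of both coefficients on hypothetical data. Since KR-20 is the special case of Cronbach's alpha for dichotomous items (an item's variance reduces to p × q), one function serves for both:

```python
# Minimal sketch: Cronbach's alpha for Likert-type items and KR-20 for
# dichotomously scored items (1 = yes, 0 = no). All responses are hypothetical.
from statistics import pvariance

def cronbach_alpha(rows):
    """rows: one list of item scores per subject."""
    k = len(rows[0])                                   # number of items
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])      # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

likert = [[4, 3, 4, 5], [2, 2, 3, 2], [5, 4, 5, 5], [3, 3, 2, 3], [1, 2, 1, 2]]
print(f"Cronbach's alpha: {cronbach_alpha(likert):.2f}")

dichotomous = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [1, 0, 0, 1], [0, 0, 0, 0]]
print(f"KR-20: {cronbach_alpha(dichotomous):.2f}")
```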

❖ Internal consistency is a useful device for establishing reliability in a highly structured quantitative data collection instrument. It is not useful for open-ended questionnaires or interviews, unstructured observations, projective tests, or other qualitative data collection methods and instruments.


 VALIDITY OF RESEARCH TOOL

Validity of an instrument refers to the degree to which an instrument measures what it is supposed to measure. For example, a temperature-measuring instrument is supposed to measure only temperature; it cannot be considered a valid instrument if it measures an attribute other than temperature. Similarly, if a researcher develops a tool to measure pain and it also includes items that measure anxiety, it cannot be considered a valid tool. Therefore, a valid tool should only measure what it is supposed to measure.

Validity is the second important criterion for evaluating a quantitative instrument. It is defined as
the extent to which a concept is accurately measured. It also refers to the degree or extent to
which an instrument measures what it is supposed to measure.

 Definitions

 According to Treece and Treece, 'Validity refers to an instrument or test actually testing what
it is supposed to be testing'.

 According to Polit and Hungler, 'Validity refers to the degree to which an instrument
measures what it is supposed to be measuring".

 According to the American Psychological Association, 'Validity is the appropriateness, meaningfulness, and usefulness of the inferences made from the scoring of the instrument'.

 Validity is the appropriateness, completeness, and usefulness of an attribute-measuring research instrument.
Reliability and validity are not independent qualities of an instrument. Although validity is an
important characteristic of an instrument, reliability is necessary before validity can be
considered. An instrument that is not reliable cannot be valid. However, an instrument can be
reliable without being valid. A few examples which underline the concepts of reliability and
validity are:

An alarm clock rings at 6 am while it is actually set for 5.30 am. Here the clock is very reliable, as it regularly rings at the same time every morning, i.e. 6 am. However, it is not valid, as it is not ringing at the preferred time, i.e. 5.30 am. In this scenario, if the clock instead rang at various times every morning, it would not be considered reliable either.

A researcher constructed a multiple choice questionnaire to assess drug knowledge among nursing students. It was found that the questionnaire yielded the same scores when it was repeatedly administered. However, it was found to be assessing arithmetic skills instead of drug knowledge. Though this questionnaire is reliable, it is not valid.

 Types of Validity

Basically, validity is classified into the following four categories: face validity, content validity, criterion-related validity, and construct validity.

Fig:- Types of validity: content validity, criterion-related validity (predictive and concurrent validity), and construct validity (convergent and divergent validity).

 Face validity: Face validity involves an overall look at an instrument regarding its appropriateness to measure a particular attribute or phenomenon. Though face validity is not considered a very important or essential type of validity for an instrument, it may be taken into consideration while assessing other aspects of the validity of a research instrument. In simple words, this aspect of validity refers to the face value or the outlook of an instrument. For example, for a Likert scale designed to measure the attitude of nurses towards patients admitted with HIV/AIDS, a researcher may judge the face value of the instrument by its appearance, that is, whether it looks good or not; but this does not provide any guarantee about the appropriateness and completeness of the research instrument with regard to its content, construct, and measurement score.

 Content validity: It is concerned with the scope of coverage of the content area to be measured. It is most often applied in tests of knowledge measurement and in complex psychological tests. It is a case of expert judgment about the content area included in the research instrument to measure a particular phenomenon. Judgement of content validity may be subjective and is based on previous researchers' and experts' opinions about the adequacy, appropriateness, and completeness of the content of the instrument. Generally, content validity is ensured through the judgments of experts about the content.
 Criterion validity: This type of validity concerns the relationship between measurements of the instrument and some external criterion. For example, a tool is developed to measure professionalism among nurses; to assess its criterion validity, the nurses were separately asked about the number of research papers they had published and the number of professional conferences they had attended. A correlation coefficient is then calculated to assess criterion validity. The tool is considered to have strong criterion validity if a positive correlation exists between the score on the tool measuring professionalism and the number of research articles published and professional conferences attended by the nurses. The instrument is valid if its measurements correspond strongly to the scores of some other valid criterion. The problem with criterion-related validity is finding a reliable and valid external criterion; mostly we have to rely on a less-than-perfect criterion. The score of the instrument is correlated mathematically with the score of the criterion variable, and a coefficient of ≥0.70 is desirable. Criterion-related validity may be differentiated into predictive and concurrent validity.
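A minimal sketch of the professionalism example above, with invented scores. The tool's scores are correlated against one external criterion (papers published), and the coefficient is checked against the 0.70 threshold mentioned in the text:

```python
# Minimal sketch: criterion-related validity of a hypothetical
# professionalism tool against an external criterion. Values are invented.
from statistics import correlation  # Pearson's r; available in Python 3.10+

professionalism_score = [62, 45, 78, 51, 85, 40, 70, 66]  # tool scores per nurse
papers_published      = [ 3,  1,  5,  2,  6,  0,  4,  3]  # external criterion

r = correlation(professionalism_score, papers_published)
print(f"Criterion validity coefficient: {r:.2f}")
print("Desirable (>= 0.70)" if r >= 0.70 else "Below the desirable threshold")
```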

Predictive validity: It is the degree of forecasting judgement; for example, scores on some personality tests can be predictive of students' academic futures and behaviour patterns. It refers to the instrument's ability to differentiate between performances on some future criterion. An instrument has predictive validity when its score significantly correlates with some future criterion.

Concurrent validity: It is the degree to which an instrument measures a characteristic in the present. It relates to present specific behaviour and characteristics; hence the difference between predictive and concurrent validity lies in the timing of obtaining the measurements of the criterion.

 Construct validity: This type of validity concerns how well an instrument measures an abstract construct; for example, a nurse may have designed an instrument to measure the concept of pain in amputee patients, but the pain pattern may be due to anxiety, and hence the results may be misleading. Construct validity is a key criterion for assessing the quality of a study, and it has most often been addressed in terms of measurement issues. The key construct validity questions with regard to measurements are: What is this instrument really measuring? Does it adequately measure the abstract concept of interest? Construct validity gives more importance to testing relationships predicted on theoretical grounds. The researcher can make predictions in relation to other such constructs. One method of construct validation is the known-groups technique.
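A minimal sketch of the known-groups technique for the pain-scale example: a group known to experience pain should score significantly higher than a pain-free comparison group. The scores are invented, and the independent-samples t-test here uses scipy:

```python
# Minimal sketch: known-groups technique for construct validity.
# Pain-scale scores are hypothetical; scipy.stats.ttest_ind performs an
# independent-samples t-test between the two known groups.
from scipy.stats import ttest_ind

pain_group    = [7.5, 8.0, 6.5, 7.0, 8.5, 7.5]  # patients known to have pain
control_group = [1.0, 2.0, 1.5, 0.5, 2.5, 1.0]  # subjects known to be pain-free

t_stat, p_value = ttest_ind(pain_group, control_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A significant difference in the expected direction supports construct validity.
```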
SUMMARY:-

Reliability refers to a study's replicability, while validity refers to a study's accuracy. A study
can be repeated many times and give the same result each time, and yet the result could be
wrong or inaccurate. This study would have high reliability, but low validity; and therefore,
conclusions can't be drawn from it. Reliability in research is a concept describing how
reproducible or replicable a study is. In general, if a study can be repeated and the same
results are found, the study is considered reliable. Studies can be reliable across time and
reliable across samples. Validity of research is an evaluation of how accurate the study is. It
describes the extent to which the study actually measures what it intends to measure.

CONCLUSION:-

There are several tools for measuring reliability, including the split-half method, the test-retest method, internal consistency, and the reliability coefficient. The split-half method divides the instrument items into two halves and compares the results. The test-retest method involves repeating the same test at different points in time to look for the same result. Internal consistency gauges the stability of scores or answers across items of an assessment that should correlate. The reliability coefficient is a number between 0 and 1 that indicates the extent to which the variance among study findings is due to true variance rather than error. Reliability is important because it reflects the quality of the research. Findings from a research study that are true or accurate are often reliable.
REFERENCES:-

1. Polit DF, Beck CT. Nursing Research. 8th ed. New Delhi: Wolters Kluwer (India) Pvt. Ltd; 2011. P. 15

2. Sharma SK. Nursing Research & Statistics. 4th ed. New Delhi: Elsevier; 2011. P. 20-24

3. Basavanthappa BT. Nursing Research and Statistics. 3rd ed. New Delhi: Jaypee Brothers Medical Publishers (P) Ltd; 2014. P. 34

4. Sreevani R. Basics in Nursing Research and Biostatistics. 1st ed. New Delhi: Jaypee Brothers; 2018. P. 245
