An ideal measuring instrument is one which yields relevant, accurate, objective, sensitive, and efficient measures. Physiological or physical measurements have a greater chance of having these characteristics than psychological measurements. Scales developed to measure psychological variables are often imperfect and error prone. A number of techniques are available to evaluate the quality of these measuring tools and minimize error. These techniques estimate reliability, validity, objectivity, sensitivity, specificity, and appropriateness. Thus, if the measuring tools are reliable, valid, objective, sensitive, specific, and appropriate, they will improve accuracy and enhance the scientific quality of the research.
[Figure: characteristics of a quality measuring instrument - reliability, validity, objectivity, sensitivity, specificity, and appropriateness.]

Reliability
The quality and adequacy of quantitative data can only be assessed by establishing the reliability of an instrument. Reliability is the degree of consistency with which the attributes or variables are measured by an instrument. For example, suppose a blood pressure-measuring instrument gave a reading of 120 mmHg systolic blood pressure; after some time, when blood pressure was again measured for the same subject, it gave a reading of 160 mmHg systolic blood pressure. In this situation, the instrument is not considered reliable. However, if a research instrument yields similar or close-to-similar results on repeated administration, it is considered a highly reliable research instrument.
Reliability is the degree of consistency and accuracy with which an instrument measures the attribute it is designed to measure.
Meaning:-
Reliability refers to the consistency with which an instrument or test measures what it is
supposed to measure.
Reliability is the extent to which a measurement or instrument or test yields the same results
on repeated administration.
The more reliable a test or instrument, the more a researcher can rely on the scores. There are three methods of testing the reliability of a research instrument: stability, equivalence, and internal consistency.
[Figure: methods of testing reliability - stability, equivalence, and internal consistency.]
o Stability
A stable research instrument is one which, when repeated over and over on the same research subjects, will produce the same results. Stability of an instrument can be evaluated by the test-retest method and repeated observations.
Test-retest method: Repeated measurements over time using the same instrument on the same subjects are expected to produce the same results. In this method, the researcher administers the same test twice, over a period of time, to a group of individuals. The scores from time 1 and time 2 can then be correlated in order to evaluate the stability of the test. If the test or instrument is reliable, individual scores will be very similar at both administrations.
[Figure: the same tool/instrument is administered to the same set of subjects at time point 1 and time point 2.]
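To make the correlation step concrete, here is a minimal sketch (not from the source) that estimates test-retest reliability as a Pearson correlation between the two sets of scores; the subject scores are hypothetical illustration data.

    # Sketch: test-retest reliability as a Pearson correlation.
    # The scores below are hypothetical illustration data.
    from statistics import mean, stdev

    time1 = [12, 15, 11, 18, 14, 16, 13]  # scores at time point 1
    time2 = [13, 15, 10, 17, 15, 16, 12]  # same subjects at time point 2

    def pearson_r(x, y):
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (stdev(x) * stdev(y))

    print(f"Test-retest reliability r = {pearson_r(time1, time2):.2f}")

A coefficient near 1 would indicate that individual scores were very similar at both administrations.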
A concern with this type of reliability is how to determine the amount of time that should elapse between the two testings. Knapp and Brown say that test-retest reliabilities are generally higher when the time lapse between the testings is short, usually no longer than 4 weeks. If the interval is too short, memory of the responses given during the first session may influence responses during the second session.
A reliable questionnaire will give consistent results over time. If the results are not consistent, the test is not considered reliable and will need to be revised until it does measure consistently.
Test-retest is used primarily with questionnaires, but the concept of repeated measurement to establish stability can also be used with tools like thermometers and hemodynamic monitors, and in instances where the variable being measured is not expected to fluctuate.
The main limitation is that the test for stability can be performed only when the trait being measured remains constant over time. An example of a stable concept is intelligence: it should be possible to measure intelligence repeatedly at regular intervals and obtain the same score. An unstable concept such as pain is changeable and subject to frequent fluctuations, even for a person with chronic pain. Repeated measures of pain in a subject would result in widely different scores. These differences would not mean that the instrument is unstable, but rather that the individual's pain was changing (the variable being measured was changing).
Repeated observations: When using observational methods of data collection, the test of stability of the instrument is called repeated observation. The measurement of the variable is repeated over time, and the results at each measurement time are expected to be very similar. For example, if a researcher uses an observational scale to rate nurses' behavior during the process of counting narcotics 3 days in a row, it should yield similar ratings each day. If the ratings are different each day, questions arise regarding the reliability of the rating: whether or not the trait being measured is stable, and whether or not the observation is done the same way every day, i.e. whether or not the observer is consistent.
[Figure: an observational scale measuring the same variable at time point 1 and time point 2.]
o Equivalence
When the variable being measured is not a stable one, the reliability of an instrument cannot be tested by repeated measures. Tests of equivalence attempt to determine whether similar tests administered at the same time yield the same results, or whether different observers observing the same phenomenon at the same time report similar results.
a. Alternate form: The test of equivalence using alternate forms of paper-and-pencil tests, consisting of two sets of similar questions designed to measure the same trait, is called alternate form testing. The two tests are based on similar content, but the individual items are different. When these two tests are administered to subjects at the same time, the results can be compared just as in the test-retest method. Obtaining similar results on the two alternate forms of the instrument gives support for the reliability of both forms. For example, an instructor develops two tests with the same content but different questions, and both are administered to the same subjects. If a student were to take the two forms of the test, the results should reflect the same level of knowledge if the tests are reliable.
The major problem with alternate forms is that they tend to be boring for the subject. When a questionnaire or interview is very long, the addition of another questionnaire of the same length may be too tiring for the subjects. This may introduce a new source of error through subject fatigue and boredom.
[Figure: two observers rating the same phenomenon at the same time.]
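For the observer form of equivalence mentioned above, a minimal hedged sketch is a simple percent-agreement check between two raters' dichotomous ratings (a chance-corrected index such as Cohen's kappa would be a common alternative); the ratings below are hypothetical.

    # Sketch: percent agreement between two observers rating the same
    # events at the same time; the ratings below are hypothetical.
    obs1 = [1, 0, 1, 1, 0, 1, 1, 0]  # observer 1 (1 = behavior present)
    obs2 = [1, 0, 1, 0, 0, 1, 1, 0]  # observer 2, same events

    agreements = sum(a == b for a, b in zip(obs1, obs2))
    print(f"Percent agreement = {100 * agreements / len(obs1):.0f}%")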
o Internal consistency
Internal consistency refers to the extent to which all parts of the measurement technique are measuring the same concept. It gives an estimate of the equivalence of sets of items from the same test. It is based on the assumption that items measuring the same construct or concept should correlate. For example, in a questionnaire to measure depression, each question should provide a measure of depression consistent with the overall results of the test. In laboratory tests, this concept includes the idea that the results obtained from counting the red blood cells in one drop of blood from a specimen should be the same as those obtained from another drop of blood from the same specimen. To ensure internal consistency, all the questions in a structured questionnaire should measure a single trait, characteristic, or phenomenon contributing to the overall measure of the concept.
If the variable being measured is a changeable one, the test-retest method cannot be used. If alternate forms are not possible because the length of the questionnaires would prohibit asking the subjects to complete two at the same time, then equivalence is not an option. Only internal consistency will provide a useful measure of reliability in these cases.
❖ Split-half correlation is used to test internal consistency. The items on the instrument are divided into two halves, and the correlation between the scores on the two parts is computed. The halves may be divided by comparing the scores on the first half of the test with the scores on the second half, or by comparing odd-numbered items with even-numbered items. If all the items are consistently measuring the overall concept, then the scores on the two halves of the test should be correlated, as the sketch after the following steps illustrates.
Divide the instrument items into two equal halves: odd and even items, or first half and second half.
Compare the scores on the first half of the test with the scores on the second half, or compare the odd-numbered item scores with the even-numbered item scores.
If all the items are consistently measuring the overall concept, then the scores on the two halves of the test should be correlated.
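Here is a minimal sketch of those steps (not from the source): odd- and even-item half scores are correlated, and the Spearman-Brown formula, 2r/(1 + r), is then commonly applied to estimate full-length reliability from the half-test correlation; the item scores are hypothetical.

    # Sketch: split-half reliability with the Spearman-Brown correction.
    # Each row is one subject's item scores; the data are hypothetical.
    from statistics import mean, stdev

    items = [
        [4, 3, 4, 5, 3, 4],
        [2, 2, 3, 2, 1, 2],
        [5, 4, 5, 5, 4, 5],
        [3, 3, 2, 3, 3, 3],
        [1, 2, 1, 2, 2, 1],
    ]

    # Sum the odd-numbered and even-numbered items for each subject.
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]

    def pearson_r(x, y):
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (stdev(x) * stdev(y))

    r_half = pearson_r(odd, even)
    # Spearman-Brown: estimate reliability of the full-length test.
    r_full = 2 * r_half / (1 + r_half)
    print(f"Half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")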
To get a good measure of reliability, the test is divided into two halves in an unbiased manner, and Cronbach's alpha coefficient is calculated to establish internal consistency (when the items of an instrument are scored on summated scales such as a Likert-type scale, as in quality-of-life instruments, depression scales, etc.).
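A hedged sketch of Cronbach's alpha, using its standard formula alpha = (k/(k-1)) x (1 - sum of item variances / variance of total scores); the Likert-type scores below are hypothetical.

    # Sketch: Cronbach's alpha for items scored on a summated (Likert-type) scale.
    # Rows are subjects, columns are items; the scores are hypothetical.
    from statistics import variance

    scores = [
        [4, 3, 4, 5],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 2, 3],
        [1, 2, 1, 2],
    ]

    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item variance
    total_var = variance([sum(row) for row in scores])   # variance of total scores

    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(f"Cronbach's alpha = {alpha:.2f}")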
Another statistical procedure used to assess internal consistency is the Kuder-Richardson 20 (KR-20), used when the items of an instrument are scored dichotomously.
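A corresponding sketch for dichotomous items, using the standard KR-20 formula (k/(k-1)) x (1 - sum(p*q) / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p; the responses below are hypothetical.

    # Sketch: Kuder-Richardson 20 (KR-20) for dichotomously scored items.
    # Rows are subjects, columns are items scored 1 (correct) or 0 (incorrect).
    from statistics import pvariance

    scores = [
        [1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 1, 0, 1],
    ]

    k = len(scores[0])                                 # number of items
    n = len(scores)                                    # number of subjects
    p = [sum(col) / n for col in zip(*scores)]         # proportion correct per item
    pq = sum(pi * (1 - pi) for pi in p)                # sum of item variances (p*q)
    total_var = pvariance([sum(row) for row in scores])  # variance of total scores

    kr20 = (k / (k - 1)) * (1 - pq / total_var)
    print(f"KR-20 = {kr20:.2f}")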
Validity
Validity is the second important criterion for evaluating a quantitative instrument. It is defined as the extent to which a concept is accurately measured. It also refers to the degree or extent to which an instrument measures what it is supposed to measure.
Definitions
According to Treece and Treece, 'Validity refers to an instrument or test actually testing what it is supposed to be testing.'
According to Polit and Hungler, 'Validity refers to the degree to which an instrument measures what it is supposed to be measuring.'
Consider an alarm clock which rings at 6 am while it is actually set for 5.30 am. The clock is very reliable, as it regularly rings at the same time every morning, i.e. 6 am. However, it is not valid, as it does not ring at the intended time, i.e. 5.30 am. In the same scenario, if the clock rang at various times every morning, it would not be considered reliable either.
Types of Validity
[Figure: types of validity - content validity, criterion-related validity (with predictive and concurrent validity), and construct validity (with convergent and divergent validity).]
Face validity: Face validity involves an overall look at an instrument regarding its appropriateness to measure a particular attribute or phenomenon. Face validity is not considered a very important or essential type of validity for an instrument; however, it may be taken into consideration while assessing other aspects of the validity of a research instrument. In simple words, this aspect of validity refers to the face value or outlook of an instrument. For example, for a Likert scale designed to measure the attitude of nurses towards patients admitted with HIV/AIDS, a researcher may judge the face value of the instrument by its appearance, that is, whether it looks good or not; but this does not provide any guarantee about the appropriateness and completeness of the research instrument with regard to its content, construct, and measurement score.
Content validity: It is concerned with the scope of coverage of the content area to be measured. More often it is applied in tests of knowledge measurement, and it is also used in complex psychological tests. It is a case of expert judgment about the content area included in a research instrument to measure a particular phenomenon. Judgment of content validity may be subjective and is based on previous researchers' and experts' opinions about the adequacy, appropriateness, and completeness of the content of the instrument. Generally, content validity is ensured through the judgment of experts about the content.
Criterion validity: This type of validity is a relationship between measurements of the instrument and some other external criterion. For example, suppose a tool is developed to measure professionalism among nurses; to assess criterion validity, the nurses were separately asked about the number of research papers they had published and the number of professional conferences they had attended. Later, a correlation coefficient is calculated to assess criterion validity. The tool is considered to have strong criterion validity if a positive correlation exists between the score of the tool measuring professionalism and the number of research articles published and professional conferences attended by the nurses. The instrument is valid if its measurements correspond strongly to the scores on some other valid criterion. The problem with criterion-related validity is finding a reliable and valid external criterion; mostly, researchers have to rely on a less-than-perfect criterion. The validity coefficient is computed mathematically by correlating the scores of the instrument with the scores on the criterion variable; here a coefficient ≥0.70 is desirable.
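As a hedged sketch of that computation (the variable names and data are hypothetical), the validity coefficient is the correlation between the tool's scores and the external criterion, checked against the 0.70 benchmark.

    # Sketch: criterion-related validity as a correlation between instrument
    # scores and an external criterion; all values below are hypothetical.
    from statistics import mean, stdev

    professionalism = [62, 75, 58, 80, 70, 66]   # tool scores
    papers_published = [2, 5, 1, 7, 4, 3]        # external criterion

    def pearson_r(x, y):
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (stdev(x) * stdev(y))

    r = pearson_r(professionalism, papers_published)
    print(f"Validity coefficient r = {r:.2f} (>= 0.70 is desirable)")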
Criterion-related validity may be differentiated into predictive and concurrent validity.
Predictive validity: It is the degree of forecasting judgment; for example, some personality tests administered to students can be predictive of their future academic behaviour patterns. It refers to the ability of an instrument to differentiate between performances on some future criterion. An instrument has predictive validity when its score significantly correlates with some future criterion.
Concurrent validity: It is the degree to which an instrument measures a characteristic in the present. It relates to present, specific behaviour and characteristics; hence the difference between predictive and concurrent validity lies in the timing of obtaining measurements on the criterion.
Construct validity: This type of validity concerns the underlying construct being measured; for example, a nurse may have designed an instrument to measure the concept of pain in patients with amputations, but the pain pattern may be due to anxiety, and hence the results may be misleading. Construct validity is a key criterion for assessing the quality of a study, and it has most often been addressed in terms of measurement issues. The key construct validity questions with regard to measurements are: What is this instrument really measuring? Does it adequately measure the abstract concept of interest? Construct validity gives importance to testing relationships predicted on theoretical grounds, and the researcher can make predictions in relation to other related constructs. One method of construct validation is known as the known-groups technique.
SUMMARY:-
Reliability refers to a study's replicability, while validity refers to a study's accuracy. A study can be repeated many times and give the same result each time, and yet the result could be wrong or inaccurate. Such a study would have high reliability but low validity, and therefore conclusions cannot be drawn from it. Reliability in research is a concept describing how reproducible or replicable a study is; in general, if a study can be repeated and the same results are found, the study is considered reliable. Studies can be reliable across time and reliable across samples. Validity of research is an evaluation of how accurate the study is: it describes the extent to which the study actually measures what it intends to measure.
CONCLUSION:-
There are several tools for measuring reliability, including the split-half method, the test-retest method, internal consistency, and the reliability coefficient. The split-half method divides the instrument's items into two halves and compares the results. The test-retest method involves repeating the same measurement at different points in time and looking for the same result. Internal consistency gauges the stability of scores or answers on an assessment that should correlate. The reliability coefficient is a number between 0 and 1 that scores how likely the variance shown among study findings is due to true variance rather than to error. Reliability is important because it measures the quality of the research; findings from a research study that are true or accurate are often reliable.
REFERENCES:-
1. Polit DF, Beck CT. Nursing Research. 8th ed. New Delhi: Wolters Kluwer (India) Pvt. Ltd; 2011. p. 15.
2. Sharma SK. Nursing Research & Statistics. 4th ed. New Delhi: Elsevier; 2011. p. 20-24.
3. Basavanthappa BT. Nursing Research and Statistics. 3rd ed. New Delhi: Jaypee Brothers Medical Publishers (P) Ltd; 2014. p. 34.