
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/6914582

Reliability and validity in research

Article in Nursing standard: official newspaper of the Royal College of Nursing · July 2006
DOI: 10.7748/ns2006.07.20.44.41.c6560 · Source: PubMed


2 authors, including:

Helena Mary Priest


Staffordshire University
80 PUBLICATIONS 1,860 CITATIONS


All content following this page was uploaded by Helena Mary Priest on 08 April 2021.



art & science CLINICAL RESEARCH
The synthesis of art and science is lived by
the nurse in the nursing act JOSEPHINE G PATERSON

EDUCATION

Reliability and validity in research


Roberts P et al (2006) Reliability and validity in research. Nursing Standard. 20, 44, 41-45.
Date of acceptance: April 6 2006.

Summary
This article examines reliability and validity as ways to demonstrate the rigour and trustworthiness of quantitative and qualitative research. The authors discuss the basic principles of reliability and validity for readers who are new to research.

Authors
Paula Roberts and Helena Priest are senior lecturers, School of Nursing and Midwifery, Keele University, Keele, Staffordshire; Michael Traynor is professor of nursing, Middlesex University, London. Email: p.m.roberts@keele.ac.uk

Keywords
Qualitative research; Quantitative research; Reliability; Validity

These keywords are based on the subject headings from the British Nursing Index. This article has been subject to double-blind review. For related articles and author guidelines visit our online archive at www.nursing-standard.co.uk and search using the keywords.

'THE HALLMARK OF science is the pursuit of truth and the limitation of error. As such, science is an attitude of mind rather than a set of procedures. The defining characteristic of that attitude is a commitment to subject any claim to rigorous evaluation and the conscientious seeking out of evidence that might contradict or modify that claim' (Murphy and Dingwall 2003). Although presenting an idealised view of science, this statement is a good starting point for a discussion that aims to demystify two related examples of research terminology: reliability and validity.

Reliability and validity are ways of demonstrating and communicating the rigour of research processes and the trustworthiness of research findings. If research is to be helpful, it should avoid misleading those who use it. If a hospital decides to replace one treatment with another safer and more effective treatment, then managers, clinicians and patients can justifiably expect the decision to be based on good, rather than flawed, evidence. This trustworthiness depends on a number of research features: the initial research question, how data are collected including when and from whom, how they are analysed, and what conclusions are drawn.

Although Murphy and Dingwall (2003) are right in stating that following procedures alone is not sufficient to produce trustworthy results, this is not to say that procedures can be ignored. When reading published research or designing research projects, it is important to consider issues of reliability and validity from the outset.

The authors provide an overview of the basic principles of reliability and validity in relation to quantitative and qualitative nursing research. In so doing, they aim to assist those readers starting out in reading or conducting research to understand and use these frequently encountered terms correctly. Readers are directed to other useful resources that can provide more detail than is possible within the scope of this article.

Of the two terms, reliability is probably the simpler to understand and demonstrate. Reliability describes how far a particular test, procedure or tool, such as a questionnaire, will produce similar results in different circumstances, assuming nothing else has changed.

Validity is a subtler concept. It is about the closeness of what we believe we are measuring to what we intended to measure. For example, a strategic health authority might want to know if it is worth paying more to a university to train nurses to degree rather than diploma level. They want to know: are graduate nurses as competent to practise as diploma holders? We design a questionnaire, test its reliability, get both types of nurse to complete it, and tell the authority that degree and diploma level nurses are equally competent. However, someone notices that many of the questions were about sociological theory, advanced statistics and the history of science, and they point out that this is not what we meant by 'competent to practise'. Reliability is a necessary but insufficient condition for validity, and
NURSING STANDARD July 12 :: vol 20 no 44 :: 2006 4 1
although the questionnaire may have been reliable, it was not valid.

Quantitative and qualitative research

Quantitative research is the conduct of investigations primarily using numerical methods, whereas qualitative research tends to use exploratory approaches and produce textual data rather than numbers or measurements.

Quantitative or statistical methods have been used in research since the early nineteenth century and Florence Nightingale was one of the pioneers. The Royal Statistical Society was founded in 1834, although famous examples of data collection and presentation go back another 150 years. Concerned with questions about how likely their results are to be misleading, statisticians have devised procedures for expressing likelihoods and for estimating how accurate their measurements are. Being able to report on these procedures and their outcomes can help to demonstrate the rigour, accuracy and, therefore, the usefulness of such work. Unsurprisingly, the basis of these procedures is mathematical.

However, qualitative researchers find, when searching for similar procedures, that there is no simple equivalent. For complex reasons, social scientists also began to feel that they needed to address issues like rigour and trustworthiness. Grounded theory, for example, sought to address these issues (Glaser and Strauss 1967). Qualitative researchers felt challenged to develop the same notions of reliability and validity as their colleagues who used quantitative methods. The problem was that some qualitative researchers had actively rejected assumptions about reality and our ability to represent it accurately and objectively. For individuals working within this school of thought, tests to assess accuracy of measurement are problematic. Hence, there is debate about 'equivalents' to these concepts in qualitative research. Despite the variety of schemes devised by different researchers, the vital question that all address is: 'How can I assure the user of my work that it is trustworthy?'

There are many approaches to the generation and analysis of qualitative data, including phenomenology, grounded theory, qualitative content analysis and narrative analysis (Priest et al 2002). The purpose in each case is to summarise the data using a variety of coding procedures. Many proceed from particular philosophical or political standpoints and these tend to have well-elaborated positions on issues like truthfulness. In general, qualitative researchers consider that whichever approach is used, there is a need to determine how far the research findings are believable, accurate and useful (Creswell 1998).

It is sometimes claimed, however, that traditional tests of rigour are not relevant in the qualitative paradigm - qualitative research should be judged against a different set of standards, using alternative terminology, and evaluated using an alternative set of strategies to quantitative research. In the light of this claim, qualitative researchers have devised an array of procedures for demonstrating validity and reliability.

Reliability in quantitative research

Research methods resulting in the production of numerical data of relevance to nurses include experiments, for example, a randomised controlled trial of the effectiveness of a new clinical treatment, and surveys, for example, patient satisfaction questionnaires. Further information on these approaches can be found in Cormack (2000). Researchers have developed particular measures of reliability, some associated with specific statistical tests.

Essentially, any research tool should provide the same information if used by different people (inter-rater reliability), or if it is used at different times, for example, on Friday morning and again on Sunday afternoon (test-retest reliability). The internal consistency of research tools needs to be assessed. Internal consistency is the relationship between all the results obtained from a single test or survey. If we ask people ten questions about job satisfaction, do they answer every question in a similar way, or are there a few questions where the replies seem to be unrelated to the others?

Internal consistency of items such as individual questions in a questionnaire can be measured using statistical procedures such as Cronbach's alpha coefficient (Cronbach 1951), randomly splitting all the responses to a question into two sets, totalling the scores on the two sets, and working out the correlation between the two sets. This is known as a 'split-half' test. A more sophisticated way of doing this is to create all possible split halves and determine the average correlation between all of them. Cronbach's alpha (1951) is an estimate of the average of all split-half estimates of reliability.

Reliability is the proportion of variability in a measured score that is due to variability in the true score (rather than some kind of error). A reliability of 0.9 means 90 per cent of the variability in the observed score is true and 10 per cent is due to error. A reliability of 80 to 90 per cent is recommended for most research purposes.

Methods of estimating the reliability of measurements do have some limitations; for
example, test-retest reliability is potentially flawed if respondents' previous experiences in the first testing influence responses in the second testing (Carmines and Zeller 1979). Moreover, intervening events between the two administrations may account for differences between the two sets of results (Bryman and Cramer 2004) and contribute to flaws in external validity (Robinson Kurpius and Stafford 2005).

Validity in quantitative research

Validity describes the extent to which a measure accurately represents the concept it claims to measure (Punch 1998). There are two broad measures of validity - external and internal.

External validity addresses the ability to apply with confidence the findings of the study to other people and other situations, and ensures that the 'conditions under which the study is carried out are representative of the situations and time to which the results are to apply' (Black 1999). The sample of participants drawn from the population of interest must be representative of that population at the time of the study. Finally, representative samples should be drawn with reference to relevant variables in the study, such as gender and age.

Internal validity addresses the reasons for the outcomes of the study, and helps to reduce other, often unanticipated, reasons for these outcomes. Three approaches to assessing internal validity are content validity, criterion-related validity, and construct validity (Eby 1993, Punch 1998).

Content validity is the weakest level of validity, and is concerned with the relevance and representativeness of items, such as individual questions in a questionnaire, to the intended setting. It is particularly important to measure this if the study is designed to ascertain respondents' knowledge within a specific field, or to measure personal attributes such as attitudes (Eby 1993). It can be achieved through conducting a pilot study with people who are similar to the intended study participants. Such relevance can be supported by literature reviews and documentary evidence, where available.

Criterion-related validity is a stronger form of validity, established when a tool such as a questionnaire can be compared to other similar validated measures of the same concept or phenomenon (Eby 1993). However, where no other measures exist, this will not be possible.

Construct validity involves demonstrating relationships between the concepts under study and the construct or theory that is relevant to them. For example, the group of drugs known as antibiotics (construct) comprises several distinct drugs (concepts, for example, penicillin, erythromycin and oxytetracycline), which are individual drugs but related to one another in that their primary purpose is to treat bacterial infection.

There are several ways of demonstrating construct validity, one of which is factor analysis. Factor analysis refers to a number of statistical procedures used to determine characteristics that relate to each other (Bryman and Cramer 2004). Factor analysis is particularly useful for examining the relationships between large numbers of variables, disentangling them and identifying clusters of variables that are closely linked together (Burns and Grove 2005). In the antibiotic example, factor analysis might identify a cluster of drugs useful for treating respiratory infections, a cluster useful for treating skin infections and another group with a number of different uses (broad spectrum antibiotics). For more detailed discussion of different methods to conduct factor analysis, and how statistical computer packages can assist the process, see Bryman and Cramer (2004).

Reliability in qualitative research

In qualitative research, reliability can be thought of as the trustworthiness of the procedures and data generated (Stiles 1993). It is concerned with the extent to which the results of a study or a measure are repeatable in different circumstances (Bryman 2001). Thus, we need to confirm findings by revisiting data in different circumstances. For example, to overcome any researcher bias in the interpretation of data and as an auditing measure, interview data may be sent to an independent researcher to verify how much agreement there is about findings and analysis - a form of inter-rater reliability (Weber 1990). Additionally, keeping detailed notes on decisions made throughout the process will add to the project's auditability and, therefore, reliability.

Qualitative content analysis is a particularly reliable approach to handling data. Specific codes are created to describe the data, such as statements from interview transcripts, and can be confirmed by revisiting previously coded data periodically to check for stability over time (Roberts 1999). Additionally, using computerised data analysis packages, such as NVivo (QSR), can enhance reliability (Roberts and Woods 2000) by applying the rules built into the programme (Robson 1994). Furthermore, line by line numbering of interview transcripts inbuilt in such software may assist inter-rater reliability. However, caution needs to be exercised in that an over-emphasis on standardisation may separate the data from its context so much that it almost becomes meaningless (Burton 2000).

Other methods for increasing reliability include ensuring technical accuracy in recording and transcribing. While some suggest that
tape-recorded interviews and interview transcripts can help improve reliability, important non-verbal aspects of communication are sometimes omitted from transcripts (Perakyla 1997). In one approach to qualitative research, conversation analysis, an attempt is made to capture the subtle nuances of communication by, for example, using standardised transcription notation, which records pauses, emphases and interruptions (Atkinson and Heritage 1984).

Intensive engagement with the data - moving backwards and forwards between the data and our interpretation of it - and making firm links between our interpretations and the data by, for example, using verbatim examples of participants' comments in written accounts of the findings, can all increase reliability and readability. However, it is important that the selection of illustrative quotations does not introduce bias by 'cherry picking' the most vivid examples from the research. They should reflect the range and tone of responses generated.

Validity in qualitative research

Validity is assessed in terms of how well the research tools measure the phenomena under investigation (Punch 1998). A potential difficulty in achieving validity in qualitative research is researcher bias, arising out of selective collection and recording of data, or from interpretation based on personal perspectives (Johnson 1997). In the case of interviews, a common method of data collection in qualitative research, the validity of the interview data needs to be considered.

While the interviewer should assume that self-reporting is accurate and therefore valid (Appleton 1995, Burns and Grove 2005), distortions can arise through the process of analysis and interpretation. 'Interested' researchers who are familiar with the field may overlook certain nuances and ambiguities of data because of their implicit understanding of the research setting. Being familiar with the setting, its people and processes, is both advantageous and potentially problematic. Such insights can be useful in authenticating responses and findings, but familiarity may also obscure any ambiguous issues that others, from outside the field, might question.

Where the researcher is personally familiar with the subject or environment being studied, the researcher can try to be non-reactive and achieve analytical distance. The researcher thus attempts to minimise bias in the data collection, interpretation and presentation of findings. In descriptive phenomenological research, based on the work of Husserl (1982), such bias is avoided through the use of 'bracketing', whereby researchers attempt to suspend their experience, judgement and beliefs (Cutcliffe and McKenna 1999). Although bracketing is often difficult, if not impossible, to achieve, the credibility of findings is increased if researchers make explicit their presuppositions and acknowledge their subjective judgements (Ashworth 1997a, 1997b). It is becoming increasingly usual for qualitative researchers to reflect openly on their own ability, or otherwise, to be unbiased, and to consider the effects of this on the research and on themselves as researchers. This process, sometimes known as 'reflexivity', will appear in the final written account of the research process and can serve as a further measure of validity.

The reduction of bias can also be facilitated by respondent validation. This refers to the practice of researchers sharing interpretations and theorising with the research participants, who can check, amend and provide feedback as to whether they are recognisable accounts consistent with their experience (Bryman 2001). Although often described as a test of validity (Lincoln and Guba 1985), the usefulness of respondent validation may also lie in providing researchers with an opportunity to think again about their interpretations. A range of factors determines whether or not participants agree with the researcher's conclusions, such as what is at stake for them if researchers present their actions in an unfavourable light? Or do they interpret the exercise as an invitation to be supportive and agreeable to the researchers?

Good research often aims to get beyond participants' own understandings of their situations. One common strategy is to illustrate analyses with what has been described as 'low inference descriptors' (Johnson 1997), that is, using examples of participants' verbatim accounts within the written account of the findings to demonstrate that findings are grounded in the data. The practice of 'cherry picking' only particularly vivid examples should be avoided.

Triangulation is another way of enhancing the validity of qualitative research. Triangulation describes the combination of two or more theories, data sources, methods or researchers in the study of a topic (Halcomb and Andrew 2005, Williamson 2005), and assists with the consistency, comprehensiveness and robustness of the study. Examples include cross-case analysis and comparisons across data from different groups of participants, and cross checking with documentary evidence and published literature (Halcomb and Andrew 2005, Williamson 2005).

Other approaches to increasing validity include prolonged engagement in the research site (Erlandson et al 1993), checking for deviant cases
by requesting negative descriptions of the topic under investigation (Streubert Speziale and Carpenter 1995) and regular supervision and peer review on analysis and findings by researchers.

Some qualitative researchers claim that complete objectivity or detachment in the research process is impossible (Guba and Lincoln 1981, Stiles 1993). However, it is helpful if the research process is transparent so that readers can trace the decision processes relating to personal orientation and context, theory, methodology and analysis throughout the life of the study. Morse and Field (1996) and Koch (1994) have recommended that an audit trail should be evident in the study to demonstrate rigour. This is commonly achieved through the maintenance of a research diary, in which the decision trail is recorded.

Conclusion

Using either established or more novel approaches to assessing the reliability and validity of research is one way of producing useful and trustworthy research findings. In determining the reliability and validity of research, reducing error is of prime concern. However, while adhering as closely as possible to a set of procedures to pursue truth and limit error is important, an attitude that seeks to ensure rigour in research is equally important.

All measures possess some residual bias or unreliability or inaccuracy (Punch 1998). While efforts can be made to minimise such risks, particularly systematic errors, they are acknowledged as a limitation in all types of research. While researchers should use as many approaches as possible to ensure reliability and validity, there remains the possibility that flaws may occur at the design, measurement or analysis stage, resulting in a less than perfect study. However, paying attention to the procedures outlined in this article will help to produce knowledge that is robust enough to use in planning or introducing change.
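
Worked illustration: the internal-consistency measures described under 'Reliability in quantitative research' can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the article. The questionnaire scores are invented, the function names are hypothetical, and Cronbach's alpha is computed with the commonly used variance formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), rather than by literally averaging every possible split half. The split-half function assumes that the questionnaire items are divided at random into two halves, and it reports the raw correlation between the two half totals without any correction for test length.

```python
import random
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def split_half_correlation(scores, rng):
    """Randomly split the items into two halves, total each half for every
    respondent, and correlate the two sets of totals.

    scores: one list of item scores per respondent.
    """
    items = list(range(len(scores[0])))
    rng.shuffle(items)
    half = len(items) // 2
    first, second = items[:half], items[half:]
    totals_first = [sum(row[i] for i in first) for row in scores]
    totals_second = [sum(row[i] for i in second) for row in scores]
    return pearson(totals_first, totals_second)


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(scores[0])
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def reliability(true_variance, error_variance):
    """Proportion of observed-score variability that is due to the true score."""
    return true_variance / (true_variance + error_variance)


# Invented data: six respondents answering four job-satisfaction items (1 to 5).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
]

rng = random.Random(0)  # fixed seed so the random split is repeatable
print("One random split-half correlation:", round(split_half_correlation(responses, rng), 3))
print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
print("Reliability when true variance is 9 and error variance is 1:", reliability(9.0, 1.0))
```

With real questionnaire data, the same functions could be applied to one row of item scores per respondent. A statistical package of the kind described by Bryman and Cramer (2004) would normally be used in practice.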

References
Appleton JV (1995) Analysing qualitative interview data: addressing issues of validity and reliability. Journal of Advanced Nursing. 22, 5, 993-997.

Ashworth PD (1997a) The variety of qualitative research. Part one: introduction to the problem. Nurse Education Today. 17, 3, 215-218.

Ashworth PD (1997b) The variety of qualitative research. Part two: non-positivist approaches. Nurse Education Today. 17, 3, 219-224.

Atkinson JM, Heritage J (1984) Structures of Social Action: Studies in Conversation Analysis. Cambridge University Press, Cambridge.

Black TR (1999) Doing Quantitative Research in the Social Sciences. Sage, London.

Bryman A (2001) Social Research Methods. Oxford University Press, Oxford.

Bryman A, Cramer D (2004) Quantitative Data Analysis with SPSS 12 and 13: A Guide for Social Scientists. Routledge, London.

Burns N, Grove SK (2005) The Practice of Nursing Research: Conduct, Critique, and Utilization. Fifth edition. WB Saunders, Philadelphia PA.

Burton D (2000) Research Training for Social Scientists. Sage, London.

Carmines EG, Zeller RA (1979) Reliability and Validity Assessment. Sage, London.

Cormack D (2000) The Research Process in Nursing. Fourth edition. Blackwell, Oxford.

Creswell JW (1998) Qualitative Inquiry and Research Design. Choosing among Five Traditions. Sage, Thousand Oaks CA.

Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika. 16, 3, 297-334.

Cutcliffe JR, McKenna HP (1999) Establishing the credibility of qualitative research findings: the plot thickens. Journal of Advanced Nursing. 30, 2, 374-380.

Eby M (1993) Validation: choosing a test to fit the design. Nurse Researcher. 1, 2, 27-33.

Erlandson DA, Harris EL, Skipper BL, Allen SD (1993) Doing Naturalistic Inquiry. A Guide to Methods. Sage, Newbury Park CA.

Glaser BG, Strauss AL (1967) The Discovery of Grounded Theory. Aldine, Chicago IL.

Guba EG, Lincoln YS (1981) Effective Evaluation. Jossey Bass, San Francisco CA.

Halcomb E, Andrew S (2005) Triangulation as a method for contemporary nursing research. Nurse Researcher. 13, 2, 71-82.

Husserl E (1982) Ideas pertaining to a pure phenomenology and to a phenomenological philosophy. Nijhoff, The Hague. (Original work published 1913).

Johnson RB (1997) Examining the validity structure of qualitative research. Education. 118, 2, 282-292.

Koch T (1994) Establishing rigour in qualitative research: the decision trail. Journal of Advanced Nursing. 19, 5, 976-986.

Lincoln YS, Guba EG (1985) Naturalistic Inquiry. Sage, Newbury Park CA.

Morse JM, Field PA (1996) Nursing Research. The Application of Qualitative Approaches. Second edition. Chapman and Hall, London.

Murphy E, Dingwall R (2003) Qualitative Methods and Health Policy Research. Aldine De Gruyter, New York NY.

Perakyla A (1997) Reliability and validity in research based on tapes and transcripts. In Silverman D (Ed) Qualitative Research. Theory, Method and Practice. Sage, Thousand Oaks CA, 201-220.

Priest HM, Roberts PM, Woods LP (2002) An overview of three different approaches to the interpretation of qualitative data. Part 1: theoretical issues. Nurse Researcher. 10, 1, 30-42.

Punch KF (1998) Introduction to Social Research. Sage, London.

Roberts PM (1999) The development of NEdSERV: quantitative instrumentation to measure service quality in nurse education. Nurse Education Today. 19, 5, 396-407.

Roberts PM, Woods LP (2000) Alternative methods of gathering and handling data: maximising the use of modern technology. Nurse Researcher. 8, 2, 84-95.

Robinson Kurpius SE, Stafford ME (2005) Testing and Measurement. A User-Friendly Guide. Sage, London.

Robson C (1994) Analysing documents and records. In Bennett N, Glatter R, Levacic R (Eds) Improving Educational Management Through Research and Consultancy. Paul Chapman, London, 237-247.

Stiles WB (1993) Quality control in qualitative research. Clinical Psychology Review. 13, 6, 593-618.

Streubert Speziale H, Carpenter D (1995) Qualitative Research in Nursing: Advancing the Humanistic Imperative. Lippincott, Philadelphia PA.

Weber RP (1990) Basic Content Analysis. Second edition. Sage, Newbury Park CA.

Williamson GR (2005) Illustrating triangulation in mixed-methods nursing research. Nurse Researcher. 12, 4, 7-18.


