Data Quality

Article

Development and validation of reporting guidelines for studies involving data linkage

Megan A. Bohensky, Damien Jolley
Centre for Research Excellence in Patient Safety, Department of Epidemiology & Preventive Medicine, Monash University, Victoria

Vijaya Sundararajan
Department of Human Services, Victoria

Sue Evans, Joseph Ibrahim, Caroline Brand
Centre for Research Excellence in Patient Safety, Department of Epidemiology & Preventive Medicine, Monash University, Victoria

Abstract

Objective: Data or record linkage is commonly used to combine existing data sets for the purpose of creating more comprehensive information to conduct research. Linked data may create additional concerns about error if cases are not linked accurately. It is important that factors compromising the quality of studies using linked data be reported in a clear and consistent way that allows readers and researchers to accurately appraise the results. The aim of this study was to develop and test reporting guidelines for evaluating the methodological quality of studies using linked data.

Method: The development process included a literature review, a Delphi process and a validation process. Participants in the process were all Australian and included biostatisticians, epidemiologists, registry administrators, academic clinicians and a peer-reviewed journal editor.

Results: The final guidelines included four domains and 14 reporting items. These included: data sources (six items), research selected variables (four items), linkage technology and data analysis (three items), and ethics, privacy and data security (one item).

Conclusion: This study is the first to develop guidelines for appraising the quality of reported data linkage studies.

Implications: These guidelines will assist authors to report their results in a consistent, high-quality manner. They will also assist readers to interpret the quality of results derived from data linkage studies.

Key words: Data collection, medical record linkage, guideline, research design, peer review, research

Aust NZ J Public Health. 2011; 35:486-9
doi: 10.1111/j.1753-6405.2011.00741.x

Data or record linkage is “a process of pairing records from two files and trying to select the pairs that belong to the same entity.”1 It is commonly used to combine existing data sets to create more comprehensive information to conduct research. Studies involving data linkage are becoming more common. The Australian Government through the National Collaborative Research Infrastructure Strategy awarded $20 million to the Population Health Research Network (PHRN) to establish national capacity for data linkage in Australia.2 The PHRN also received more than $30 million in direct and indirect support from each of the states and territories. Data linkage will be conducted within nodes operating in each individual State3-5 (see www.phrn.org.au), as in the model of Western Australia, with the Centre for Data Linkage, a national network, to co-ordinate the activities of the state-based groups.

Although the linkage of data sources generates valuable information and improves data quality, it may create additional biases and methodological concerns.6 To link data sets accurately requires stable and sufficiently unique identifiers. However, unique identifiers are not always available due to ethical and privacy constraints. Different methods are used for standardising data and linking data sets and the choice of these may influence the quality of results.7,8 A systematic review of the accuracy of probabilistic linkage processes found the sensitivity (i.e. the proportion of truly linked records detected) ranged from 74% to 98%.9 The authors noted that this variation was likely to be due to the number and quality of fields available for linkage.

Where there is low sensitivity of linkage processes, differential inaccuracies in the data may result in systemic bias. Linkage rates vary by participants’ age,10 gender,11,12 ethnicity,13 health status, regional location14,15 and socio-economic status.13 These variations may affect the conclusions of research studies.16

Factors compromising the quality of studies using linked data should be reported in a systematic way to allow readers and researchers to accurately appraise different studies. Reporting guidelines have improved the quality of information in other areas of research by highlighting the shortcomings and prompting improvements in the quality of published studies.17 Currently, there is no tool available to appraise the quality of studies using data linkage.
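
The introduction defines linkage sensitivity as the proportion of truly linked records detected and notes that linkage rates can differ between participant subgroups. As a minimal sketch only (it is not the method of any study cited here), the Python below computes sensitivity and positive predictive value against a gold-standard set of true pairs, and tabulates the linkage rate per subgroup so that differential linkage can be spotted. The function names, record identifiers and toy data are all hypothetical, and the sketch assumes a gold standard is available, which is often not the case in practice.

```python
from collections import defaultdict


def linkage_accuracy(linked_pairs, true_pairs):
    """Return (sensitivity, ppv) for a set of linked record pairs.

    sensitivity = true links found / all true links
    ppv         = true links found / all links made
    Both arguments are iterables of (id_in_file_a, id_in_file_b) tuples.
    """
    linked, truth = set(linked_pairs), set(true_pairs)
    true_positives = len(linked & truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    ppv = true_positives / len(linked) if linked else float("nan")
    return sensitivity, ppv


def linkage_rate_by_group(records, linked_ids):
    """Return the proportion of records that obtained a link, per subgroup.

    records: iterable of (record_id, group) tuples, e.g. group = age band.
    linked_ids: ids of the records that were linked.
    """
    linked_ids = set(linked_ids)
    totals, linked = defaultdict(int), defaultdict(int)
    for record_id, group in records:
        totals[group] += 1
        if record_id in linked_ids:
            linked[group] += 1
    return {group: linked[group] / totals[group] for group in totals}


# Hypothetical toy data: four true matches, three links made, one of them wrong.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
linked_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
records = [("a1", "under 65"), ("a2", "under 65"), ("a3", "65+"), ("a4", "65+")]

sensitivity, ppv = linkage_accuracy(linked_pairs, true_pairs)
rates = linkage_rate_by_group(records, {a for a, _ in linked_pairs})
print(f"sensitivity={sensitivity:.2f}, ppv={ppv:.2f}, rates={rates}")
```

A marked difference between subgroup linkage rates in output of this kind is the sort of differential linkage that the text warns may bias results.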

Submitted: September 2010 Revision requested: March 2011 Accepted: April 2011
Correspondence to:
Dr Megan Bohensky, Centre for Research Excellence in Patient Safety, Department of
Epidemiology, Monash University, 99 Commercial Road, Level 6 Alfred Centre, Prahran,
Victoria 3181; e-mail: megan.bohensky@monash.edu

486 AUSTRALIAN AND NEW ZEALAND JOURNAL OF PUBLIC HEALTH 2011 vol. 35 no. 5
© 2011 The Authors. ANZJPH © 2011 Public Health Association of Australia

Because the different data linkage nodes in Australia are subject to different privacy legislation and organisational structures, there is the potential that they may need to utilise differing methods and identifiers to link data in each jurisdiction. The development of standardised guidelines for reporting and appraising the quality of linkages within each node is timely and can help achieve greater national consistency in linkage results, especially where jurisdictional data will be aggregated and considered at a national scale. The Centre for Data Linkage is making concerted efforts to harmonise these data for national analyses.

This study aimed to develop and validate reporting guidelines for evaluating the methodological quality of studies using linked data.

Methods

A modified Delphi process was used to gain agreement about the contents of reporting guidelines.18 This method has been employed in the development of reporting guidelines for other types of research studies, including the CONSORT statement for the reporting of randomised controlled trials.19

There were three stages to the reporting guideline development process: 1) a literature review; 2) a Delphi process, incorporating an informant consultation process and two Delphi voting rounds; and 3) a tool validation process.

Literature review

The literature review6 summarised articles that identified quality issues with data linkage studies published from 1991 to 2007. Thirty-three articles met the inclusion criteria from which four domains and 26 items were identified that addressed issues of data linkage reporting quality. The reasons for and the nature of bias that arose from unlinked records were summarised and forwarded to the participants in the consultation process.

Consultation process

The key informant consultation process was conducted with 10 experts selected through purposive sampling of Australian experts in a range of fields related to data linkage. Participants were asked to review the domains summarised from the literature review and advise if additional domains and items, not previously identified, should be included. Participants suggested a fifth domain focusing solely on the variables to be used within the research study and an additional 21 items to be added to the preliminary list. The final list of domains and items was pilot tested by three independent researchers for face and content validity.

Delphi process

Two Delphi voting rounds were undertaken and participants’ identities were kept anonymous.

Before the Delphi voting rounds, all participants were given a background summary report of the project and literature review. The Delphi survey process included participants who had data linkage experience as researchers, technicians or users of data linkage studies. Twelve (60%) of the 20 invited experts agreed to take part in the Delphi process. The disciplinary backgrounds of participants were: 17% biostatisticians, 33% epidemiologists, 17% registry administrators, 25% clinician academics, 8% computer scientists and 8% journal editors.

The ratings from the first round were summarised quantitatively and qualitatively. The median group score, range and proportion of participants rating the item at eight or above were calculated for presentation in the second round. An a priori decision was made to include items with a panel median score of eight or higher and with a high level of agreement based on the Rand/UCLA Appropriateness Method User’s Manual for strict agreement (A7R).18 Inter-percentile Range Adjusted for Symmetry (IPRAS) scores, which are a measure of score dispersion adjusted for panel symmetry, were used to determine the level of agreement for each item.

After round one, 20 of the 47 items were ranked within the ‘included’ range according to Delphi criteria. In the second round, participants were asked to re-rate items taking into consideration the findings from the first round. The final list of ‘accepted’ items and ‘threshold’ items (where nine or more people rated eight or above and there was a high level of agreement) were circulated to participants for review and comment after the end of the second round. Following round two, 14 items remained.

Validation

The validation process randomly selected a sample of 25 from the 75 eligible articles (impact factors ranging from 0 to 15.7 grouped into impact factor quintiles). The majority of articles were from Australia, North America, United Kingdom and Scandinavian countries. Two researchers (MB and CB) applied the guidelines to the de-identified articles to rate how well each item was reported within the article (not applicable, poorly addressed, adequately addressed, well addressed).

The median number of items rated as ‘well addressed’ by at least one reviewer in each article was six (range: 1-12). There was not strong evidence of a relationship between impact factor and the summary rating of items for each study (r=0.20). The proportion agreement of the validation process was 71% and the kappa score was 0.6. Domain-specific kappa scores were as follows: existing data sources k=0.4; researcher selected variables and data preparation k=0.5; technology and analysis of linked data k=0.8; and ethical review k=0.9.

Ethics approval for this study was received from the Monash University Standing Committee on Ethical Research in Humans.

Results

The Delphi consensus process identified and validated reporting guidelines including four domains and 14 reporting items (presented in Table 1).

The final list of items incorporated six (43%) items from the domain on data sources, demonstrating the importance of having high-quality existing data systems to conduct high-quality linkage research. Of the 14 items, four (29%) items were from the domain on
researcher-selected variables related to the researcher’s consideration of the quality of specific variables to be used. This domain also demonstrates the importance of understanding the quality of existing data sources and specific variables to be used for linked analysis. There were three (21%) items included from the domain concerning the technology and analysis of the linked data and one (7%) item from the domain on ethics, privacy and data security was included in the final list.

Review of the written comments indicated that most participants who gave a low score to items in the domain on ethics, privacy and data security concluded that if the study had the approval of an ethics committee they could assume that the other items had been addressed (e.g. security of data was maintained at all times and consent to the linkage was obtained or not required). This underscores the importance of ethical review and the governance of research using linked data, especially if participant consent was not provided for the collection or linkage of the data initially.

IPRAS scores in the second round showed agreement on all of the items that met the criteria for acceptance and high agreement on low rated items.

Discussion

This study is the first to develop guidelines for appraising quality issues in studies using data linkage. This is an important endeavour as more studies and reports using linked data are being published. It is expected that these guidelines could be utilised by authors, data linkage analysts and reviewers as a basis for understanding the quality of their data sets, the linkage process and the possible limitations of the associated findings. The guidelines are intended to serve as a general framework. It is not expected that every item will apply to each study. For example, many studies will not be able to establish the specificity of their linkage results (i.e. if a one-to-one link is not expected, it may be difficult to quantify the number of false negatives). Nonetheless, the guidelines assist in identifying where assumptions about the accuracy or quality of data have been made.

Given the systematic investigation, the expertise of the Delphi participants and broad disciplinary representation, this study offers an exploratory basis for developing an accepted list of reporting criteria for studies using data linkage. The validation findings demonstrate that the criteria considered important by the experts are not consistently reported in the literature, with a median of only six items reported in the selected studies. This highlights an important gap that should be addressed. The differences between the criteria considered important by the experts and those consistently reported in the literature may relate to the fact that many researchers utilising linked data are not familiar with the issues that can impact linkage quality, especially where data are linked by a third party, such as a data linkage centre. Having standardised guidelines will help to highlight these concerns for researchers and readers of data linkage studies.

There are several limitations of this study. First, Delphi methods that include only anonymous voting components have been criticised for not including expert group discussion. However, we modified this process by including the initial group interactions before the voting. The high levels of agreement precluded the need for further direct discussions. Although the total number of participants in this project was small, all participants had publications in the area of expertise and took part in both rounds of the process, so participant drop-out did not influence our findings. The reliability of the application of these guidelines was moderate with a kappa statistic of 0.6.20 While the overall kappa score reflects moderate agreement, this is consistent with other critical appraisal tools tested for reliability

Table 1: Reporting guidelines for studies using data linkage.


DOMAIN 1: The existing data sources to be linked (complete for each dataset to be linked in the study)
Dataset 1 Dataset 2 Dataset 3 Dataset 4
1. Purpose of the dataset was given
2. Description of the type of dataset (administrative/ clinical registry/ research study) was specified
3. Any standardised coding system/data dictionary used was stated
4. % population coverage
5. Data collection methods were described
6. Data quality assurance process described, including the frequency of checks
DOMAIN 2: Researcher-selected variables and data preparation
7. Were the participant inclusion criteria specified?

8. Were the variables used for linkage stated (including rates of missing data)?
9. Were changes to the coding systems reported (including changes over time or revisions to disease/risk factor definitions during the study period)?
10. Were potential sources of bias adequately described?
DOMAIN 3: Technology/linkage process
11. Was the intended precision of the linkage stated?
12. Was a description of the linkage method given (e.g. deterministic, probabilistic methods including use of blocking and phonetic coding, if used) and a justification for the use of this type of linkage provided?
13. Was a measure of the quality of the linked data-sets provided (e.g. % linked records, false positive/false negative rates)?
DOMAIN 4: Ethics, privacy, data protection and access arrangements
14. Did the study receive approval from a human research ethics committee?
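
In the validation reported in this article, two researchers applied the items in Table 1 to each article using four rating categories, and their agreement was summarised as proportion agreement and a kappa score. The sketch below shows one way these two statistics could be computed for a pair of raters. It uses unweighted Cohen's kappa, which is an assumption: the article does not state which kappa variant was used. The function names and the toy ratings are hypothetical.

```python
from collections import Counter

# The four rating categories used in the validation.
CATEGORIES = ("not applicable", "poorly addressed",
              "adequately addressed", "well addressed")


def proportion_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item list")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by chance,
    calculated from each rater's marginal category frequencies.
    """
    n = len(rater_a)
    p_o = proportion_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n ** 2
    if p_e == 1.0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten checklist items by two raters.
N, P, A, W = CATEGORIES
rater_a = [W, W, W, W, P, P, P, A, A, N]
rater_b = [W, W, W, P, P, P, A, A, A, N]

print(f"proportion agreement = {proportion_agreement(rater_a, rater_b):.2f}")
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

Because kappa discounts agreement expected by chance, it will generally be lower than the raw proportion agreement, as with the 71% agreement and kappa of 0.6 reported here.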

(inter-rater agreement 0.6 to 0.7, ranging from 0.12 to 0.95).21,22 The poorest agreement was in the domain on existing data sources, which may reflect the number of items and complexity of this domain, especially if studies involved more than two data sources. The domain which considered whether ethics approval had been received had the highest level of agreement. The use of the Delphi process ensures a high degree of face validity.

A relationship was not found between impact factor and the number of items that were considered to be well addressed, but impact factor is a debated proxy measure for scientific quality.23 This study may also have been under-powered to detect such a relationship. Nevertheless, the tool is intended to serve as a general guide to authors and reviewers in relation to aspects of their study of data linkage. It is likely that other topical content will also be taken into consideration during the authorship and review of the article, and high-quality linkage methods do not necessarily imply that the research is of high scientific merit.

As we restricted the process to 12 participants, this process could be expanded in the future to include participants from outside Australia and public health disciplines.

Conclusion

This study is the first to develop reporting guidelines for appraising the quality of studies using data linkage. It is fundamental that high quality and systematic data linkage methods be supported to advance research. As studies relying on linked data are becoming more prevalent in the Australian research community with the development of national data linkage capacity, these guidelines may assist authors and reviewers in producing high-quality research.

Acknowledgements

The authors gratefully acknowledge the members of the Delphi panel and their colleagues Dr Cameron Willis, Sacha Höttje and Christine Moje for their helpful review and feedback on drafts of this manuscript. Megan Bohensky received funding for her PhD through an Australian Postgraduate Award.

References

1. Winglee M, Valliant R, Scheuren F. A case study in record linkage. Surv Methodol. 2005;31:3-11.
2. National Collaborative Research Infrastructure Strategy. Strategic Roadmap for Australian Research Infrastructure. Canberra (AUST): Commonwealth Department of Innovation, Industry, Science and Research; 2008.
3. Holman CDA, Bass AJ, Rouse IL, Hobbs MST. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust N Z J Public Health. 1999;23:453-9.
4. Centre for Health Record Linkage. Guide To Health Record Linkage Services. Version 1.3; Sydney (AUST): Cancer Institute NSW; undated.
5. Department of Health. Victorian Data Linkages [Internet]. Melbourne (AUST): State Government of Victoria; 2011 [cited 2011 Mar]. Available from: http://www.health.vic.gov.au/vdl/index.htm
6. Bohensky M, Jolley D, Sundararajan V, Evans S, Scott I, Brand C. Data linkage: a powerful tool with potential problems. BMC Health Serv Res. 2010;10:346.
7. Yancey WE. Evaluating string comparator performance for record linkage. In: Research Report Series (Statistics #2005-05) [Internet]. Washington (DC): U.S. Census Bureau, Statistical Research Division; 2005 [cited 2010 Nov]. Available from: http://www.census.gov/srd/papers/pdf/rrs2005-05.pdf
8. Gomatam S, Carter R, Ariet M, Mitchell G. An empirical comparison of record linkage procedures. Stat Med. 2002;21:1485-96.
9. Silveira DP, Artmann E. Accuracy of probabilistic record linkage applied to health databases: systematic review. Rev Saude Publica. 2009;43:875-82.
10. Li B, Quan H, Fong A, Lu M. Assessing record linkage between health care and Vital Statistics databases using deterministic methods. BMC Health Serv Res. 2006;6:48.
11. Bopp M, Minder CE. Mortality by education in German speaking Switzerland, 1990-1997: Results from the Swiss National Cohort. Int J Epidemiol. 2003;32:346-54.
12. Dunn KM, Jordan K, Lacey RJ, Shapley M, Jinks C. Patterns of consent in epidemiologic research: evidence from over 25,000 responders. Am J Epidemiol. 2004;159:1087-94.
13. Adams MM, Wilson HG, Casto DL, et al. Constructing Reproductive Histories by Linking Vital Records. Am J Epidemiol. 1997;145:339-48.
14. Gyllstrom ME, Jensen JL, Vaughan JN, Castellano SE, Oswald JW. Linking birth certificates with Medicaid data to enhance population health assessment: methodological issues addressed. J Public Health Manag Pract. 2002;8:38-44.
15. Hoving JL, Monaco A, MacFarlane E, et al. Methodological issues in linking study participants to Australian cancer registries using different methods: lessons from a cohort study. Aust N Z J Public Health. 2005;29:378-82.
16. O’Reilly D, Rosato M, Connolly S. Unlinked vital events in census-based longitudinal studies can bias subsequent analysis. J Clin Epidemiol. 2008;61:380-5.
17. Plint AC, Moher D, Morrison A, et al. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust. 2006;185:263-7.
18. Fitch K, Bernstein SJ, Aguilar MS, et al. The RAND/UCLA Appropriateness Method User’s Manual [Internet]. Santa Monica (CA): Rand; 2001 [cited 2010 Sep]. Available from: http://www.rand.org/pubs/monograph_reports/MR1269/index.html
19. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357:1191-4.
20. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-74.
21. Hartling L, Ospina M, Liang Y, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ. 2009;339:b4012.
22. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16:62-73.
23. Grzybowski A. The journal impact factor: how to interpret its true value and importance. Med Sci Monit. 2009;15:SR1-4.
