Article
the results. The aim of this study was to develop and test reporting guidelines for evaluating the methodological quality of studies using linked data.
Method: The development process included a literature review, a Delphi process and a validation process. Participants in the process were all Australian and included biostatisticians, epidemiologists, registry administrators, academic clinicians and a peer-reviewed journal editor.
Results: The final guidelines included four domains and 14 reporting items. These included: data sources (six items), researcher-selected variables (four items), linkage technology and data analysis (three items), and ethics, privacy and data security (one item).
Conclusion: This study is the first to develop guidelines for appraising the quality of reported data linkage studies.
Implications: These guidelines will assist authors to report their results in a consistent, high-quality manner. They will also assist readers to interpret the quality of results derived from data linkage studies.
Key words: Data collection, medical record linkage, guideline, research design, peer review, research
Aust NZ J Public Health. 2011; 35:486-9
doi: 10.1111/j.1753-6405.2011.00741.x
Submitted: September 2010 Revision requested: March 2011 Accepted: April 2011
Correspondence to:
Dr Megan Bohensky, Centre for Research Excellence in Patient Safety, Department of
Epidemiology, Monash University, 99 Commercial Road, Level 6 Alfred Centre, Prahran,
Victoria 3181; e-mail: megan.bohensky@monash.edu

Data or record linkage is “a process of pairing records from two files and trying to select the pairs that belong to the same entity.”1 It is commonly used to combine existing data sets to create more comprehensive information to conduct research. Studies involving data linkage are becoming more common. The Australian Government, through the National Collaborative Research Infrastructure Strategy, awarded $20 million to the Population Health Research Network (PHRN) to establish national capacity for data linkage in Australia.2 The PHRN also received more than $30 million in direct and indirect support from each of the states and territories. Data linkage will be conducted within nodes operating in each individual State3-5 (see www.phrn.org.au), as in the model of Western Australia, with the Centre for Data Linkage, a national network, to co-ordinate the activities of the state-based groups.

Although the linkage of data sources generates valuable information and improves data quality, it may create additional biases and methodological concerns.6 Linking data sets accurately requires stable and sufficiently unique identifiers. However, unique identifiers are not always available due to ethical and privacy constraints. Different methods are used for standardising data and linking data sets and the choice of these may influence the quality of results.7,8 A systematic review of the accuracy of probabilistic linkage processes found the sensitivity (i.e. the proportion of truly linked records detected) ranged from 74% to 98%.9 The authors noted that this variation was likely to be due to the number and quality of fields available for linkage.

Where there is low sensitivity of linkage processes, differential inaccuracies in the data may result in systemic bias. Linkage rates vary by participants’ age,10 gender,11,12 ethnicity,13 health status, regional location14,15 and socio-economic status.13 These variations may affect the conclusions of research studies.16

Factors compromising the quality of studies using linked data should be reported in a systematic way to allow readers and researchers to accurately appraise different studies. Reporting guidelines have improved the quality of information in other areas of research by highlighting the shortcomings and prompting improvements in the quality of published studies.17 Currently, there is no tool available to appraise the quality of studies using data linkage.
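As a small illustration of the accuracy measures mentioned in the introduction, the sketch below computes linkage sensitivity (the proportion of truly linked records detected) and the positive predictive value from counts of correct, false and missed links. The class name, function names and all counts are hypothetical; the counts were chosen only so that sensitivity equals 74%, the lower end of the range reported in the systematic review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageCounts:
    """Hypothetical tallies from comparing linkage output with a gold standard."""
    true_links: int    # pairs linked that really belong to the same entity
    false_links: int   # pairs linked that belong to different entities
    missed_links: int  # true matches that the linkage failed to find


def sensitivity(counts: LinkageCounts) -> float:
    """Proportion of true matches that the linkage detected."""
    total_true_matches = counts.true_links + counts.missed_links
    if total_true_matches == 0:
        raise ValueError("no true matches to evaluate")
    return counts.true_links / total_true_matches


def positive_predictive_value(counts: LinkageCounts) -> float:
    """Proportion of reported links that are correct."""
    total_reported = counts.true_links + counts.false_links
    if total_reported == 0:
        raise ValueError("no links were reported")
    return counts.true_links / total_reported


# Invented example: 1,000 true matches, of which 740 were found,
# plus 20 incorrect links.
example = LinkageCounts(true_links=740, false_links=20, missed_links=260)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"positive predictive value = {positive_predictive_value(example):.2f}")
```

Reporting both figures matters because a linkage can look accurate on one measure while performing poorly on the other.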
486 AUSTRALIAN AND NEW ZEALAND JOURNAL OF PUBLIC HEALTH 2011 vol. 35 no. 5
© 2011 The Authors. ANZJPH © 2011 Public Health Association of Australia
Data Quality Reporting guidelines for studies involving data linkage
Because the different data linkage nodes in Australia are subject to different privacy legislation and organisational structures, there is the potential that they may need to utilise differing methods and identifiers to link data in each jurisdiction. The development of standardised guidelines for reporting and appraising the quality of linkages within each node is timely and can help achieve greater national consistency in linkage results, especially where jurisdictional data will be aggregated and considered at a national scale. The Centre for Data Linkage is making concerted efforts to harmonise these data for national analyses.

This study aimed to develop and validate reporting guidelines for evaluating the methodological quality of studies using linked data.

Methods
A modified Delphi process was used to gain agreement about the contents of reporting guidelines.18 This method has been employed in the development of reporting guidelines for other types of research studies, including the CONSORT statement for the reporting of randomised controlled trials.19

There were three stages to the reporting guideline development process: 1) a literature review; 2) a Delphi process, incorporating an informant consultation process and two Delphi voting rounds; and 3) a tool validation process.

Literature review
The literature review6 summarised articles that identified quality issues with data linkage studies published from 1991 to 2007. Thirty-three articles met the inclusion criteria, from which four domains and 26 items were identified that addressed issues of data linkage reporting quality. The reasons for and the nature of bias that arose from unlinked records were summarised and forwarded to the participants in the consultation process.

Consultation process
The key informant consultation process was conducted with 10 experts selected through purposive sampling of Australian experts in a range of fields related to data linkage. Participants were asked to review the domains summarised from the literature review and advise if additional domains and items, not previously identified, should be included. Participants suggested a fifth domain focusing solely on the variables to be used within the research study and an additional 21 items to be added to the preliminary list. The final list of domains and items was pilot tested by three independent researchers for face and content validity.

Delphi process
Two Delphi voting rounds were undertaken and participants’ identities were kept anonymous.

Before the Delphi voting rounds, all participants were given a background summary report of the project and literature review. The Delphi survey process included participants who had data linkage experience as researchers, technicians or users of data linkage studies. Twelve (60%) of the 20 invited experts agreed to take part in the Delphi process. The disciplinary backgrounds of participants were: 17% biostatisticians, 33% epidemiologists, 17% registry administrators, 25% clinician academics, 8% computer scientists and 8% journal editors.

The ratings from the first round were summarised quantitatively and qualitatively. The median group score, range and proportion of participants rating the item at eight or above were calculated for presentation in the second round. An a priori decision was made to include items with a panel median score of eight or higher and with a high level of agreement based on the RAND/UCLA Appropriateness Method User’s Manual for strict agreement (A7R).18 Inter-percentile Range Adjusted for Symmetry (IPRAS) scores, which are a measure of score dispersion adjusted for panel symmetry, were used to determine the level of agreement for each item.

After round one, 20 of the 47 items were ranked within the ‘included’ range according to Delphi criteria. In the second round, participants were asked to re-rate items taking into consideration the findings from the first round. The final list of ‘accepted’ items and ‘threshold’ items (where nine or more people rated eight or above and there was a high level of agreement) was circulated to participants for review and comment after the end of the second round. Following round two, 14 items remained.

Validation
The validation process randomly selected a sample of 25 from the 75 eligible articles (impact factors ranging from 0 to 15.7 grouped into impact factor quintiles). The majority of articles were from Australia, North America, the United Kingdom and Scandinavian countries. Two researchers (MB and CB) applied the guidelines to the de-identified articles to rate how well each item was reported within the article (not applicable, poorly addressed, adequately addressed, well addressed).

The median number of items rated as ‘well addressed’ by at least one reviewer in each article was six (range: 1-12). There was no strong evidence of a relationship between impact factor and the summary rating of items for each study (r=0.20). The proportion of agreement in the validation process was 71% and the kappa score was 0.6. Domain-specific kappa scores were as follows: existing data sources k=0.4; researcher selected variables and data preparation k=0.5; technology and analysis of linked data k=0.8; and ethical review k=0.9.

Ethics approval for this study was received from the Monash University Standing Committee on Ethical Research in Humans.

Results
The Delphi consensus process identified and validated reporting guidelines including four domains and 14 reporting items (presented in Table 1).

The final list of items incorporated six (43%) items from the domain on data sources, demonstrating the importance of having high-quality existing data systems to conduct high-quality linkage research. Of the 14 items, four (29%) items were from the domain on
Bohensky et al. Article
researcher-selected variables related to the researcher’s consideration of the quality of specific variables to be used. This domain also demonstrates the importance of understanding the quality of existing data sources and specific variables to be used for linked analysis. Three (21%) items came from the domain concerning the technology and analysis of the linked data, and one (7%) item came from the domain on ethics, privacy and data security.

Review of the written comments indicated that most participants who gave a low score to items in the domain on ethics, privacy and data security concluded that if the study had the approval of an ethics committee they could assume that the other items had been addressed (e.g. security of data was maintained at all times and consent to the linkage was obtained or not required). This underscores the importance of ethical review and the governance of research using linked data, especially if participant consent was not provided for the collection or linkage of the data initially.

IPRAS scores in the second round showed agreement on all of the items that met the criteria for acceptance and high agreement on low-rated items.

Discussion
This study is the first to develop guidelines for appraising quality issues in studies using data linkage. This is an important endeavour as more studies and reports using linked data are being published. It is expected that these guidelines could be utilised by authors, data linkage analysts and reviewers as a basis for understanding the quality of their data sets, the linkage process and the possible limitations of the associated findings. The guidelines are intended to serve as a general framework. It is not expected that every item will apply to each study. For example, many studies will not be able to establish the specificity of their linkage results (i.e. if a one-to-one link is not expected, it may be difficult to quantify the number of false negatives). Nonetheless, the guidelines assist in identifying where assumptions about the accuracy or quality of data have been made.

Given the systematic investigation, the expertise of the Delphi participants and broad disciplinary representation, this study offers an exploratory basis for developing an accepted list of reporting criteria for studies using data linkage. The validation findings demonstrate that the criteria considered important by the experts are not consistently reported in the literature, with a median of only six items reported in the selected studies. This highlights an important gap that should be addressed. The differences between the criteria considered important by the experts and those consistently reported in the literature may relate to the fact that many researchers utilising linked data are not familiar with the issues that can impact linkage quality, especially where data are linked by a third party, such as a data linkage centre. Having standardised guidelines will help to highlight these concerns for researchers and readers of data linkage studies.

There are several limitations of this study. First, Delphi methods that include only anonymous voting components have been criticised for not including expert group discussion. However, we modified this process by including the initial group interactions before the voting. The high levels of agreement precluded the need for further direct discussions. Although the total number of participants in this project was small, all participants had publications in the area of expertise and took part in both rounds of the process, so participant drop-out did not influence our findings. The reliability of the application of these guidelines was moderate with a kappa statistic of 0.6.20 While the overall kappa score reflects moderate agreement, this is consistent with other critical appraisal tools tested for reliability
8. Were the variables used for linkage stated (including rates of missing data)?
9. Were changes to the coding systems reported (including changes over time or revisions to disease/risk factor definitions during the study period)?
10. Were potential sources of bias adequately described?
DOMAIN 3: Technology/linkage process
11. Was the intended precision of the linkage stated?
12. Was a description of the linkage method given (e.g. deterministic, probabilistic methods including use of blocking and phonetic coding, if used) and a justification for the use of this type of linkage provided?
13. Was a measure of the quality of the linked data-sets provided (e.g. % linked records, false positive/false negative rates)?
DOMAIN 4: Ethics, privacy, data protection and access arrangements
14. Did the study receive approval from a human research ethics committee?
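The Delphi inclusion rule described in the Methods (a panel median of eight or higher together with agreement judged by IPRAS) determined which items appear in the checklist above. The sketch below shows one way that rule could be computed. The constants (2.35 and 1.5) and the 30th to 70th percentile range follow the IPRAS definition in the RAND/UCLA manual as it is commonly cited; the 1 to 9 rating scale, the percentile interpolation method, the function names and the panel ratings are assumptions, since the article does not give these details.

```python
from statistics import median
from typing import Sequence, Tuple

IPR_REQUIRED = 2.35       # IPR needed for disagreement when ratings are symmetric
ASYMMETRY_FACTOR = 1.5    # correction factor for asymmetry
SCALE_MIDPOINT = 5.0      # midpoint of an assumed 1-9 rating scale


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile by linear interpolation between order statistics (assumed method)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def ipr_and_ipras(ratings: Sequence[float]) -> Tuple[float, float]:
    """Return the 30th-70th inter-percentile range and its symmetry-adjusted threshold."""
    p30 = percentile(ratings, 0.30)
    p70 = percentile(ratings, 0.70)
    ipr = p70 - p30
    asymmetry_index = abs(SCALE_MIDPOINT - (p30 + p70) / 2)
    ipras = IPR_REQUIRED + ASYMMETRY_FACTOR * asymmetry_index
    return ipr, ipras


def include_item(ratings: Sequence[float], median_cutoff: float = 8) -> bool:
    """Include an item when the median meets the cutoff and the IPR is below IPRAS."""
    ipr, ipras = ipr_and_ipras(ratings)
    return median(ratings) >= median_cutoff and ipr < ipras


# Invented ratings from a 12-member panel.
consensus_item = [8, 8, 9, 9, 8, 9, 7, 8, 9, 9, 8, 9]
polarised_item = [1, 2, 9, 9, 9, 9, 1, 2, 9, 9, 8, 2]

for name, ratings in [("consensus", consensus_item), ("polarised", polarised_item)]:
    ipr, ipras = ipr_and_ipras(ratings)
    print(name, median(ratings), round(ipr, 2), round(ipras, 2), include_item(ratings))
```

Both invented items have a median of 8.5, but only the first would be included: the polarised panel has an inter-percentile range well above its IPRAS threshold, which is the situation the agreement criterion is meant to catch.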