Rooney Et Al 2014 Systematic Review and Evidence Integration For Literature Based Environmental Health Science
Results

The OHAT framework is a flexible seven-step process (Figure 1) tailored to the complexity of the research question. It includes all of the recommended elements for conducting and reporting a systematic review [outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement (Moher et al. 2009)]. The specific procedures for performance of each step are described in a detailed protocol developed for each evaluation (NTP 2013a, 2013f).

[Figure 1. The OHAT Approach for systematic review and evidence integration for literature-based environmental health science assessments. Recoverable panel text: Step 1: Problem formulation and protocol development. Step 2: Search for and select studies for inclusion. Step 3: Extract data from studies. Step 4: Assess quality of individual studies. Step 5: Rate confidence in the body of evidence, showing initial confidence by key features of study design [e.g., high (++++), 4 features; moderate (+++), 3 features], factors decreasing confidence (risk of bias, unexplained inconsistency, indirectness), factors increasing confidence (large magnitude of effect, dose response, residual confounding), and the resulting confidence in the body of evidence. If the only available body of evidence receives a “very low confidence” rating, then conclusions for those outcomes will not move on to step 6. Step 6: Translate confidence ratings into evidence of health effects (health effect; no health effect).]

Step 1: Problem Formulation and Protocol Development

Prior to conducting an evaluation, the scope and focus of the topic are defined through consultation with subject-matter experts. For OHAT, the objective is typically to identify a potential health hazard or assess the state of the science in order to identify research needs on topics of importance to environmental health. The objectives of the evaluation must be clearly stated, including the key questions to be addressed. The evaluation is structured to answer these key questions that guide the systematic-review process for the literature search, study selection, data extraction, and synthesis. The questions define the populations, exposures, comparators, outcomes, timings, and settings of interest (PECOTS) eligibility criteria for the evaluation (e.g., see discussion in AHRQ 2013). [...] than exposures, and did not initially include timing or setting in the inclusion criteria (Whitlock et al. 2010).

A concept document (or brief proposal) and a specific, detailed protocol for OHAT [...]ticipated issues that might arise while conducting the review (e.g., see Food and Drug [...]

Step 2: Search for and Select Studies for Inclusion

Search for studies. A comprehensive search of the primary scientific literature is performed. The search covers multiple databases (including, but not limited to, PubMed, TOXNET, Scopus, and Embase) with sufficient details of the search strategy documented in the protocol such that it could be reproduced. The protocol also lists the dates of the search, frequency of updates, and any limits placed on the search (e.g., language, date of publication). The protocol establishes requirements for consideration of data from meeting abstracts or other unpublished sources. If a study that may be critical to the evaluation has not been peer reviewed and the authors agree to make all study materials available, the NTP will have it peer reviewed by independent scientists with relevant expertise. The peer-review requirement assures that studies considered in the evaluation have been reviewed by subject-matter experts, and the information from this review would be available in step 4 when evaluating individual study quality.

Select studies for inclusion. All references identified in the search are screened for relevance to the key question(s) of the evaluation based on the PECOTS eligibility criteria established when formulating the problem in step 1. The protocol establishes criteria for including or excluding references based on, for example, applicable outcomes, relevant exposures, and types of studies. These criteria contain sufficient detail to develop an inclusion and exclusion checklist in order to limit the use of scientific judgment during the
literature-selection process. If major limitations in a specific study type or design for addressing the question are known in advance (e.g., unreliable methods to assess exposure or health outcome), the basis for excluding those studies must be described a priori in the protocol. The protocol also outlines the specific plans for reviewing studies for inclusion, resolving conflicts between reviewers, and documenting the reasons that studies were excluded. Two reviewers independently screen all references at the title and abstract level and resolve differences by reaching agreement through discussion. References that meet the inclusion criteria are retrieved for full text review, as are those with insufficient information to determine eligibility from just the title and abstract. Procedures for full text review are tailored to the scope of the review and follow procedures established in the protocol. Creating a flow diagram to show the number of references retrieved, duplicates removed, and studies excluded as references move through the screening process is one of several required elements for reporting based on the PRISMA statement (Liberati et al. 2009; Moher et al. 2009) that we have included in this framework.

Step 3: Extract Data from Studies

Relevant data from individual studies selected for inclusion are extracted or copied from the publication to a database to facilitate critical evaluation of the results, including data summary and display using separate data collection forms for human, animal, and in vitro studies. For each study, one member of the evaluation team performs the data extraction, and quality assurance procedures are undertaken as specified in the protocol (e.g., review and confirmation by another team member). Following completion of an evaluation, the data extracted and summarized will be made publicly available in the NTP Chemical Effects in Biological Systems (CEBS) database (NTP 2014a).

Step 4: Assess the Quality or Risk of Bias of Individual Studies

Despite the critical importance of assessing the credibility of individual studies when developing literature-based evaluations, the meaning of the term “quality” varies widely across the fields of systematic review, toxicology, and public health (see discussion in Viswanathan et al. 2012). Broadly defined, study quality includes a) reporting quality (how well or completely a study was reported); b) internal validity or risk of bias (how credible the findings are based on the design and apparent conduct of a study); and c) external validity or directness and applicability (how well a study addresses the topic under review) (see Cochrane Collaboration 2013 for detailed definitions). Study quality assessment tools that mix different aspects of study quality or provide a single summary score are discouraged (Balshem et al. 2011; Higgins and Green 2011; Liberati et al. 2009; Viswanathan et al. 2012).

The OHAT risk-of-bias tool adapts guidance from the AHRQ (Viswanathan et al. 2012). Individual risk-of-bias questions are designated as applicable only to certain types of study designs (e.g., human controlled trials, experimental animal studies, cohort studies, case–control studies, cross-sectional studies, case series or case reports), with a subset of the questions applying to each study design (Table 1).

Published tools do not address risk-of-bias criteria for animal studies because risk-of-bias tools, as with systematic-review methods in general, have been focused on guidelines for clinical medicine. OHAT evaluates risk of bias in experimental animal studies using criteria similar to those applied to human randomized controlled trials, because these study designs are similar in their ability to control timing and dose of exposure and to minimize the impact of confounding factors. Using the same set of questions for all study types, including experimental animal studies, allows for comparison of particular risk-of-bias issues across a body of evidence and facilitates comparison of the strengths and weaknesses of different bodies of evidence.

All references are independently assessed for risk of bias for each outcome of interest by two reviewers who answer all of the applicable questions with one of four options (definitely low, probably low, probably high, or definitely high risk of bias) (CLARITY Group at McMaster University 2013) following prespecified criteria detailed in the protocol. Before proceeding with the risk-of-bias assessment, OHAT recommends evaluating a small subset of studies as a “pilot” to clarify how the protocol-specific criteria will be applied through dialogue among subject matter experts and reviewers. During completion of the risk-of-bias assessment for the full set of studies, discrepancies between the reviewers are resolved by reaching agreement through discussion.

Step 5: Rate the Confidence in the Body of Evidence

For each outcome, the confidence in the body of evidence is rated by considering the strengths and weaknesses of a collection of studies with similar study design features. Ratings reflect confidence that the study findings accurately reflect the true association between exposure and effect including aspects of external validity (or directness and applicability) for the studies. The OHAT method is based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group guidelines (GRADE 2014), which have been adopted by the Cochrane Collaboration (Schünemann et al. 2012) and AHRQ approaches (Balshem et al. 2011; Lohr 2012), which are conceptually very similar. The OHAT method uses four descriptors to indicate the level of confidence in the separate bodies of evidence (Table 2). In the context of identifying research needs, a conclusion of “high confidence” indicates that further research is very unlikely to change the confidence in the apparent relationship between exposure to the substance and the outcome. Conversely, a conclusion of “very low confidence” suggests that further research is very likely to impact confidence in the apparent relationship. Human and nonhuman animal data are considered separately throughout steps 5 and 6. Conclusions developed in the subsequent steps of the approach are based on the evidence with the highest confidence.

For each outcome, studies are given an initial confidence rating that reflects the presence or absence of key study-design features (Figure 1, step 5). Then studies that have the same number of features are considered together as a group to begin the process of rating confidence in a body of evidence for that outcome. The initial rating of each group is downgraded for factors that decrease confidence and upgraded for factors that increase confidence in the results. Confidence across all studies with the same outcome is then assessed by considering the ratings for all groups of studies with that outcome, and the highest rating for that outcome moves forward.

Although confidence ratings for each outcome are developed for groups of studies, the number of studies constituting the group will vary, and in some cases this group may be represented by only one study. Therefore, it is worth noting that a single well-conducted study may provide evidence of toxicity or a health effect associated with exposure to the substance in question [e.g., see Germolec (2009) and Foster (2009) for explanations of the NTP levels of evidence for determination of “toxicity” for individual studies]. If a sufficient body of very similar studies is available, a quantitative meta-analysis may be completed to generate an overall estimate of effect, but this is not required. Finally, confidence conclusions are developed across multiple outcomes for those outcomes that are biologically related.

It is recognized that the scientific judgments involved in developing these confidence ratings are inherently subjective. A key advantage of the systematic-review process for this step and throughout an evaluation is that it provides a framework to document and justify the decisions made, and thereby provides for greater transparency in the scientific basis of judgments made in reaching conclusions.

Initial confidence set by key features of study design for each outcome. An initial confidence rating is determined by the ability of the study design to address causality as reflected in the confidence that exposure preceded and was associated with the outcome (Figure 1, step 5). This ability is reflected in the presence or absence of four key study-design features that determine initial confidence ratings, and studies are differentiated based on whether a) the exposure to the substance is controlled; b) the exposure assessment represents exposures occurring prior to development of the outcome; c) the outcome is assessed on the individual level (i.e., not population aggregate data); and d) a comparison or control group is used within the study. The first key feature, “controlled exposure,” reflects the ability of
experimental studies in humans and animals to largely eliminate confounding by randomizing allocation of exposure. Therefore, these studies will usually have all four features and receive an initial rating of “high confidence.” Observational studies do not have controlled exposure and are differentiated by the presence or absence of the three remaining study-design features. For example, prospective cohort studies usually have all three remaining features and receive an initial rating of “moderate confidence,” whereas a case report may have only one key feature and receive an initial rating of “very low confidence” (see Supplemental Material, Table S2, for key features for standard study designs and discussion, pp. 9–11). The presence or absence of these study-design features captures and discriminates studies on an outcome-specific basis (e.g., experimental, prospective) but does not replace consideration of risk of bias elements or external validity in other steps.

[Table 1 footnotes; the table body is not present in this copy.]
[...] study design.
(b) HCTs are carried out in humans using a controlled exposure, including randomized controlled trials and non-randomized experimental studies.
(c) Coh studies include prospective studies that follow subjects free of disease over time or retrospective studies of subjects with prior information available.
(d) CaC studies enroll subjects based on their disease status and compare exposures across the groups.
(e) CrS studies are conducted at one point in time and include population surveys with individual data [e.g., National Health and Nutrition Examination Survey (NHANES)] and population surveys with aggregate data (i.e., air pollution exposure estimated by ZIP code).
(f) All applies to ExA, HCT, Coh, CaC, and CrS studies, as well as other study design types such as case reports or CaS studies that lack a comparison group within the study.

Table 2. Confidence ratings in the bodies of evidence.
Confidence rating: Definition
High confidence (++++): High confidence in the association between exposure to the substance and the outcome. The true effect is highly likely to be reflected in the apparent relationship.
Moderate confidence (+++): Moderate confidence in the association between exposure to the substance and the outcome. The true effect may be reflected in the apparent relationship.
Low confidence (++): Low confidence in the association between exposure to the substance and the outcome. The true effect may be different from the apparent relationship.
Very low confidence (+): Very low confidence in the association between exposure to the substance and the outcome. The true effect is highly likely to be different from the apparent relationship.

Downgrade confidence rating. Five properties of the body of evidence (risk of bias, unexplained inconsistency, indirectness, imprecision, and publication bias) are considered to determine if the initial confidence rating should be downgraded (Figure 1, step 5). For each of the five properties, a judgment is made and documented regarding whether there are substantial issues that decrease the confidence rating in each aspect of the body of evidence for the outcome. Factors that would downgrade confidence by one versus two levels are specified in the protocol. The reasons for downgrading confidence may not fit neatly into a single property of the body of evidence. If the decision to downgrade is borderline for two properties, the body of evidence is downgraded once to account for both partial concerns. Similarly, the body of evidence is not downgraded twice for what is essentially the same limitation that could be considered applicable to more than one property of the body of evidence.

Risk of bias of the body of evidence. Risk-of-bias criteria were described in step 4, in which study-quality issues for individual studies are evaluated on an outcome-specific basis. In step 5, the previous risk-of-bias assessments for individual studies now serve as the basis for an overall risk-of-bias conclusion for the entire body of evidence. Downgrading for risk of bias should reflect the entire body of studies; therefore, the decision to downgrade should be applied conservatively. The decision to downgrade should be reserved for cases for which there is substantial risk of bias across most of the studies composing the body of evidence (Guyatt et al. 2011e).

Unexplained inconsistency. Inconsistency, or large variability in the magnitude or direction of estimates of effect across studies that cannot be explained, reduces confidence in the body of evidence. Large inconsistency across studies should be explored, preferably through a priori hypotheses that might explain the heterogeneity.

Indirectness. Indirectness can refer to external validity or indirect measures of the health outcome. Indirectness can lower confidence in the body of evidence when the population, exposure, or outcome(s) measured differs from the population, exposure, or outcome(s) that is of most interest. Concerns about directness could apply to the relationship between a) a measured outcome and a health effect (i.e., upstream biomarker of a health effect); b) the route of exposure and the typical human exposure; c) the study population and the population of interest (Guyatt et al. 2011c; Lohr 2012); d) the timing of the exposure relative to the appropriate biological window to affect the outcome; or e) the timing of outcome assessment and the length of time required after an exposure for development of the outcome (Viswanathan et al. 2012).

The administered dose or exposure level is not considered a factor under indirectness for developing a confidence rating for the purpose of hazard identification. Although exposure level is an important factor in considering the relevance of study findings to human health effects at known human exposure levels, in the OHAT evaluation process, this consideration occurs after hazard identification as part of reaching a “level of concern” conclusion (Jahnke et al. 2005; Medlin 2003; Shelby 2005; Twombly 1998). The accuracy of an exposure metric (e.g., market basket survey vs. individual blood levels of a substance) is also not considered a factor under indirectness, and the confidence in the exposure assessment is considered in the risk-of-bias evaluation of individual studies on an outcome basis in step 4.

Imprecision. Imprecision is the lack of certainty in an estimate of effect for a specific outcome. A precise estimate enables the evaluator to determine whether there is an effect (i.e., it is different from the comparison group). Confidence intervals for the estimates of effect provide the primary evidence used in considering the imprecision of the body of evidence (Guyatt et al. 2011b).

Publication bias. Publication bias is addressed specifically in rating the body of evidence, and selective reporting within a study is covered in the risk-of-bias criteria addressing these limitations (Guyatt et al. 2011d). Funnel plots provide a useful tool to visualize asymmetrical or symmetrical patterns of study results for assessing publication bias when there is a sufficient body of studies for a specific outcome (e.g., Ahmed et al. 2012). There is empirical evidence that studies with negative results (null findings for clinical trials) are less likely to be in the published literature (Hopewell et al. 2009). Negative studies may also be affected by “lag bias” or longer time to publication (Stern and Simes 1997); therefore, it is important to carefully consider data sets that are limited to few positive studies with small sample size that might indicate a lag time between early positive studies and lagging negative studies. Although some publication bias is expected, downgrading is reserved for cases in which serious concern for publication bias significantly decreases confidence in the body of evidence.

Upgrade confidence rating. Four properties of the body of evidence (large magnitude of effect, dose response, residual confounding increases confidence, and cross-species/population/study consistency) are considered to determine if the confidence rating should be upgraded (Figure 1, step 5). For each of the four properties, a judgment is made and documented regarding whether or not there are substantial factors that increase the confidence rating in the body of evidence for the outcome. As discussed above for downgrading, two borderline upgrades could be combined for one upgrade, and the body of evidence should not be upgraded twice for essentially the same attribute. Factors that would upgrade confidence by one versus two levels are specified in the protocol.

Large magnitude of effect. A large magnitude of effect is defined as an observed effect that is sufficiently large so that it is unlikely to have occurred as a result of bias from potential confounding factors.

Dose response. A plausible dose–response relationship between the level of exposure and the outcome increases confidence in the result because it reduces concern that the result could be due to chance. In addition to considering dose response within a study with a range of exposure levels, consideration of multiple studies with varied exposure levels
can contribute to an overall picture of the dose response. It is important to recognize that prior knowledge may lead to an expectation for a nonmonotonic dose response. Therefore, the plausibility of the observed biological response should be considered in evaluating the dose–response relationship.

Residual confounding increases confidence. This element refers to consideration of residual confounding, healthy worker effect, or effect modification that would bias the effect estimate toward the null. If a study reports an effect or association despite the presence of residual confounding that would diminish the association, confidence in the association is increased. This confounding can push in either direction; therefore, confidence in the results is increased when there is an indication that a body of evidence is potentially biased by factors counter to the observed effect.

Cross-species/population/study consistency. Three types of consistency in the body of evidence can increase confidence in the results: across animal studies (consistent results reported in multiple experimental animal models or species); across dissimilar populations [consistent results reported across populations (human or wildlife) that differ in factors such as time, location, and/or exposure]; and across study types (consistent results reported from studies with different design features).

Other. Additional factors specific to the topic being evaluated (e.g., particularly rare outcomes) may result in increasing a confidence rating. These other factors would be specified and defined in the protocol.

Combine confidence conclusions for all study types and multiple outcomes. Conclusions are based on the evidence with the highest confidence when considering evidence across study types and multiple outcomes. Confidence ratings are initially set based on key design features of the available studies for a given outcome (e.g., for experimental studies separately from observational studies). The studies with the highest confidence rating form the basis of the confidence conclusion for each evidence stream. As noted above, consistent results across studies with different design features increase confidence in the combined body of evidence and can result in an upgraded confidence rating moving forward to step 6. If the only available body of evidence receives a “very low confidence” rating, then conclusions for those outcomes will not move on to step 6.

After confidence conclusions are developed for a given outcome, conclusions for multiple outcomes are developed. The project-specific definition of an outcome and the grouping of biologically related outcomes used in this step follow the definitions developed a priori in the protocol; deviations are taken with care, justified, and documented. When outcomes are sufficiently biologically related that they may inform confidence on the overall health outcome, confidence conclusions may be developed in two steps. Each outcome would first be considered separately. Then, the related outcomes would be considered together and reevaluated for properties that relate to downgrading and upgrading the body of evidence.

Step 6: Translate the Confidence Ratings into Level of Evidence for Health Effect

The level of evidence is assessed separately within the human, experimental animal, and, to the extent possible and necessary, other relevant data sets. The conclusions for the level of evidence for health effects reflect the overall confidence in the association between exposure to the substance and the outcome (effect or no effect; Figure 1, step 6). The strategy uses four terms to describe the level of evidence for health effects. These descriptors reflect both the confidence in the body of evidence for a given outcome and the direction of effect. Three descriptors used in step 6 (“high level of evidence,” “moderate level of evidence,” and “low level of evidence”) directly translate from the confidence-in-the-evidence ratings that exposure to the substance is associated with a health effect, and a fourth designation (“evidence of no health effect”) indicates confidence that the substance is not associated with a health effect (for definitions of the level of evidence for health effects descriptors, see Supplemental Material, Table S3). Because of the inherent difficulty in proving a negative, the conclusion “evidence of no health effect” is reached only when there is high confidence in the body of evidence. In the context of evidence potentially supporting a conclusion of no health effect, a low or moderate level of evidence results in a conclusion of inadequate evidence to reach a conclusion.

Although the conclusions describe associations, a causal relationship is implied and the ratings describe the level of evidence for health effects in terms of confidence in the association or the estimate of effect determined from the body of evidence. Table 3 outlines how the Bradford Hill considerations on causality (Hill 1965) are related to the process of evaluating the confidence in the body of evidence and then integrating the evidence (similar to the GRADE approach as described by Schünemann et al. 2011).

Table 3. Aspects of the Hill considerations on causality within the OHAT Approach.
Hill consideration: Relationship to the OHAT Approach
Strength: Considered in upgrading the confidence rating for the body of evidence for large magnitude of effect and downgrading the confidence rating for imprecision.
Consistency: Considered in upgrading the confidence rating for the body of evidence for consistency across study types, across dissimilar populations, or across animal species; in integrating the body of evidence among human, animal, and other relevant data; and in downgrading the confidence rating for the body of evidence for unexplained inconsistency.
Temporality: Considered in initial confidence ratings by key features of study design; for example, experimental studies have an initial rating of “high confidence” because of the increased confidence that the controlled exposure preceded outcome.
Biological gradient: Considered in upgrading the confidence rating for the body of evidence for evidence of a dose–response relationship.
Biological plausibility: Considered in examining nonmonotonic dose–response relationships and developing confidence rating conclusions across biologically related outcomes, particularly outcomes along a pathway to disease; considered in downgrading the confidence rating for the body of evidence for indirectness. Other relevant data that inform plausibility, such as physiologically based pharmacokinetic and mechanistic studies, are considered in integrating the body of evidence.
Experimental evidence: Considered in setting initial confidence ratings by key features of study design and in downgrading the confidence rating for risk of bias.

Step 7: Integrate the Evidence to Develop Hazard Identification Conclusions

The highest level of evidence for a health effect from each of the evidence streams is combined in the final step of the evidence assessment process to determine the hazard identification conclusion. Hazard identification conclusions may be reached on individual outcomes (health effects) or groups of biologically related outcomes, as appropriate, based on the evaluation’s objectives and the available data. The rationale for such conclusions is documented as the evidence is combined within and across evidence streams, and the conclusions are clearly stated as to which outcomes are incorporated into each conclusion.

The five hazard identification conclusion categories are:
• Known to be a hazard to humans
• Presumed to be a hazard to humans
• Suspected to be a hazard to humans
• Not classifiable as a hazard to humans
• Not identified to be a hazard to humans.

In step 7, the evidence streams for human studies and nonhuman animal studies, which have remained separate through the previous steps, are integrated along with other relevant
data. Hazard identification conclusions are reached by integrating the highest level-of-evidence conclusion for a health effect(s) from the human and the animal evidence streams. On an outcome basis, this approach applies to whether the data support a health effect conclusion or evidence of no health effect.

When the data support a health effect, the level-of-evidence conclusion for human data from step 6 (“high,” “moderate,” or “low”) is considered together with the level of evidence for nonhuman animal data to reach one of four hazard identification conclusions (Figure 1, step 7). If one evidence stream (either human or animal) has no studies, then conclusions are based on the remaining evidence stream alone (which is equivalent to treating the missing evidence stream as “low”).

Any impact of other relevant data on the hazard identification conclusion derived by integrating the human and nonhuman animal streams is considered next (Figure 1, step 7). [...] to the conclusions. Draft OHAT evaluations undergo peer review and public comment as part of the overall process for finalization and publication (NTP 2013g).

Discussion

Aspects of systematic-review methodology designed to increase objectivity and transparency may add to the time and investment required to develop literature-based evaluations, and the NTP is mindful of these concerns. In applying the OHAT Approach to case studies (NTP 2013a, 2013f), the NTP found that steps 2–4 were the most time intensive: selecting studies, extracting data, and assessing the quality of individual studies. Although not formally part of the systematic-review process, data management resources were used to increase transparency and efficiency in developing the case studies so that time invested in the early steps was recouped in later steps by entering study information into a database. Summary tables and graph[...]

[...] communication and clarity about how hazard identification conclusions are reached by documenting the source of the data considered, the methods of quality assessment used, and the scientific judgments made during evidence integration.

References

Ahmed I, Sutton AJ, Riley RD. 2012. Assessment of publication bias, selection bias, and unavailable data in meta-analyses using individual participant data: a database survey. BMJ 344:d7762; doi:http://dx.doi.org/10.1136/bmj.d7762.
AHRQ (Agency for Healthcare Research and Quality). 2013. AHRQ Training Modules for the Systematic Reviews Methods Guide. Available: http://www.effectivehealthcare.ahrq.gov/index.cfm/tools-and-resources/slide-library/#slidetrainingmodules [accessed 11 October 2013].
Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. 2011. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 64:401–406.
Birnbaum LS, Thayer KA, Bucher JR, Wolfe MS. 2013. Implementing systematic review at the National Toxicology Program [Editorial]. Environ Health Perspect 121:A108–A109; doi:10.1289/ehp.1306711.
Bucher JR, Thayer K, Birnbaum LS. 2011. The Office of Health Assessment and Translation: a problem-solving resource for the National Toxicology Program [Editorial]. Environ
Other relevant data could include, but are not ics were readily made from the database to Health Perspect 119:A196–A197; doi:10.1289/ehp.1103645.
limited to, mechanistic data, in vitro data, or facilitate decision making in steps 6 and 7 CLARITY Group at McMaster University. 2013. Systematic
data based on upstream indicators of a health when evaluating confidence in a body of stud- and Literature Review Resources. Instruments. Available:
http://www.evidencepartners.com/resources/ [accessed
effect. Mechanistic data or another type of ies and integrating evidence streams to develop 15 January 2013].
other relevant data is not required to reach a conclusions. The value of these efficiencies and Cochrane Collaboration. 2013. Glosssary of Cochrane terms.
final hazard identification conclusion. further development of these web-based sys- Available: http://www.cochrane.org/glossary [accessed
15 January 2013].
• If other relevant data provide strong sup- tems for data display, data management, and EFSA (European Food Safety Authority). 2010. Application of
port for biological plausibility of the rela- data sharing cannot be understated. systematic review methodology to food and feed safety
tionship between exposure and the health assessments to support decision making. EFSA J 8(6):1637
effect, the hazard identification conclusion Conclusions doi:10.2903/j.efsa.2010.1637.
Food and Drug Administration. 2010. Guidance for Industry:
may be upgraded (Figure 1, step 7, black Applying systematic-review methodologies Adaptive Design Clinical Trials for Drugs and Biologics:
“up” arrows) from that initially derived by to environmental health questions is gain- Draft Guidance. Available: http://www.fda.gov/downloads/
Drugs/.../Guidances/ucm201790.pdf [accessed 29 December
considering the human and nonh uman ing a critical mass (EFSA 2010; NRC 2013a, 2012].
animal evidence together. It is envisioned 2013b; Woodruff and Sutton 2011). The Foster PM. 2009. Explanation of Levels of Evidence for
that strong evidence for a relevant biological OHAT Approach provides a practical method Reproductive System Toxicity. Available: http://ntp.niehs.
nih.gov/ntp/htdocs/levels/09_3566_ntp_reprotox_r1.pdf
process from mechanistic or in vitro data for applying the principles of systematic review [accessed 15 January 2013].
could result in a conclusion of “suspected” to address environmental health questions. Germolec D. 2009. Explanation of Levels of Evidence for Immune
in the absence of human epidemiology or Moving forward, OHAT will apply this frame- System Toxicity. Available: http://ntp.niehs.nih.gov/ntp/
experimental animal data. work in future evaluations (NTP 2014b). As htdocs/levels/09-3566%20ntp-itox-r1.pdf [accessed
15 January 2013].
• If other relevant data provide strong opposi- evaluations are completed and practices in the GRADE (Grading of Recommendations Assessment
tion for biological plausibility of the relation field of systematic review evolve, OHAT may Development and Evaluation Working Group). 2014. GRADE
ship between exposure and the health refine and amend its “evergreen” approach Guidelines—Best Practices Using the GRADE Framework
Available: http://www.gradeworkinggroup.org/publications/
effect, the hazard identification conclusion and post updates to the framework (NTP JCE_series.htm [accessed 6 June 2014].
may be downgraded (Figure 1, step 7, gray 2013d). The protocols and the data compiled Guyatt GH, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al.
“down” arrows). as part of an evaluation (e.g., study-level health 2011a. GRADE guidelines: 1. Introduction—GRADE evidence
profiles and summary of findings tables. J Clin Epidemiol
When the data provide evidence of no effects data and risk-of-bias assessment) will be 64:383–394.
health effect, the level-of-evidence conclu- publicly available following its completion to Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P,
sion for human data from step 6 is considered increase transparency and facilitate data sharing Rind D, et al. 2011b. GRADE guidelines 6. Rating the quality
of evidence—imprecision. J Clin Epidemiol 64:1283–1293.
together with the level-of-evidence for health with government agencies, the scientific com- Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J,
effects conclusion for nonhuman animal data. munity, and the public. The scientifically rigor- Helfand M, et al. 2011c. GRADE guidelines: 8. Rating
Again, any impact of other relevant data on the ous and objective procedures that have been the quality of evidence—indirectness. J Clin Epidemiol
64:1303–1310.
hazard identification conclusion is considered. a hallmark of OHAT literature-based health Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J,
• If the human level-of-evidence conclusion assessments will be strengthened by implemen- et al. 2011d. GRADE guidelines: 5. Rating the quality of
of no health effect is supported by animal tation of the OHAT Approach for systematic evidence—publication bias. J Clin Epidemiol 64:1277–1282.
evidence of no health effect, the hazard review and evidence integration (NTP 2013b). Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-
Coello P, et al. 2011e. GRADE guidelines: 4. Rating the
identification conclusion is “not identified.” The application of the procedures of quality of evidence—study limitations (risk of bias). J Clin
The outcome of the evaluation includes systematic review to environmental health Epidemiol 64:407–415.
any hazard identification conclusions questions has the potential to increase objec- Higgins J, Green S. 2011. Cochrane Handbook for Systematic
Reviews of Interventions. Available: http://www.cochrane-
reached or data needs identified, along with tivity and transparency much like it has handbook.org [accessed 3 February 2013].
a detailed rationale outlining how human, already done for clinical medicine. Developing Hill AB. 1965. The environment and disease: association or
animal, and other relevant data contributed evaluations with this approach can improve causation? Proc R Soc Med 58:295–300.
Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. 2009. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev; doi:10.1002/14651858.MR000006.pub3.
Jahnke GD, Iannucci AR, Scialli AR, Shelby MD. 2005. Center for the Evaluation of Risks to Human Reproduction—the first five years. Birth Defects Res B Dev Reprod Toxicol 74:1–8.
Khan K, ter Riet G, Glanville J, Sowden A, Kleijnen J, eds. 2001. Undertaking Systematic Reviews of Research on Effectiveness: CRD’s Guidance for those Carrying Out or Commissioning Reviews. CRD Report 4. 2nd ed. York, UK:NHS Centre for Reviews and Dissemination, University of York.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. 2009. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol 62:e1–e34.
Lohr KN. 2012. Grading the Strength of Evidence. In: The EHC Program Slide Library. Available: http://www.effectivehealthcare.ahrq.gov/index.cfm/slides/?pageAction=displaySlides&tk=18 [accessed 13 July 2012].
Medlin J. 2003. New arrival: CERHR monograph series on reproductive toxicants. Environ Health Perspect 111:A696–A698.
Moher D, Liberati A, Tetzlaff J, Altman DG. 2009. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. J Clin Epidemiol 62:1006–1012.
NRC (National Research Council). 2011. Committee to Review of the Environmental Protection Agency’s Draft IRIS Assessment of Formaldehyde. Washington, DC:National Academies Press. Available: http://www.nap.edu/openbook.php?record_id=13142 [accessed 30 July 2012].
NRC (National Research Council). 2013a. Critical Aspects of EPA’s IRIS Assessment of Inorganic Arsenic: Interim Report. Washington, DC:National Academies Press. Available: http://www.nap.edu/catalog.php?record_id=18594 [accessed 7 November 2013].
NRC (National Research Council). 2013b. Project Title: Review of the IRIS Process. Available: http://www8.nationalacademies.org/cp/projectview.aspx?key=49458 [accessed 1 November 2013].
NTP (National Toxicology Program). 2012a. Board of Scientific Counselors Meeting, December 11, 2012. Meeting Materials. Available: http://ntp.niehs.nih.gov/go/9741 [accessed 21 February 2013].
NTP (National Toxicology Program). 2012b. Board of Scientific Counselors Meeting, June 21–22, 2012. Meeting Materials. Available: http://ntp.niehs.nih.gov/go/9741 [accessed 16 June 2013].
NTP (National Toxicology Program). 2013a. Draft Protocol for Systematic Review to Evaluate the Evidence for an Association between Bisphenol A (BPA) and Obesity. Available: http://ntp.niehs.nih.gov/ntp/ohat/evaluationprocess/bpaprotocoldraft.pdf [accessed 6 June 2014].
NTP (National Toxicology Program). 2013b. OHAT Implementation of Systematic Review. Available: http://ntp.niehs.nih.gov/go/38673 [accessed 16 June 2013].
NTP (National Toxicology Program). 2013c. Webinar on the Assessment of Data Quality in Animal Studies. Available: http://ntp.niehs.nih.gov/?objectid=27C7DBCB-C462-2B16-FFC315A4AE022937 [accessed 6 June 2014].
NTP (National Toxicology Program). 2013d. Draft OHAT Approach for Systematic Review and Evidence Integration for Literature-Based Health Assessments–February 2013. Available: http://ntp.niehs.nih.gov/ntp/ohat/evaluationprocess/draftohatapproach_february2013.pdf [accessed 6 June 2014].
NTP (National Toxicology Program). 2013e. Board of Scientific Counselors Meeting, June 25, 2013. Meeting Materials. Available: http://ntp.niehs.nih.gov/go/9741 [accessed 1 November 2013].
NTP (National Toxicology Program). 2013f. Draft Protocol for Systematic Review to Evaluate the Evidence for an Association Between Perfluorooctanoic Acid (PFOA) or Perfluorooctane Sulfonate (PFOS) Exposure and Immunotoxicity. Available: http://ntp.niehs.nih.gov/ntp/ohat/evaluationprocess/pfos_pfoa_immuneprotocoldraft.pdf [accessed 6 June 2014].
NTP (National Toxicology Program). 2013g. OHAT Evaluation Process. Available: http://ntp.niehs.nih.gov/go/38138 [accessed 16 June 2013].
NTP (National Toxicology Program). 2014a. Chemical Effects in Biological Systems (CEBS). Available: http://www.niehs.nih.gov/research/resources/databases/cebs/index.cfm [accessed 31 March 2014].
NTP (National Toxicology Program). 2014b. OHAT Nominations Under Consideration and Evaluations. Available: http://ntp.niehs.nih.gov/go/evals [accessed 31 March 2014].
Rhomberg LR, Goodman JE, Bailey LA, Prueitt RL, Beck NB, Bevan C, et al. 2013. A survey of frameworks for best practices in weight-of-evidence analyses. Crit Rev Toxicol 43:753–784.
Schünemann H, Hill S, Guyatt G, Akl EA, Ahmed F. 2011. The GRADE approach and Bradford Hill’s criteria for causation. J Epidemiol Community Health 65:392–395.
Schünemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou P, et al. 2012. Chapter 12: Interpreting results and drawing conclusions. In: Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011] (Higgins JPT, Green S, eds): The Cochrane Collaboration. Available: http://www.cochrane.org/sites/default/files/Handbook510pdf_Ch12_Interpreting.pdf [accessed 6 June 2014].
Shelby MD. 2005. National Toxicology Program Center for the Evaluation of Risks to Human Reproduction: guidelines for CERHR expert panel members. Birth Defects Res B Dev Reprod Toxicol 74:9–16.
Stern JM, Simes RJ. 1997. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 315:640–645.
Twombly R. 1998. New NTP centers meet the need to know. Environ Health Perspect 106:A480–A483.
Viswanathan M, Ansari M, Berkman ND, Chang S, Hartling L, McPheeters LM, et al. 2012. Assessing the risk of bias of individual studies when comparing medical interventions. Agency for Healthcare Research and Quality Methods Guide for Comparative Effectiveness Reviews. AHRQ Publication No. 12-EHC047-EF. Available: http://www.effectivehealthcare.ahrq.gov/index.cfm/search-for-guides-reviews-and-reports/?pageaction=displayproduct&productid=998 [accessed 3 January 2013].
Whitlock EP, Lopez SA, Chang S, Helfand M, Eder M, Floyd N. 2010. AHRQ series paper 3: identifying, selecting, and refining topics for comparative effectiveness systematic reviews: AHRQ and the Effective Health-Care Program. J Clin Epidemiol 63:491–501.
Woodruff TJ, Sutton P. 2011. An evidence-based medicine methodology to bridge the gap between clinical and environmental health sciences. Health Aff 30:931–937.
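
Supplementary illustration. The step 7 integration described in the Results, for the case where the data support a health effect, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the OHAT procedure itself. The function name `integrate_health_effect`, the `PLACEHOLDER_MATRIX` values, and the choice to move exactly one category per upgrade or downgrade are all hypothetical; the authoritative mapping from level-of-evidence pairs to conclusions is the one drawn in Figure 1, step 7, which is not reproduced here. Only three behaviors come directly from the text: a stream with no studies is treated as "low", strong supporting data may upgrade the conclusion, and strong opposing data may downgrade it.

```python
from typing import Dict, Optional, Tuple

# Hazard conclusions named in the text, ordered weakest to strongest.
LADDER = ["not classifiable", "suspected", "presumed", "known"]

# ASSUMPTION: placeholder (human, animal) -> conclusion mapping for illustration.
# These values are NOT taken from Figure 1; substitute the published matrix.
PLACEHOLDER_MATRIX: Dict[Tuple[str, str], str] = {
    ("high", "high"): "known",
    ("high", "moderate"): "known",
    ("high", "low"): "known",
    ("moderate", "high"): "presumed",
    ("moderate", "moderate"): "suspected",
    ("moderate", "low"): "suspected",
    ("low", "high"): "presumed",
    ("low", "moderate"): "suspected",
    ("low", "low"): "not classifiable",
}


def integrate_health_effect(
    human: Optional[str],
    animal: Optional[str],
    other_data: str = "none",
    matrix: Dict[Tuple[str, str], str] = PLACEHOLDER_MATRIX,
) -> str:
    """Combine human and animal level-of-evidence ratings into a conclusion.

    human / animal: "high", "moderate", "low", or None when a stream has no studies.
    other_data: "strong support", "strong opposition", or "none".
    """
    # From the text: a stream with no studies is treated as "low".
    human = human or "low"
    animal = animal or "low"

    position = LADDER.index(matrix[(human, animal)])

    # ASSUMPTION: an upgrade or downgrade moves exactly one category.
    if other_data == "strong support":
        position = min(position + 1, len(LADDER) - 1)
    elif other_data == "strong opposition":
        position = max(position - 1, 0)

    return LADDER[position]


if __name__ == "__main__":
    print(integrate_health_effect("moderate", None))
    print(integrate_health_effect(None, None, "strong support"))
```

Passing the matrix as a parameter keeps the illustrative values separate from the logic, so the published Figure 1 mapping can be supplied without changing the function. The "evidence of no health effect" branch is omitted because the text specifies only one of its outcomes ("not identified").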