discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/51783252
Abstract—We present a fully automated framework for scoring a patient's risk of cardiovascular disease (CVD) and mortality from a standard lateral radiograph of the lumbar aorta. The framework segments abdominal aortic calcifications for computing a CVD risk score and performs a survival analysis to validate the score. Since the aorta is invisible on X-ray images, its position is inferred from (1) the shape and location of the lumbar vertebrae and (2) the location, shape, and orientation of potential calcifications. The proposed framework follows the principle of Bayesian inference, which has several advantages in the complex task of segmenting aortic calcifications. Bayesian modeling allows us to compute CVD risk scores conditioned on the observed calcifications by formulating distributions, dependencies, and constraints on the unknown parameters. We evaluate the framework on two datasets consisting of 351 and 462 standard lumbar radiographs, respectively. Promising results indicate that the framework has potential applications in diagnosis, treatment planning, and the study of drug effects related to CVD.

Index Terms—Cardiovascular disease, risk scoring, radiographs, Bayesian, segmentation, automated, calcifications, spine, vertebrae, aorta, SMC sampler on shapes

Cardiovascular disease (CVD) is the most common cause of death in the western world. In 2007, it accounted for 48% of all deaths in Europe [1] and 33.6% in the United States [31]. According to pathological studies, the risk of cardiovascular death correlates positively and significantly with the evidence of atherosclerosis in the aorta and coronary arteries [22]. Active and inflamed atherosclerotic plaques are assumed to build up in the intimal walls of the arteries, often unrecognized over many years, until they suddenly provoke a heart attack or cardiac death. We therefore analyze atherosclerotic plaques to develop effective indicators of CVD.

One important CVD risk factor is the presence of aortic calcifications in the abdomen [33]. They correlate with coronary calcium levels and are independent of the usual risk factors [34]. Further, they have been linked to parental occurrence of premature CVD [23] and vascular disease in patients with type II diabetes [26].

The severity of abdominal aortic calcifications is primarily assessed with computerized tomography (CT) and X-ray imaging. Although CT images are better suited for estimating the degree of abdominal calcifications, standard X-ray images have been used in epidemiological studies for identifying CVD risk patients [21]. There are several reasons why X-ray imaging is beneficial for predicting cardiovascular mortality. First, many historical radiographs are available from large, long-lasting osteoporosis screening programs. Second, X-ray machines are inexpensive and widespread, which is ideal for conducting new, independent epidemiological studies. Finally, the acquisition of X-ray images is fast and exposes the patient to less radiation than CT.

However, it is not easy to delineate aortic calcifications in lumbar standard radiographs. The images commonly have a low signal-to-noise ratio, and the radiographic projection is affected by the anatomy of the patient. In addition, one has to account for different scanner settings, radiographer techniques, patient positioning, and pathologies. Calcified plaques share the same local appearance as the spine, bowel content, the image rim, or artifacts in the radiograph, which explains why even trained radiologists struggle to segment aortic calcifications accurately and reproducibly.

Since the manual computation of CVD risk scores is not only subjective, but also laborious and expensive, the process needs to be automated. The automatic segmentation of aortic calcifications poses many challenges, though. To assess the complexity of the problem, a random forest classifier [4] was trained on calcification labels and applied to a standard radiograph. As illustrated in Fig. 1, the segmentation contains many false positives, suggesting that context information is needed to disambiguate the image content.

De Bruijne [5] proposed a technique that finds the aorta as a region of interest before segmenting the aortic calcifications. The main idea is to infer the aorta indirectly from the pose and shape of the spine, and the relative location of potential aortic calcifications. By modeling these dependencies, the scheme is able to segment the aorta, even though the aorta boundary is not visible on standard radiographs.

The approach was demonstrated with manually given landmarks at the corners and midpoints of the first four lumbar vertebrae. We can fully automate the scheme by generating the vertebrae landmarks from an automated spine segmentation, such as [10], [25], [30]. However, our experiments suggest that the typical error of these automated vertebrae segmentation methods suffices to mislead the aorta segmentation technique of [5]. In several cases, this method places the aorta on top of the vertebrae boundary or other false positive structures which locally appear as calcifications [25].

Kersten Petersen, Melanie Ganz, Peter Mysling, Lene Lillemark, and Alessandro Crimi are with the University of Copenhagen, DK-2100 Copenhagen OE, Denmark.
Mads Nielsen is with the University of Copenhagen, DK-2100 Copenhagen OE, Denmark, and also with Nordic Bioscience Imaging and Synarc Imaging Technologies, Herlev, Denmark.
Sami S. Brandt is with Nordic Bioscience Imaging and Synarc Imaging Technologies, Herlev, Denmark.
Fig. 1. (a) Lateral standard radiograph of the lumbar aorta. (b) Manual annotations of the vertebrae L1-L4, the aorta, and aortic calcifications. (c) Output
from a random forest classifier based on a multiscale local jet representation of the calcifications and the background.
In this paper, we present a fully automated framework that models all the steps from lumbar standard radiographs to the CVD survival analysis. The framework also infers the aorta from the segmented vertebrae and calcifications, but was shown in [25] to segment both the vertebrae and the aorta much more accurately and reliably than [5]. Instead of concatenating two segmentation techniques, we propose a unified model-based system that is guided by the principles of Bayesian inference.

The Bayesian approach [27] uses the clear semantics of probability theory to derive the best estimate, starting from a requested set of properties. We formalize these properties by probability distributions and dependency assumptions on the parameters of our system. For instance, we explicitly express our beliefs about likely vertebrae and aorta shapes before observing the image data. Similarly, we encode our prior knowledge about the expected calcium distribution within the aorta. We state which parameters are assumed to be independent of each other, and model conditional dependencies and constraints of shape parameters [5], [24]. In Bayesian theory, the observed image data is used in the likelihood function for extracting additional information on the unknown parameters. The combination of the prior model and the likelihood function leads to the posterior distribution, which captures the uncertainty in the unknown parameters after observing the image data. It represents the complete solution to our problem and allows us to make predictions about likely parameter settings and the associated scores, e.g., the CVD risk score. The Bayesian paradigm thus provides a consistent statistical interpretation of our domain knowledge and the observed data. It should be seen in contrast to classical model-based systems which propagate regularized point estimates.

The implementation of the framework contains several refinements and novel ideas. The vertebrae segmentation, for instance, relies on conditional shape models for steering the search of likely vertebrae shapes. Another example is the use of segment-based appearance, shape, and orientation features of calcifications for improving the segmentation of the aorta.

The main contributions of this work, however, emerge on a larger scale. 1) To the best of our knowledge, we present the first feasible system for automatically computing the AC24 severity score [21] from a standard radiograph. 2) We provide a complete Bayesian formulation of a challenging segmentation problem where the correct solution is highly uncertain due to a low signal-to-noise ratio, clutter, or occlusions. Our framework naturally incorporates widely applicable ideas from previous work, e.g., the static SMC sampler on shapes [25], conditional point distribution models [24], or a spatial prior for archipelago-like structures [15].

The remainder of this paper is organized as follows. In Section I, we define the Bayesian framework and explain its implementation. Section II shows experimental results. The discussion of the results and the properties of the framework are given in Section III.

I. METHOD

A. The Bayesian Framework

Let $D = (D_V, D_C)$ be the observed vertebrae and calcification data, and let $\theta_A \in \Theta_A$ parameterize the unknown aorta. Assume further that $u \in \mathcal{Y}_C^N$ denotes a latent variable vector for the $N$ pixel locations, and that the label space $\mathcal{Y}_C$ annotates calcifications vs. background. Then the conditional expectation of any score function $f(u, \theta_A)$ is

$$E\{f \mid D\} = \sum_{u \in \mathcal{Y}_C^N} \int_{\Theta_A} f(u, \theta_A)\, p(u, \theta_A \mid D)\, d\theta_A, \tag{1}$$

where $\sum_{u \in \mathcal{Y}_C^N} \equiv \sum_{u_1 \in \mathcal{Y}_C} \cdots \sum_{u_N \in \mathcal{Y}_C}$.

Assuming a Bayesian model, the joint posterior distribution of the aorta parameters and calcification labelings given the data, $p(u, \theta_A \mid D)$, is the complete solution of our segmentation problem. It factorizes into

$$p(u, \theta_A \mid D) = p(u \mid \theta_A, D_C)\, p(\theta_A \mid D). \tag{2}$$

We model the likelihood, $p(u \mid \theta_A, D_C)$, by a latent variable model [18] of calcifications, and assume independence of the vertebrae data, $D_V$. Applying Bayes' theorem, the latent variable model

$$p(u \mid \theta_A, D_C) \propto p(D_C \mid u, \theta_A)\, p(u \mid \theta_A) \tag{3}$$

gives rise to a likelihood function, $p(D_C \mid u, \theta_A)$, and a prior, $p(u \mid \theta_A)$. The likelihood is given by a calcification appearance model, while the prior

$$p(u \mid \theta_A) = p_{\text{location}}(u \mid \theta_A)\, p_{\text{spatial}|\text{location}}(u \mid \theta_A) \tag{4}$$

is the product of a calcification location prior and a calcification spatial prior (see Section I-F).

According to [25], we can rewrite

$$\begin{aligned}
p(\theta_A \mid D) &\approx p(\theta_A \mid D_V, D_A) \\
&\propto \sum_{u \in \mathcal{Y}_C^{N_A}} p(D_A \mid u, \theta_A, D_V)\, p(u, \theta_A \mid D_V) \\
&= p(\theta_A \mid D_V) \sum_{u \in \mathcal{Y}_C^{N_A}} p(D_A \mid u, \theta_A)\, p(u \mid \theta_A) \\
&= p(\theta_A \mid D_V) \sum_{u \in \mathcal{Y}_C^{N_A}} \prod_{n=1}^{N_A} p(x_n \mid u_n, \theta_A)\, p(u_n \mid \theta_A) \\
&= p(\theta_A \mid D_V) \prod_{n=1}^{N_A} \sum_{u_n \in \mathcal{Y}_C} p(x_n \mid u_n, \theta_A)\, p(u_n \mid \theta_A),
\end{aligned} \tag{5}$$

where $D_A = \{x_n\}_{n=1}^{N_A}$ encodes the appearance, shape, and orientation of calcification segments and where we assume that the aorta appearance model, $p(D_A \mid u, \theta_A)$, and the aorta probability map, $p(u \mid \theta_A)$, are independent of $D_V$ (see Section I-E). We obtain $D_A$ by denoising and classifying $D_C$, which is meant to simulate the behavior of the more costly calcification spatial prior.

The aorta parameters $\theta_A$ are related to the vertebrae parameters $\theta_V \in \Theta_V$, which serve as nuisance parameters in our model. Thus, the posterior distribution of the aorta parameters, given the vertebrae data, in (5) can be written as

$$p(\theta_A \mid D_V) = \int_{\Theta_V} p(\theta_V, \theta_A \mid D_V)\, d\theta_V = \int_{\Theta_V} p(\theta_A \mid \theta_V)\, p(\theta_V \mid D_V)\, d\theta_V. \tag{6}$$

It is referred to as the aorta shape model. The conditional model $p(\theta_A \mid \theta_V)$ regresses aorta from vertebrae parameters, and is considered to be independent of $D_V$ (see Section I-D).

Analogous to (5), the posterior distribution of the vertebrae parameters, given the vertebrae data, unfolds as

$$p(\theta_V \mid D_V) \propto p(\theta_V) \prod_{n=1}^{N_V} \sum_{v_n \in \mathcal{Y}_V} p(y_n \mid v_n, \theta_V)\, p(v_n \mid \theta_V), \tag{7}$$

where $D_V = \{y_n\}_{n=1}^{N_V}$ represents the vertebrae data and $v = \{v_n\}_{n=1}^{N_V}$ the latent variable for the vertebrae labels. We obtain a vertebrae appearance model, a vertebrae probability map, and a vertebrae shape model (see Section I-B and I-I).

Fig. 2 provides an overview of our framework. The graph indicates that the vertebrae, aorta, and calcification posterior distributions each consist of an appearance model and two priors. The shape models and the spatial prior introduce prior knowledge about the shape of an object, whereas the class probability maps and the location prior encode the expected segmentation result prior to observing the data.

[Figure 2 panels: Vertebrae Probability Map, p(v|θV), Sec. I-C; Aorta Probability Map, p(u|θA), Sec. I-E; Calc. Location Prior, plocation(u|θA), Sec. I-F; Compute Expected Score, E{f|D}, Sec. I-A]
Fig. 2. A model-based view of the Bayesian framework. The dashed boxes denote posterior distributions that depend on all the models they encompass. This includes appearance models for representing likelihoods, and models describing prior knowledge about the shapes and spatial distribution of the vertebrae, the aorta, and aortic calcifications. The score is computed for samples from the joint posterior distribution of aorta shape parameters θA and calcification labelings u given the observed data D.
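The last step of (5) exchanges the sum over all labelings with the product over segments, which reduces an exponential number of terms to a linear one. The following minimal sketch checks this identity numerically on a toy example. The per-segment values in `g` are made up for illustration and are not taken from the paper; each entry stands for the product p(x_n | u_n, θ_A) p(u_n | θ_A) for one label.

```python
from itertools import product
from math import isclose, prod

# Hypothetical per-segment terms g[n][k] = p(x_n | u_n = k, theta_A) * p(u_n = k | theta_A)
# for a binary label space (k = 0: background, k = 1: calcification).
g = [
    [0.30 * 0.9, 0.70 * 0.1],
    [0.80 * 0.6, 0.20 * 0.4],
    [0.50 * 0.7, 0.50 * 0.3],
]

def brute_force(g):
    """Sum over every labeling u of the product of per-segment terms."""
    labels = range(len(g[0]))
    return sum(
        prod(g[n][u_n] for n, u_n in enumerate(u))
        for u in product(labels, repeat=len(g))
    )

def factorized(g):
    """Product over segments of the per-segment sums, as in the last line of (5)."""
    return prod(sum(g_n) for g_n in g)

# Both routes give the same marginal; the factorized one needs N * |Y| terms
# instead of |Y| ** N.
assert isclose(brute_force(g), factorized(g))
```

The brute-force route enumerates 2^3 labelings here, whereas the factorized route only touches 3 x 2 terms, which is what makes the marginalization over labels affordable for realistic numbers of segments.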
In detail, aorta shapes are aligned with the same pose parameters as the corresponding vertebrae shapes to compare the correlation between aorta and vertebrae shapes, i.e.,

$$\bar{a}_{A|V} = \bar{a}_V \tag{13}$$

and

$$A_{A|V} = A_V. \tag{14}$$

As soon as the aorta and vertebrae shapes have been transformed by GPA, the shape parameters of the aorta are regressed from the vertebrae using a conditional normal distribution. The mean conditioned on the vertebrae is given by

$$\bar{b}_{A|V} = \bar{b}_A + B_{AV} B_V^{-1} \left( b_V - \bar{b}_V \right), \tag{15}$$

and the covariance matrix is computed as

$$B_{A|V} = B_A - B_{AV} B_V^{-1} B_{VA}, \tag{16}$$

where the covariance matrices are obtained from the joint covariance matrix of $b_A$ and $b_V$

$$B = \begin{pmatrix} B_A & B_{AV} \\ B_{VA} & B_V \end{pmatrix}. \tag{17}$$

The conditional covariance matrix $B_{A|V}$ is the Schur complement of $B_V$ in $B$, i.e., the inverse of the aorta block of the inverted joint covariance matrix, where the vertebrae part has been dropped.

By assuming a Gaussian model, with the mean and covariances above, the probability of an aorta shape $\theta_A$ conditioned on a vertebrae shape $\theta_V$ is given by

$$p(\theta_A \mid \theta_V) = \mathcal{N}\left( \bar{\theta}_{A|V}, \Sigma_{A|V} \right), \tag{18}$$

where

$$\bar{\theta}_{A|V} = \begin{pmatrix} \bar{a}_{A|V} \\ \bar{b}_{A|V} \end{pmatrix} \tag{19}$$

and

$$\Sigma_{A|V} = \begin{pmatrix} A_{A|V} + \Gamma_a & 0 \\ 0 & B_{A|V} \end{pmatrix}. \tag{20}$$

Thus, the aorta shape model is specified by substituting (18) and (7) into (6).

E. Aorta Posterior

The posterior distribution of the aorta samples in (5) factorizes similarly to the vertebrae posterior distribution. The aorta shape model, $p(\theta_A \mid \theta_V)$, has been defined in (6); the aorta probability map, $p(u \mid \theta_A)$, describes the spatial distribution of aorta-related structures; and the aorta appearance model, $p(D_A \mid u, \theta_A)$, represents the likelihood. Let us treat the latter two models in more depth.

The aorta appearance model is composed from the denoised output of a random forest classifier for calcifications, i.e.,

$$p(x_n \mid u_n, \theta_A) = \begin{cases} c & \text{noise at position } n \\ p(x_n \mid u_n, \theta_A) & \text{otherwise} \end{cases}, \tag{21}$$

where $c > 0$ is a small constant. The denoising step is necessary, because the pixel classifier alone cannot reliably segment calcifications. The problem is that calcifications share the same local appearance as other structures in the image, e.g., image artifacts, bowel content, the vertebrae boundary, or the image rim. Thus, more global information has to be considered to rule out these false positives.

Fig. 5. (a) The output from a statistical pixel classifier in a region of interest for the aorta. (b) The denoised output, where vertebrae, the image rim, and small calcifications have been removed for the sake of robustness.

The calcification probabilities are denoised as follows. Pixels that lie on the image rim or appear in at least half of the sampled vertebrae region are set to $c$. Morphological filters (opening, followed by closing) clean segments that are smaller than the smallest manually annotated calcification in the training data. Weakly connected components are separated by morphological opening. The probabilities of all modified pixel classifier outputs are set to $c$. Fig. 5(b) shows that only segments remain that are larger than a detectable calcification.

Segments that have the size of big calcifications are analyzed with respect to their appearance, shape, and orientation. Smaller segments, for which these features cannot be robustly evaluated, are assumed equally likely to be calcified or not.

We apply a segment-based random forest classifier to evaluate the large segments. The input data characterizes the appearance and shape of the segments, as well as their orientation with respect to the aorta. More specifically, we use the mean and median of the segment intensities, the area, convex area, diameter, perimeter, length, width, and the angle between the major principal component and the nearest aorta wall. The training labels are obtained from the manual calcification annotations. The probabilistic output of the segment classifier (see Fig. 4(b) and Fig. 4(d)), combined with the fixed probabilities for the small segments, constitutes the likelihood $p(D_A \mid u, \theta_A)$.

The aorta probability map is constructed from the manual annotations of calcifications to describe the spatial distribution of a calcification and the background class. We compute the expected probability distribution of calcium given the mean aorta shape by following the approach in [5]. A mean label vector is created by averaging the calcification labels of an aligned training set (see Fig. 4(e)). The regularized version of this mean label vector defines the distribution of the calcified class, while the complementary probabilities estimate the distribution of the background class.

Fig. 4. (a) Comparison of a misplaced aorta sample (dashed) and the manually annotated aorta (solid). (b) Segment probabilities of the misplaced aorta sample. (c) Comparison of a well fitting aorta sample and the manually annotated aorta. (d) Segment probabilities of the well fitting aorta sample. (e) The aorta probability map for the calcification class.

F. Calcification Posterior

The latent variable model, $p(u \mid \theta_A, D_C)$, in (3) models the posterior distribution of the calcification labels and factorizes into a likelihood and a prior model.

The likelihood, $p(D_C \mid u, \theta_A)$, is called the calcification appearance model and relies on the pixel classifier of the aorta appearance model. False positives are suppressed, as described in Section I-E, but in contrast to the aorta appearance model no segment classifier is applied to evaluate the segment probabilities. Instead, the calcification spatial prior, arising from (4), models the spatial composition of the calcifications.

The calcification spatial prior, $p_{\text{spatial}|\text{location}}(u \mid \theta_A)$, is constructed from a binary patch dictionary using a patch grammar based on Markov mesh random fields [15], as summarized below. The effect of the prior is illustrated in Fig. 6.

Fig. 6. (a) Manual annotation of calcifications. (b) Output probabilities of calcification pixel classifier. (c) Conditional mean of the calcification posterior for samples from the calcification location prior.

In the dictionary training phase, $S$ binary patches of size $M \times M$, $X \in \{0, 1\}^{S \times M^2}$, are approximated by a linear combination of patch prototypes. The goal is to minimize

$$E = \| X - DA \|_{\text{fro}}^2, \tag{22}$$

where $D \in \{0, 1\}^{M^2 \times K}$ is a dictionary of $K$ binary patches, and $A \in \{0, 1\}^{K \times S}$ is a sparse representation of $X$ with respect to $D$, such that each column of $A$ contains one non-zero entry. Instead of finding the global minimizer of (22), the solution is approximated by first learning $D$ with K-means clustering for a set of training patches estimated from manual annotations of the calcifications. After computing $D$, the estimate for $A$ is determined by selecting the closest prototype for each training patch.

To account for correlations among neighboring patches, their joint distribution is modeled by a first-order Markov Mesh Random Field (MMRF) [17]. Using the chain rule of probability and the Markov assumption, we obtain

$$p_{\text{spatial}|\text{location}}(u \mid \theta_A) = p(u_1)\, p(u_2 \mid u_1) \cdots p(u_T \mid u_1, \ldots, u_{T-1}) \cong \prod_{t=1}^{T} p(u_t \mid N(u)), \tag{23}$$

where $N(u)$ comprises all the direct neighbors of patch $u$ that have previously been processed, and $T$ is the number of patches in the image. The conditional distributions in (23) are estimated from the histogram of patch neighbor combinations in the training data. The parameters, $M$ and $K$, are selected via the minimum description length (MDL) principle.

The calcification location prior, $p_{\text{location}}(u \mid \theta_A)$, is built in the same way as the probability map for the calcification label in the aorta probability map (see Section I-E). It encodes our prior knowledge about the aorta locations at which calcifications are likely to appear.

G. Sampling of the Posterior

The Bayesian framework contains four intractable distributions, which we approximate with Monte Carlo simulation [12]. The essence is to generate a large collection of
Fig. 7. (a) Initial vertebrae samples. (b) Output samples from vertebrae SMC sampler (black) and manually annotated vertebrae (white dashed). (c) Initial
aorta samples. (d) Output samples from aorta SMC sampler.
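The conditional regression of Section I-D, i.e., (15) and (16), can be illustrated in the scalar case, where one aorta shape parameter is regressed on one vertebra shape parameter. The sketch below is only a toy instance under that assumption: the function names and the numbers are hypothetical, and the paper itself works with full covariance blocks. It also checks the Schur complement statement, namely that the conditional variance equals the inverse of the aorta entry of the inverted joint covariance matrix.

```python
def conditional_gaussian(mean_a, mean_v, var_a, var_v, cov_av, b_v):
    """Scalar version of (15) and (16): condition the aorta parameter on b_v."""
    gain = cov_av / var_v                      # B_AV * B_V^{-1}
    cond_mean = mean_a + gain * (b_v - mean_v)  # (15)
    cond_var = var_a - gain * cov_av            # (16)
    return cond_mean, cond_var

def schur_via_inverse(var_a, var_v, cov_av):
    """Inverse of the aorta entry of the inverted 2x2 joint covariance matrix."""
    det = var_a * var_v - cov_av ** 2
    inv_aa = var_v / det                        # top-left entry of B^{-1}
    return 1.0 / inv_aa

# Hypothetical joint statistics of (b_A, b_V) and an observed vertebra parameter.
cond_mean, cond_var = conditional_gaussian(
    mean_a=0.0, mean_v=1.0, var_a=4.0, var_v=2.0, cov_av=1.0, b_v=3.0
)
assert abs(schur_via_inverse(4.0, 2.0, 1.0) - cond_var) < 1e-12
```

Observing a vertebra parameter above its mean shifts the expected aorta parameter in the direction of the covariance and always shrinks the variance, which is why conditioning on the vertebrae narrows the search space for aorta shapes.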
[Figure 8 graphic. Node and factor labels: Vertebrae Shape Model, Aorta Shape Model, Calcification Location Prior, Calcification Spatial Prior, Calcification Pixel Classifier, Vertebrae Posterior, Aorta Posterior; variables θV, θA, u, f, D; plate sizes LV, LA, MC, LC; data DV (Vertebrae), DA (Aorta), DC (Calcifications).]
Fig. 8. Representation of the Bayesian framework as a probabilistic graphical model using plate notation [7] with factor graphs. Nodes with a single circle
refer to probabilistic variables that are either shaded, if they are observable, or empty, if they are latent. Nodes with double circles are deterministic functions
of their inputs, e.g., the score function f . Factor nodes describe models, while small black circles indicate model parameters. Boxes that enclose variables
are called plates. They group variables into a sub-graph which is replicated as often as it is specified in the bottom right corner. Edges that cross a plate
boundary are repeated once, whereas touching edges are replicated as often as the subgraph. We use them to point out the Monte Carlo estimates of the
posterior distributions in the vertebrae, the aorta, and the calcification stage.
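The encoding step of the patch dictionary in Section I-F, i.e., choosing for every patch the closest prototype so that each column of A in (22) has a single non-zero entry, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the dictionary is given by hand instead of being learned with K-means, patches are flattened tuples, and the helper names are hypothetical. For binary vectors the squared Euclidean distance equals the Hamming distance, which the sketch uses.

```python
def hamming(p, q):
    """Number of differing pixels; equals the squared Euclidean distance for binary patches."""
    return sum(a != b for a, b in zip(p, q))

def encode(patches, dictionary):
    """Index of the closest prototype per patch (the single non-zero entry of its column in A)."""
    return [
        min(range(len(dictionary)), key=lambda k: hamming(patch, dictionary[k]))
        for patch in patches
    ]

def reconstruction_error(patches, dictionary, codes):
    """Squared Frobenius error of approximating each patch by its selected prototype."""
    return sum(hamming(p, dictionary[k]) for p, k in zip(patches, codes))

# Hypothetical 2x2 binary patches (flattened) and a dictionary with K = 2 prototypes.
dictionary = [(1, 1, 0, 0), (0, 0, 1, 1)]
patches = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 0)]

codes = encode(patches, dictionary)
error = reconstruction_error(patches, dictionary, codes)
```

Replacing pixel labels by prototype indices is what allows the Markov mesh model in (23) to be estimated from a histogram over a small, discrete set of patch symbols.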
Similarly, the posterior distribution of the aorta, $p(\theta_A \mid D_V, D_A)$, in (5) is approximated by the aorta SMC sampler. It uses the Monte Carlo estimate for the posterior of the vertebrae, $p(\theta_V \mid D_V)$, and generates a weighted ensemble of aorta samples $\{(\theta_A^{(l)}, w_A^{(l)})\}_{l=1}^{L_A}$, such that

$$p(\theta_A \mid D_V, D_A) \approx \hat{p}(\theta_A \mid D_V, D_A) = \sum_{l=1}^{L_A} w_A^{(l)}\, \delta\left( \theta_A - \theta_A^{(l)} \right). \tag{27}$$

Finally, evaluating the expected quantities in (1), i.e.,

$$\begin{aligned}
E\{f \mid D\} &= \sum_{u \in \mathcal{Y}_C^N} \int_{\Theta_A} f(u, \theta_A)\, p(u, \theta_A \mid D)\, d\theta_A \\
&\overset{(2)}{=} \int_{\Theta_A} \sum_{u \in \mathcal{Y}_C^N} f(u, \theta_A)\, p(u \mid \theta_A, D_C)\, p(\theta_A \mid D)\, d\theta_A \\
&= \int_{\Theta_A} p(\theta_A \mid D) \sum_{u \in \mathcal{Y}_C^N} f(u, \theta_A)\, p(u \mid \theta_A, D_C)\, d\theta_A,
\end{aligned} \tag{28}$$

leads to two nested Monte Carlo simulations. By using (5), $L_C$ aorta samples are drawn from (27), after which $M_C$ latent variable samples $u^{(m,l)} \sim p(u \mid \theta_A^{(l)}, D_C)$ are generated conditioned on each aorta sample. This yields

$$E\{f \mid D\} \approx \frac{1}{L_C M_C} \sum_{l=1}^{L_C} \sum_{m=1}^{M_C} f\left( u^{(m,l)}, \theta_A^{(l)} \right). \tag{29}$$

Fig. 8 illustrates the conditional dependencies between observed and latent variables, and also shows the interplay of Monte Carlo estimates, models, and key parameters.

H. Static SMC Sampler on Shapes

The static SMC sampler on shapes [25] is used to sample vertebrae and aorta shape distributions (see also Section I-I). It usually performs better than MCMC methods, which can easily get trapped in local modes, and it explores high-dimensional static spaces efficiently while providing asymptotically consistent estimates of the target distribution. The method is suited for segmentation problems where the correct solution is highly uncertain due to, for instance, a low signal-to-noise ratio, clutter, or occlusions. It is able to provide multiple hypotheses and can incorporate prior knowledge about an object's shape or contextual structures.

The goal is to draw samples from the posterior distribution of shapes $\theta$ given the observed data $D$, i.e. (see (5)),

are defined relative to the mean shape and warped to each shape sample using thin plate splines [25], so as to statistically compare shapes of different sizes. The term $p(\theta)$ is interpreted as the probability of $\theta$ under a shape model prior to seeing the data; the class probability map $p(u_n \mid \theta)$ is a second prior that encodes the expected spatial distribution of contextual structures; and the likelihood, $p(x_n \mid u_n, \theta)$, describes a statistical pixel classifier that models non-linear appearance information at the $n$-th location.

The static SMC sampler is similar to particle filtering, but with an important difference. A sequence of artificial target distributions $\{\pi_t\}_{t=1}^{T-1}$ is introduced, so as to smoothly guide the tractable proposal distribution $\pi_1 = p(\theta)$ to the target distribution $\pi_T = \pi(\theta \mid D)$. In other words, the static SMC sampler gradually propagates samples to regions of high density instead of sampling from the complicated target distribution $\pi(\theta \mid D)$ directly.

The artificial target distributions are defined by the geometric path between the proposal and the target distribution, or

$$\pi_t(\theta) = \pi(\theta \mid D)^{\beta_t}\, p(\theta)^{1 - \beta_t}, \tag{31}$$

where the annealing variables $0 \leq \beta_1 < \ldots < \beta_T = 1$ determine the complexity of the $t$-th target distribution. The annealing parameters are either fixed in a cooling schedule, or, as for our application, iteratively optimized to achieve a pre-defined number of effective samples.

The static SMC sampler is a sequential importance resampling scheme, where each artificial target distribution $\pi_t(\theta)$ is approximated by $L$ weighted samples, $\{\theta_t^{(l)}, w_t^{(l)}\}_{l=1}^{L}$. For each iteration, the algorithm passes through a sampling, weighting, and resampling phase. First, a collection of samples, $\{\theta_t^{(l)}\}_{l=1}^{L}$, is drawn, either from $p(\theta)$ (when $t = 1$), or from the output samples of the previous iteration (when $t > 1$). Second, the samples are weighted according to the relative posterior densities. Assuming that two subsequent artificial target distributions are close to each other, i.e., $\pi_t \approx \pi_{t+1}$, the unnormalized importance weights are derived [11] as

$$w_t \propto w_{t-1} \frac{\pi_t(\theta_{t-1})}{\pi_{t-1}(\theta_{t-1})}. \tag{32}$$

Finally, the samples are resampled according to the normalized weights. This means that samples from high-density regions generate perturbed copies of themselves, whereas unlikely samples die out.

The conditional probability $p(\theta_{t+1} \mid \theta_t)$ from sample $\theta_t$ to a modified version $\theta_{t+1}$ is called the forward Markov kernel of the static SMC sampler. For $t = 1$, a large number of samples, $\{\theta^{(l)}\}_{l=1}^{L_{\text{dense}}}$ with $L_{\text{dense}} \gg L$, is drawn from $p(\theta)$ to construct
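The sampling, weighting, and resampling phases around (31) and (32) can be sketched on a one-dimensional toy problem. The example below is a minimal illustration under several assumptions that are not from the paper: the prior is N(0, 1), the likelihood is a Gaussian around a single observation, the annealing schedule is fixed rather than adapted to the effective sample size, and a random-walk Metropolis step stands in for the forward Markov kernel built from shape models. All function names are hypothetical.

```python
import math
import random

def log_prior(theta):
    """Log of the prior p(theta) = N(0, 1), up to an additive constant."""
    return -0.5 * theta ** 2

def log_likelihood(theta, y=2.0, sigma=0.5):
    """Log of a toy Gaussian likelihood N(y | theta, sigma^2), up to a constant."""
    return -0.5 * ((y - theta) / sigma) ** 2

def log_target(theta, beta):
    """Geometric path (31): posterior^beta * prior^(1 - beta) = prior * likelihood^beta."""
    return log_prior(theta) + beta * log_likelihood(theta)

def static_smc(num_samples=2000, betas=(0.0, 0.1, 0.25, 0.5, 0.75, 1.0), seed=0):
    rng = random.Random(seed)
    # Sampling phase for t = 1: draw from the prior p(theta).
    thetas = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    for beta_prev, beta in zip(betas, betas[1:]):
        # Weighting phase (32). Weights are uniform after the previous resampling,
        # so w_t is proportional to pi_t / pi_{t-1} = likelihood^(beta - beta_prev).
        log_w = [(beta - beta_prev) * log_likelihood(th) for th in thetas]
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        # Resampling phase: likely samples are copied, unlikely ones die out.
        thetas = rng.choices(thetas, weights=weights, k=num_samples)
        # Perturb the copies with one random-walk Metropolis step that leaves
        # pi_t invariant (assumed stand-in for the forward Markov kernel).
        for i, th in enumerate(thetas):
            proposal = th + rng.gauss(0.0, 0.5)
            log_ratio = log_target(proposal, beta) - log_target(th, beta)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                thetas[i] = proposal
    return thetas

samples = static_smc()
posterior_mean = sum(samples) / len(samples)
```

For this conjugate toy model the exact posterior is Gaussian with mean 1.6 and variance 0.2, so the sample mean can be compared against a known value. The small steps in `betas` keep consecutive targets close, which is the condition under which the incremental weights in (32) stay well behaved.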
Fig. 9. (a) A conditional model from a source sample (solid) to L1 and L4 (dashed). (b) A conditional model for generating a shifted L1 vertebrae. Vertebrae L1 to L3 from the source sample are assumed to be L2 to L4 of the new sample.

The conditional PDMs may not only predict vertebrae on the same vertebral level as the source sample, but also on adjacent vertebral levels. In Fig. 9(b), L1 of the new sample is inferred from L2-L4, which corresponds to L1-L3 in the source sample. This may be beneficial when the mode of the target distribution is peaked, and none of the initial vertebrae samples fits well enough to survive the filtering process. It allows samples to be created from sub-parts of surviving but shifted vertebrae samples.

The local PDM and the conditional PDMs are randomly chosen for resampling. Most of the initial samples stem from the local PDM, whereas resampled shapes are drawn with increasing probability from the conditional PDM.

II. EXPERIMENTS

A. Data Collection

The Bayesian framework is evaluated on lateral standard radiographs of the lumbar aorta and vertebrae. The radiographs were acquired in 1992 (baseline) and 2001 (follow-up) within a combined osteoporosis-atherosclerosis screening

B. Model Evaluation

We perform 5-fold cross-validation to estimate the generalization ability of the Bayesian framework. In each round of cross-validation, the available training data is split to prevent overfitting of the individual models. However, as the data is limited, it cannot be distributed disjointly among the models. Instead, we allow models to be trained on the same data if they are at most weakly dependent on each other. Fig. 10 shows our proposed splitting of the data for one fold.

The calcification annotations of the entire training set are used for training the aorta probability map, the calcification location prior, and the calcification spatial prior. We assume that these models have only indirect dependencies among each other and the remaining models of the framework.

However, 1/5 of the available training data is reserved for creating the segment classifier within the aorta appearance model. When learning informative appearance, shape, and orientation features for the segments, it is important to be independent of the proposed aorta and vertebrae shapes, as well as the statistical pixel classifiers which are involved in constructing the segments.

The remaining 4/5 of the training data are used for training the residual models. The vertebrae shape model and the aorta shape model rely on different annotations, while the statistical pixel classifiers for the vertebrae, the aorta, and the calcifications use the image data. We assume that the dependencies among these models are negligible, and that their influence on the vertebrae maps is small as well. The vertebrae probability map and the vertebrae labeling map are trained on two equally sized independent datasets, since the statistical pixel classifier for constructing the probability map should be trained on a separate dataset.

C. Parameter Settings

1) Shape Models and Spatial Prior: The shape models for both the vertebrae and the aorta explain $f_v = 0.95$ of the shape variation, which corresponds to 16 principal components for the vertebrae and 8 for the aorta. Both models are regularized by a small value $\Gamma_a = \operatorname{diag}(1, 1, 1, 1) \times 10^{-4}$. As the number of aorta landmarks varies, the aorta shape is represented by 21 linearly interpolated points from each aorta wall. The parameters for the calcification spatial prior are set according to [15].

2) Class Probability Maps: The region for the vertebrae probability map is obtained by enlarging the mean vertebrae shape by 9 mm in each direction. The safety margin around this vertebrae region is 12 mm wide. In this corridor, pixels are neither assigned to a vertebrae region nor to the background. The number of vertebrae region clusters for the automated labeling map, $K$, was set to 18 after cross-validating all values between 5 and 25.

3) Appearance Models: The three statistical pixel classifiers and the segment classifier are modeled by random forest classifiers with 200 decision trees and 7 random split variables. These values have been determined by cross-validation on a separate dataset. The appearance features for the pixel classifiers include the original image and the multiscale local jet up to the third order at three logarithmically increasing scales (0.18 mm, 0.56 mm, and 1.78 mm). Of the available training data, $10^5$ observations are randomly sampled to generate the input data of the respective classifier. The training data of the vertebrae pixel classifier contains equally many samples from the vertebrae classes, while the aorta and calcification pixel classifiers are trained on 20% calcification and 80% background pixels. The classifier output probabilities are thresholded at 0.85 for maximizing the calcification area overlap. The value has been determined on an independent validation set.

4) Sampling Methods: The static SMC samplers for the vertebrae and the aorta use the same parameter settings as in [25]. The posteriors are estimated by $L_V = 300$ and

TABLE I
PROPORTION OF SUCCESSFUL, LEVEL-SHIFTED AND FAILED VERTEBRAE SEGMENTATIONS

Dataset     Successful   Level-shifted   Failures
Baseline    270 (77%)    74 (21%)        7 (2%)
Follow-up   332 (72%)    126 (27%)       5 (1%)

TABLE II
POINT-TO-CONTOUR ERROR STATISTICS OF THE AUTOMATED FRAMEWORK FOR SUCCESSFUL CASES

Dataset     Mean (mm)   Median (mm)   75%-ile (mm)   95%-ile (mm)   Errors > 2 mm
Baseline    1.22        1.09          1.32           2.22           4.2%
Follow-up   1.34        1.24          1.49           2.45           6.3%

D. Results

We evaluate the framework on the baseline (bl) and the follow-up (fu) dataset, both fully automated (Auto) and conditioned on one manually placed spine landmark (Semi). The semi-automated setup is meant to estimate the influence of vertebral shifts, the most common error of the vertebrae segmentation. In accordance with previous work, we present the segmentation results for the vertebrae, the aorta, and the calcifications only for the non-level-shifted cases. For our framework, this restriction is indicated by a superscript plus sign (Auto+). The computation of the CVD risk scores and the survival analysis, however, is based on Auto and Semi, the two setups that process the entire dataset.

It takes approximately 50 minutes on a 2.4 GHz processor with 4 GB RAM to automatically compute the CVD risk score for a given radiograph. The total training and testing time of all our experiments (≈ 1600 images) amounts to roughly two days on a server with 60 nodes, even though our code is written in MATLAB and not optimized for speed. In comparison, a trained radiologist requires nearly an hour for delineating the aortic calcifications of one standard radiograph.

1) Vertebrae segmentation: On standard radiographs, it is often difficult to determine the correct height of the lumbar vertebrae (see Fig. 11(i)). Even trained radiologists may confuse the vertebral levels if they cannot rely on complementary vertebrae scans. To assess the robustness of our system, we have categorized the vertebrae segmentation results in Table I, following the definitions of [30]. The possible classes are successful, level-shifted (by one vertebral level), and failure.

Table II summarizes the point-to-contour error statistics between the automatically and manually segmented vertebrae
LA = 100 samples in the final iteration. For the calcification for the successful cases. For each dataset, the table presents the
posterior, we set LC = 20 and MC = 10. We sample from mean, median, 75th and 95th percentiles; and the proportions
all possible conditional models for the vertebrae, where the of point errors larger than 2 mm.
conditioned variables include L2 or L3, the vertebrae with the Table III shows that our technique is competitive with previ-
least uncertainty (see Fig. 7(b)). The ratio of the conditional ous work on vertebrae segmentation for standard radiographs.
models and the local shape model is linearly increased from The mean point-to-contour error of Auto+ compares to the
0 to 0.5 during the sampling process. This has the effect state-of-the art, although our method is trained on only six
that in the initial exploration phase of the sampler, nearly all landmarks per vertebra. The work of [19] suggests that our
the samples are drawn from the local PDM, whereas with method will be more accurate if more vertebrae landmarks are
increasing certainty in the best fit, conditional PDMs are used available. The robustness of our method is at least as good as
to refine the search. [30], considering that Roberts et al. have cropped the images
below L4 to reduce the number of ambiguities. Our technique shifts by one vertebral level up- or downwards as often as [30] shifts solely upwards. Moreover, our method fails for considerably fewer cases, and except for two baseline cases all these failures are shifts by two vertebral levels.

This study (Auto+_bl)    X-ray    21%    2%    1.22

TABLE IV
POINT-TO-CONTOUR ERROR STATISTICS AND MEAN JACCARD INDEX $\bar{J}$ FOR AORTA SEGMENTATION

Authors                  Mean (mm)   Median (mm)   75%-ile (mm)   95%-ile (mm)   $\bar{J}$
Semi-automated Techniques
de Bruijne [5]           2.70        N/A           N/A            N/A            77%
Automated Techniques
This study (Auto+_bl)    2.61        2.33          3.32           5.05           73%
This study (Auto+_fu)    2.62        2.21          3.18           5.01           73%

[Figure: hazard ratio plotted against time (years), 0 to 5 years]
Fig. 12. Hazard ratios for the EPIPF baseline study showing the relative CVD death risk for patients in the high-risk group (AC24 score > 3.5) compared to patients in the low-risk group (AC24 score ≤ 3.5).

2) Aorta segmentation: In previous work [25], we demonstrated that our aorta segmentation method clearly outperforms the concatenation of [6] and [5]. On a follow-up sub-study, a preliminary version of our system achieved 71% for the median Jaccard index [20], median($J$), with

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \tag{33} $$

whereas the automation using the concatenated approach reached 51%. In comparison, the mean Jaccard index, $\bar{J}$, between two trained radiologists was 80% on 29 available annotations, and 87% on 21 annotations repeated by the same radiologist.

In this work, we compare our automated approach to a semi-automated technique [5] (see Table IV). We report the same statistics as for the vertebrae segmentation but exclude the number of errors larger than 2 mm. Instead, we present the mean Jaccard index $\bar{J}$. The statistics show that our automated approach performs similarly to [5], where we have converted the Dice score $D$ into the Jaccard index $J$ by computing

$$ J = \frac{D}{2 - D}. \tag{34} $$

3) Calcification segmentation: Fig. 11 illustrates the accuracy of the proposed calcification segmentation. It can be seen that the results of the automated framework vary across the dataset. Table V shows the mean and the standard deviation of the Jaccard index, denoted by $\bar{J}$ and std($J$), respectively. As a reference, the mean Jaccard index between two trained radiologists was 0.51 on the follow-up dataset. Note that the Jaccard index is a strict error measure that is very sensitive to disagreements on small areas, as is often the case in the baseline dataset (see Fig. 11(g)). It is difficult to compare our results with previous work, since in [5] only the accuracy and Cohen's κ coefficient [8] were evaluated. Both of these error measures, however, are problematic, since they are dominated by the large number of true negatives.

4) Computation of CVD risk score: Although the area overlap between the automatically and manually computed calcifications is relatively small, the results are useful for quantifying the severity of aortic calcifications using the popular AC24 antero-posterior severity score [21]. The idea of the AC24 score is to assess the location and severity of calcifications by a composite score between 0 and 24. For this score, the aorta is divided into four segments parallel to the vertebrae L1-L4, and the severity of anterior and posterior calcifications in each segment is graded separately on a 0-3 scale. The correlation coefficient between the automated and manual AC24 scores is $r = 0.7$.

5) Survival analysis: For the survival analysis, we divided the CVD patients into two approximately equally sized groups by thresholding the AC24 score at 3.5. For each group, $i = 1, 2$, we separately estimated the probability density $p(t)$ and the cumulative distribution $P(t) = P(T < t)$ of the survival time $T$ by employing a Gaussian kernel density estimator with automatic bandwidth selection [3]. Using the kernel estimates, we computed the hazard function of each group,

$$ h_i(t) \equiv \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{p(t)}{1 - P(t)}, \quad i = 1, 2. \tag{35} $$
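The hazard function in (35), $h(t) = p(t) / (1 - P(t))$, can be illustrated with a small kernel estimate. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `normal_reference_bandwidth` and `hazard` and the survival times are hypothetical, every survival time is treated as fully observed, and the bandwidth uses the simple normal reference rule $1.06\,\sigma\,n^{-1/5}$ in place of the diffusion-based selector of [3].

```python
from statistics import NormalDist, stdev


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * sigma * n^(-1/5).

    Assumption: this replaces the automatic selector of [3] for brevity.
    """
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)


def hazard(t, samples, bandwidth=None):
    """Gaussian kernel estimate of h(t) = p(t) / (1 - P(t))."""
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(samples)
    n = len(samples)
    # One Gaussian kernel centred on each observed survival time.
    kernels = [NormalDist(mu=x, sigma=h) for x in samples]
    density = sum(k.pdf(t) for k in kernels) / n       # p(t)
    distribution = sum(k.cdf(t) for k in kernels) / n  # P(t) = P(T < t)
    survival = 1.0 - distribution
    if survival <= 0.0:
        raise ValueError("estimated survival is zero at t; hazard is undefined")
    return density / survival


# Hypothetical survival times in years for the two AC24 groups.
low_risk = [6.1, 7.4, 8.0, 8.8, 9.5, 10.2]
high_risk = [2.3, 3.1, 4.0, 4.6, 5.2, 6.0]

h1 = hazard(3.0, low_risk)   # group with AC24 score <= 3.5
h2 = hazard(3.0, high_risk)  # group with AC24 score > 3.5
```

Estimating the density and the distribution from the same set of kernels keeps the two estimates consistent with each other, so the denominator $1 - P(t)$ is exactly the tail mass of the estimated density.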
From the two hazard functions, we computed the hazard ratio

$$ R(t) = \frac{h_2(t)}{h_1(t)}. \tag{36} $$

We report the instantaneous hazard ratio at $t = 0$ and the 5-year mean hazard ratio between the two groups for the automated, semi-automated, and manual AC24 scoring (Table VI). Fig. 12 illustrates that the manual scores are better than the automated ones, but also that all scorings show an increased CVD death risk in the group with higher AC24 scores.

TABLE VI
HAZARD RATIOS FOR THE EPIPF BASELINE STUDY

                        $R(0)$    $\frac{1}{5\,\mathrm{y}} \int_0^{5\,\mathrm{y}} R(t)\,dt$
Radiologist             6.8       4.5
This study (Semi_bl)    4.8       2.8
This study (Auto_bl)    4.5       2.4

Fig. 11. (a) The manual annotations from a trained radiologist. (b) The segmentation result of the automated framework shown as the conditional mean of the final samples. (c) The overlay of Fig. 11(a) and Fig. 11(b). The calcifications are colored as follows: yellow denotes true positives, red false negatives, and blue false positives. (d)-(e) A good calcification segmentation result where the manually annotated calcification is successfully detected. (f)-(g) A calcification segmentation result with zero Jaccard index. (h)-(i) A poor calcification segmentation result with many false negatives, independent of the vertebral shift. Note that we have cropped images (d)-(i) for better visualization. The full images have the same rim artifacts as (a)-(c).

III. DISCUSSION

We have presented a Bayesian framework for automatically scoring the CVD mortality risk based on standard lumbar radiographs. The framework has been validated by measuring its ability to distinguish high-risk CVD patients (AC24 score > 3.5) from others (AC24 score ≤ 3.5). On the EPIPF baseline images, the framework's mean hazard ratio for the first five years was 2.4, compared with a mean hazard ratio of 4.5 for the manually computed AC24 score. In other words, according to the mean hazard ratio over 5 years, a patient with an automated AC24 score > 3.5 has a 2.4 times higher risk of dying of CVD than a patient with an AC24 score ≤ 3.5. This result suggests that the presented framework can be useful for efficient and objective risk assessment of CVD mortality.

The framework could for example be employed during osteoporosis (OP) screening, since a high risk for OP relates to
a high risk of CVD events [32]. It is known that both diseases are highly prevalent in postmenopausal women who do not take estrogens or who have other risk factors for OP. Hence, the same lateral lumbar radiograph could be used to measure the risk for two highly prevalent public diseases.

The accuracy of the automated framework is good for most of the cases, but failure cases may occur due to, for instance, level shifting in the vertebrae segmentation. The influence of vertebral shifts was quantified by analyzing a semi-automated variant of our framework. After manually fixing one vertebral landmark, the mean hazard ratio increased by 17% to 2.8. Thus, future work will focus on reducing the number of shifts by devising more discriminative vertebrae features and explicitly modeling surrounding anatomical structures.

The vertebrae segmentation of the proposed framework competes with the state of the art. The vertebrae SMC sampler achieved accurate segmentations for most of the compared radiographs, even on the baseline scans, which are degraded by many projection artifacts and image noise.

The proposed automated aorta segmentation is nearly as accurate as the semi-automated approach in [5]. Most of the segmentation errors can be explained by vertebral level shifting and limited flexibility of the aorta shape model. Certain rare cases are not well represented by the shape model. However, we suspect that this problem could be alleviated by taking the non-linearity of the shape manifold into account.

The mean Jaccard index between automatically and manually computed calcifications is not very high, but sufficient for computing an AC24 score [21]. In particular on the baseline dataset, where no or only few calcifications are visible, a difference of one calcification leads to a small Jaccard index, whereas the corresponding AC24 score is hardly affected.

It is challenging to devise a segmentation algorithm that matches the accuracy of trained radiologists. One problem is that standard radiographs contain an abundance of structures whose local appearance is similar to that of calcifications. Contextual knowledge is often indispensable for segmenting calcifications, but most prior assumptions are not sufficient for describing the full variation in the testing set. The calcification location prior, for instance, raises the expected segmentation accuracy, but fails in cases where calcifications occur at unlikely locations. To lower the dependence on a location and a spatial prior, we are currently working on methods that learn data-dependent features. We expect that these features will characterize the calcification appearance better than the generic multiscale local jet features.

Another limitation is that the manual annotations can only be regarded as a guideline, rather than a ground truth. Experiments showed that the Jaccard index between two trained radiologists was only 0.51. To build a segmentation algorithm that is potentially more precise than a trained radiologist, it would be necessary to access a more reliable 'ground truth', such as annotations from registered X-ray/CT scans.

Although a trustworthy ground truth is required for improving the calcification segmentations, the main performance measure remains the prediction of CVD mortality. In this paper, we have presented an automated framework that is able to score standard radiographs so as to discriminate high-risk patients from others. Further developments of the segmentation techniques will inevitably improve the accuracy and robustness of the system, but the current results already demonstrate the framework's potential for better diagnosis and prognosis of high-risk CVD patients.

ACKNOWLEDGMENT

The authors would like to thank the Center for Clinical and Basic Research for providing scans and radiographic readings. We gratefully acknowledge the funding from the Danish Research Foundation (Den Danske Forskningsfond) supporting this work. We also thank the anonymous reviewers for their useful comments.

REFERENCES

[1] S. Allender, P. Scarborough, V. Peto, and M. Rayner. European cardiovascular disease statistics 2008.
[2] Y. Z. Bagger, L. B. Tankó, P. Alexanderson, H. B. Hansen, G. Qin, and C. Christiansen. The long-term predictive value of bone mineral density measurements for fracture risk is independent of the site of measurement and the age at diagnosis: results from the prospective epidemiological risk factors study. In Osteoporosis International, volume 17, pages 471–477. Springer, 2006.
[3] Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. Annals of Statistics, 38(5):2916–2957, 2010.
[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[5] M. de Bruijne. Shape particle guided tissue classification. In P. Golland and D. Rueckert, editors, Proc. Mathematical Methods in Biomedical Image Analysis (MMBIA 2006), 2006.
[6] M. de Bruijne and M. Nielsen. Image segmentation by shape particle filtering. In Proc. International Conference on Pattern Recognition (ICPR 2004), volume 3, pages 722–725, Washington, DC, USA, 2004. IEEE Computer Society.
[7] W. L. Buntine. Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2:159–225, 1994.
[8] J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, April 1960.
[9] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Training models of shape from sets of examples. In Proc. British Machine Vision Conference (BMVC 1992), pages 9–18. Springer-Verlag, 1992.
[10] M. de Bruijne and M. Nielsen. Shape particle filtering for image segmentation. In C. Barillot, D. R. Haynor, and P. Hellier, editors, Proc. Medical Image Computing and Computer Assisted Intervention (MICCAI 2004), volume 3216 of Lecture Notes in Computer Science, pages 168–175. Springer, 2004.
[11] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436, June 2006.
[12] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo methods in practice. Springer-Verlag New York, Inc., 2001.
[13] L. Fischer, R. Donner, F. Kainberger, and G. Langs. Automatic region template generation for shape particle filtering based image segmentation. In Proc. Probabilistic Models for Medical Image Analysis (PMMIA 2009), in conjunction with MICCAI 2009, pages 289–300, 2009.
[14] L. Florack, B. Ter Haar Romeny, M. Viergever, and J. Koenderink. The Gaussian scale-space paradigm and the multiscale local jet. International Journal of Computer Vision, 18:61–75, April 1996.
[15] M. Ganz, M. Nielsen, and S. S. Brandt. Patch-based generative shape model and MDL model selection for statistical analysis of archipelagos. In Proc. Machine Learning in Medical Imaging (MLMI), in conjunction with MICCAI 2010, pages 34–41, 2010.
[16] H. K. Genant, C. Y. Wu, C. van Kuijk, and M. C. Nevitt. Vertebral fracture assessment using a semiquantitative technique. Journal of Bone and Mineral Research, 8(9):1137–1148, 1993.
[17] A. J. Gray, J. W. Kay, and D. M. Titterington. An empirical study of the simulation of various models used for images. IEEE Trans. Pattern Analysis and Machine Intelligence, 16:507–513, May 1994.
[18] M. Hansson, S. Brandt, and P. Gudmundsson. Bayesian probability maps for evaluation of cardiac ultrasound data. In Probabilistic Models for Medical Image Analysis (PMMIA 2009), in conjunction with MICCAI 2009, 2009.