
Automatic Brain Tumor Segmentation

Problem: The task is to automatically detect the presence of tumors in MR images of the brain and to segment the abnormal pixels from the normal pixels. Traditionally, work on this task has tried to segment the metabolically active 'enhancing' area of the tumor, which appears hyper-intense in T1-weighted images after the injection of gadolinium. Several recent methods have focused on additionally segmenting non-enhancing regions, as well as tumors that only partially enhance or do not enhance at all. Several recent methods have also addressed the related task of segmenting the edema (swelling) associated with tumors. Segmentation of completely enhancing or 'border enhancing' tumors is a relatively easy problem, while more work is still required for segmenting tumors that do not have these characteristics. This is an interesting problem: it is a task that humans can learn to do very well, while developing algorithms to do the same task has proven challenging. Some of the challenges associated with this task include:

- Local Noise
- Inter-Slice Intensity Variations
- Partial Volume Artifacts
- Intra-Volume Intensity Inhomogeneity
- Inter-Volume Intensity Inhomogeneity
- Integration of multi-spectral (potentially unaligned) data
- Intensity overlap between normal and abnormal areas
- Tumor heterogeneity
- Changes in normal structures observed due to the presence of tumors

Automated Tumour Segmentation using Machine Learning


Most of our efforts to date have focused on locating tumour volumes within a patient's brain, based on a set of MR images -- a task known as 'tumour segmentation'. Our automated segmentation program (ASP) takes as input a 3 × k grid of images, where each image is an MR image of a patient's brain; see the first 3 columns of Figure 1. (This shows only k=5 of the typically around k=21 slices taken.) All images in each row correspond to the same axial slice of the patient, at some height, and the first 3 columns are (respectively) T1-weighted, T1c (T1-weighted after injection of the gadolinium contrast agent), and T2-weighted images. The ASP output is a 3D volume corresponding to its assessment of the Gross Tumour Volume (GTV), encoded as a sequence of k images; see the 4th column of Figure 1. Our goal is to produce an accurate volume automatically -- i.e., without any human assistance.

Figure 1: Each row corresponds to the same axial slice through a patient's brain. The first three columns are (resp.) T1, T1c, and T2 weighted images, and the 4th is the region that ASP labels as GTV.

ASP uses a classifier to label each voxel in the complete volume as either GTV or not, based on properties associated with that voxel. The obvious properties would be simply the 3 intensity values (T1, T1c and T2). Unfortunately, this is not sufficient, as there are 'intensity triples' that are tumour in one part of the brain, but normal in another (Schmidt, 2005). We therefore need to find other properties for each voxel, besides the intensities, and then to find the most effective combination of these features.

Feature Sets
Other researchers have produced brain atlases (templates) that specify voxel-level properties of a typical brain - e.g., the probability of grey matter at location (18 down, 200 left, 52 back), and similarly for every other coordinate (Schmidt, 2005). This can be extremely useful, as it helps specify what T1/T1c/T2 intensities to expect at each location; large deviations from the expectation can help identify possible tumours. Of course, this assumes we can relate the locations in the patient image to locations in the template. Unfortunately, these locations will typically be different - e.g., position (18, 200, 52) in the template might correspond to, say, (19, 207, 55) in the patient - as the patient's head may not be the 'typical' size and shape, and moreover it may have been tilted when the image was captured. A linear transformation can adjust for such simple scaling and rotation; a more serious problem arises when large tumours distort the anatomy within the brain itself - e.g., by shrinking the ventricles, or pushing them to new locations; see Figure 2. We solved this by following the linear registration with a highly-regularized nonlinear morphing (Schmidt, 2005).

For each voxel, we augmented the 3 intensity values (T1, T1c and T2) with 15 other values: 12 based on various templates, and 3 based on an alignment-based measure, 'symmetry', which is the difference in intensity between the current position and its reflection about the mid-sagittal plane. (Note this requires us to identify this mid-sagittal plane, which is 'x=0' in the aligned image but need not be 'x=0' in the initial images.) We then considered 4 different 'resolutions', to take into account some texture effects (see below), for a total of (3+12+3) × 4 = 72 features; see Figure 3.
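To make the 'symmetry' feature concrete, here is a minimal sketch, assuming the volume has already been spatially aligned so that the mid-sagittal plane is the midline of the left-right axis; the function names, the use of a mean filter, and the particular scales are illustrative assumptions, not the system's actual implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def symmetry_feature(volume: np.ndarray) -> np.ndarray:
    """Difference between each voxel's intensity and that of its
    reflection about the mid-sagittal plane. Assumes the volume is
    aligned so that plane is the midline of the first (left-right) axis."""
    reflected = volume[::-1, :, :]      # mirror across the midline
    return volume - reflected

def multiresolution_features(feature_map: np.ndarray,
                             scales=(1, 3, 5, 9)) -> np.ndarray:
    """Stack smoothed copies of a per-voxel feature map at several
    'resolutions' to capture coarse texture context; the scales here
    are placeholders (the text says only that 4 resolutions were used)."""
    maps = [uniform_filter(feature_map.astype(float), size=s)
            for s in scales]
    return np.stack(maps, axis=-1)      # shape: (x, y, z, 4)
```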

Figure 3: Each box shows one of the 72 features used by the ASP system. That is, the label for the voxel at each location depends on 72 feature values, one from the corresponding location in each of these boxes.

It is also important for the intensities to 'mean the same thing' throughout the volume, and also across patients (see below). However, due to various artifacts (as well as differences in magnet strength, etc.), a T1 intensity of 122 might correspond to 'normal white matter' at one location of a patient but to 'tumour' at another. We addressed this by a combination of noise reduction and intensity standardization steps, based on MIPAV. Figure 4 shows the results of these processes. At a high level, this requires:

Preprocessing
- Intensity Normalization, to make intensities more consistent within and between volumes
- Spatial Normalization, to allow comparisons between spatial locations in different modalities and with locations in the template coordinates

Segmentation
- Feature Extraction, to represent pixel-level measurements related to the image, coordinate system, and template; see the 72 features of Figure 3
- Supervised Pixel Classification, to assign each pixel a class (tumour or normal) by combining the feature values
- Label Relaxation, to correct misclassifications using spatial label dependencies
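The detailed pipeline listing later in this document names the standardization step 'Template-Based Intensity Weighted Linear Regression'. As a rough illustration of that idea, the sketch below fits a weighted linear map from a patient's intensities to a spatially aligned template's intensities; the function and its weighting scheme are assumptions for illustration, not the published procedure:

```python
import numpy as np

def standardize_intensities(patient: np.ndarray,
                            template: np.ndarray,
                            weights: np.ndarray) -> np.ndarray:
    """Weighted linear regression of patient intensities onto aligned
    template intensities (all 1-D arrays over the same voxels), then
    application of the fitted map a*x + b so a given intensity 'means
    the same thing' across volumes. weights can down-weight voxels
    suspected to be abnormal."""
    sw = np.sqrt(weights)
    # Weighted least squares via a sqrt-weighted design matrix.
    X = np.stack([patient, np.ones_like(patient)], axis=1) * sw[:, None]
    a, b = np.linalg.lstsq(X, template * sw, rcond=None)[0]
    return a * patient + b
```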

Figure 4: Results of pre-processing steps. Top row: original images (T1, T1c, T2). Middle row: after Noise Reduction. Bottom row: after intensity standardization.

Best Combination of Features -- Learning a Classifier


Using the ideas discussed above, we can assign a set of features to each voxel in the volume. The real challenge, however, is determining which combination of these features corresponds to GTV -- e.g., perhaps a voxel is part of a GTV if

  (T1c intensity) > 0.5 and (diff from symmetry) > 0.2 and (prob of grey-matter) < 0.3

or alternatively, maybe it is GTV if

  0.3 × (T1c intensity) - 0.1 × (diff from symmetry) + 13 × (prob of grey-matter) - 6 is positive

or some other variant. If we, or any of our colleagues, knew which combination was appropriate, we could just implement this function directly. Unfortunately, no one knows this best function. However, we do have access to many images of other patients, which experts have labeled -- i.e., explicitly identified which voxels are GTV and which are normal. The field of Machine Learning has provided a number of algorithms that can use these labeled images to learn a classifier, which can then assign a label to a new (unlabeled) voxel of a novel patient (Mitchell, 1997). In essence, these learners find the patterns in the data that identify when a voxel belongs to the class of GTV voxels, versus the class of normal voxels. The current version of ASP used a standard algorithm for learning a 'Support Vector Machine' (Cristianini and Shawe-Taylor, 2000); we found this worked effectively and produces an accurate classifier (Zhang et al., 2004; Zijdenbos et al., 2002; Garcia and Morena, 2004). For 'inter-patient training' -- when the learner is trained on one set of patients, but then tested on a different patient -- we obtained an average Jaccard score of 0.732 over 11 patients, with a variety of tumours (grade 2 astrocytoma, anaplastic astrocytoma, glioblastoma multiforme, oligodendroglioma) at different stages of treatment, and based on two 1.5 T scanners; see (Schmidt et al., 2005). We see that this 'learning approach' is important - indeed, essential! - in this situation, where the 'correct' answer is not known. It also adds flexibility to the system, as it means we can easily produce a new segmentation system: the classifier learned depends on the labels of the training data. For the GTV-classifier, the expert labeled each voxel as '1' if it is a GTV voxel, and '0' otherwise. If the expert instead labeled each training voxel with '1' if it was in the 'Tumour+Edema' area (and '0' otherwise), then the learner will produce a 'Tumour+Edema-classifier'. Our inter-patient score on this task was 0.77. See Figure 5.
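Schematically, the training loop is straightforward. The sketch below uses scikit-learn's linear SVM and the Jaccard score mentioned above; the feature-matrix layout and the choice of a linear kernel are illustrative assumptions (the publications cited above describe the actual settings):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_voxel_classifier(features: np.ndarray, labels: np.ndarray):
    """features: (n_voxels, 72) matrix, one row per voxel;
    labels: 1 for GTV (or Tumour+Edema, etc.), 0 for normal.
    Swapping in a different labeling yields a different classifier."""
    return LinearSVC().fit(features, labels)

def jaccard(pred: np.ndarray, truth: np.ndarray) -> float:
    """Jaccard (overlap) score between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return np.logical_and(pred, truth).sum() / np.logical_or(pred, truth).sum()
```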

Figure 5: The bottom row shows results of three different classifiers: gross tumour volume, enhancing region, and tumour+edema. Each classifier takes as input all 3 images (T1, T1c and T2) shown in the top row, as well as information based on various templates; see Figure 3.

Notice this methodology also allows ASP to accommodate different magnet strengths or other contexts: we would just need to train on images produced by the alternative MRI scanner and then labeled by an expert. Our initial ASP system used only the features associated with a voxel to assign a label to that voxel. This means it does not directly exploit another source of information: the fact that there are correlations between spatially adjacent voxels, in that they tend to have the same label - e.g., if the voxel at (10, 21, 37) is tumour, then we would anticipate that the voxel at (10, 21, 38) is tumour as well. (Actually the large-resolution features, based on a region around the current voxel, begin to address this, as neighbouring voxels will often have many very similar feature values. ASP also includes a post-processing step to reclassify as normal any small regions that are initially labeled as tumour; see the sketch below. But these are indirect approaches.) The fact that in spatial domains the feature values of neighbouring locations are not independent has led many researchers to explore 'Markov Random Fields' and 'Conditional Random Fields' (Lafferty et al., 2001; Kumar and Hebert, 2003) for classification using spatial data. These methods attempt to correlate the labels of neighbouring locations. We have begun to experiment with an extension of these ideas, by implementing a random field system based on Support Vector Machines, SVRF. This has produced very encouraging results, improving on our prior results; see (Lee et al., 2005).
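The small-region post-processing step mentioned parenthetically above can be sketched as follows, assuming 3D connected-component analysis; the size threshold is a placeholder, not the system's value:

```python
import numpy as np
from scipy.ndimage import label

def remove_small_tumour_regions(mask: np.ndarray,
                                min_voxels: int = 100) -> np.ndarray:
    """Reclassify as normal any connected 'tumour' component smaller
    than min_voxels (an illustrative threshold)."""
    labelled, n_components = label(mask)
    sizes = np.bincount(labelled.ravel())
    keep = np.zeros(n_components + 1, dtype=bool)
    keep[1:] = sizes[1:] >= min_voxels    # index 0 is background
    return keep[labelled]
```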

Proposed Future Work


Better Algorithms: We plan to continue exploring our SVRF approach, and to consider other related 'random field' technologies. One issue with all of these approaches is efficiency: these systems require a great deal of time to run (as well as considerable time to train). We are now investigating more efficient versions of these algorithms.

Learning Classifiers for Anatomical Brain Structures


As noted above, because our system learns its classifier from labeled data, we can use this approach to produce other classifiers; we need only provide images that have been hand-labeled. We can therefore train a classifier to identify the eyes, the brain stem, or other anatomical structures, and then use the resulting 'eye-classifier' to automatically identify the eye region in new images.

Integrating other types of data:


1. The current system uses only the axial slices. We plan to build a more accurate 3D model by also using coronal and sagittal slices from the MRI data.

2. We have access to other information about the patient: the tumour history, patient records, histology, etc. We plan to find ways to incorporate this information into ASP, to improve its segmentation accuracy. Fortunately, we anticipate that this should be fairly straightforward, as we are already using a probabilistic representation that can easily accommodate these various types of information - ranging from general information about the patient and tumour, to specific information about the specific voxels.

3. While we have access to a number of MR images of brains with tumours, there are a much larger number of MR images of tumour-free brains (LONI). We have begun to use this information to help our system better identify features and high-level textures that distinguish tumour from non-tumour.

4. We also plan to use the other modalities, including Magnetic Resonance Spectroscopy (MRS), Positron Emission Tomography (PET), and Diffusion Tensor Imaging (DTI) - viewing their values at each voxel as a feature; this set of features will then be combined with the other features (Figure 3) and used to classify each voxel.

Software:
Our software is currently not publicly available. For more information on using the software, potential collaboration, sharing data, or possible licensing opportunities, please contact us.

Oracle Segmentation Program

The Oracle Segmentation Program (OSP) was developed to allow expert segmentation of brain tumours. The program, which functions as a client-server application, also gives us a secure and convenient way to store and access our MRI data. Features include:

- MRI data management based on patient-study-series-pathology information
- Visualization of different MRI modalities
- Visualization of axial, coronal, and sagittal slices
- Tumour and edema overlays on all modalities
- Viewing multiple modalities simultaneously
- Management of expertly segmented volumes (number of slices segmented/still to be segmented)
- Creation of tumour and edema regions of interest
- Image histogram visualization
- Window/level imaging controls
- Ability to zoom, rotate, and flip images
- Secure access to confidential material
- Editing (stretch, move, cut, split) regions of interest
- Creation of regions of interest using splines
- Slideshows of slices

Figure 1.1: Screenshot of the OSP tool as a human expert segments a tumour.

Automatic Segmentation Visualization Program

The Automatic Segmentation Visualization Program was created to allow easy visualization of the results of our Automatic Brain Tumour Segmentation pipeline. This program allows the visualization of all the steps in our pipeline. It also provides a means to organize the large repository of data for convenient viewing. Features include:

- Management of large amounts of MRI data
- Visualization of different MRI modalities
- Visualization of each individual step of the pipeline separately
- Viewing previous steps in addition to the current step
- Zooming of images
- Selection of modalities to visualize
- Transparent overlays of edema or tumour labels
- Viewing image differences after each pipeline step
- Overlaying of aligned templates
- Jaccard measure and confusion matrix calculation, per slice or per volume

Automatic Brain Tumour Segmentation Pipeline:


The process of automatic segmentation involves several preprocessing steps to regularize the data. These steps are necessary for the segmentor to identify abnormalities due to physiology, rather than noise in the images or systematic MRI inconsistencies. The Automatic Segmentation Program consists of the following pipeline of steps, which each dataset is put through:

Preprocessing:

Noise Reduction
- 2D Local Noise Reduction: SUSAN Noise Reduction Filter
- Inter-Slice Intensity Variation Reduction: Weighted Linear Regression
- Intensity Inhomogeneity Reduction: Nonparametric Nonuniform Intensity Normalization
- 3D Local Noise Reduction: SUSAN Noise Reduction Filter

Spatial Registration
- Inter-Modality Coregistration: Maximum Normalized Mutual Information Rigid-Body Transformation
- Linear Template Alignment: Maximum a Posteriori 12-parameter Affine Transformation
- Non-Linear Template Warping: Maximum a Posteriori Combination of Basis Function Warps
- Spatial Interpolation: High-Order Polynomial B-Spline

Intensity Standardization
- Template-Based Intensity Weighted Linear Regression

Segmentation:

Feature Extraction
- Image-Based Features: Intensities, Textures, Normal Intensity Distances
- Coordinate-Based Features: Spatial Tissue Probabilities, Brain Area Probability, Expected Spatial Intensities
- Registration-Based Features: Template Intensities, Bi-Lateral Symmetry
- Feature-Based Features: Regional Characterizations of Image-, Coordinate-, and Registration-Based Features

Classification
- Binary Pixel Support Vector Machine Classification

Relaxation:

Relaxation of Pixel Labels
- Median Root Filter and Morphological Hole Filling
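For orientation, the pipeline above can be read as the following skeleton; every helper named here is a placeholder standing in for the corresponding stage (SUSAN, N3, registration, and so on), not a real API:

```python
def segment_dataset(t1, t1c, t2, templates):
    """Hypothetical driver mirroring the pipeline listing above."""
    # Preprocessing: noise reduction
    vols = [susan_filter_2d(v) for v in (t1, t1c, t2)]
    vols = [reduce_interslice_variation(v) for v in vols]   # weighted linear regression
    vols = [n3_correct(v) for v in vols]                    # inhomogeneity reduction
    vols = [susan_filter_3d(v) for v in vols]
    # Preprocessing: spatial registration
    vols = coregister_modalities(vols)                      # rigid, max normalized MI
    vols = affine_align_to_template(vols, templates)        # MAP 12-parameter affine
    vols = nonlinear_warp_to_template(vols, templates)      # MAP basis-function warps
    # Preprocessing: intensity standardization
    vols = [standardize_to_template(v, templates) for v in vols]
    # Segmentation
    features = extract_features(vols, templates)            # image/coordinate/registration/feature-based
    labels = svm_classify(features)                         # binary pixel SVM
    # Relaxation
    return median_filter_and_fill_holes(labels)
```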

Segmenting Brain Tumors using Pseudo-Conditional Random Fields

1 Introduction

Segmenting brain tumors in magnetic resonance (MR) images involves classifying each voxel as tumor or non-tumor [1-3]. This task, a prerequisite for treating brain cancer using radiation therapy, is typically done by hand by expert medical doctors, who find this process laborious and time-consuming. Replacing this manual effort with a good automated classifier would save doctors time; the resulting labels may also be more accurate, or at least more consistent. We treat this as a binary classification task, using a classifier to map each MR image voxel, described as a vector of values $x \in \mathbb{R}^d$, to a bit $y \in \{+1, -1\}$ corresponding to either tumor or non-tumor. We first learn this classifier from a set of data instances $\{\langle x_i, y_i\rangle\}_i$ [4]. Here, we focus on probabilistic classifiers that actually return a class likelihood value $P(y = +1 \mid x) \in [0, 1]$ for each voxel; our classifier can then return +1 (tumor) if $P(y = +1 \mid x) \geq 0.5$. In general, given an entire $n \times m$ image, our classifier will seek the most likely labeling over $\{-1, +1\}^{nm}$: $Y^* = \arg\max_Y P(Y \mid X)$. (This use of probabilities distinguishes these approaches from many other segmentation approaches, such as those based on variational and level set techniques [5, 6].)

Standard machine learners, such as Naive Bayes, logistic regression (LR), and support vector machines (SVMs), produce effective classifiers in many domains [7, 8]. However, these algorithms assume that the individual instances are iid. This is appropriate if the instances correspond to, say, patients in a study, as finding that one patient responds well to some treatment does not mean that the next patient will also respond well. However, this assumption is problematic in our current situation, where each instance corresponds to a voxel: here, finding that one voxel is labeled tumor strongly suggests that its neighbors will have a similar label; similarly, non-tumor voxels tend to be next to other non-tumor voxels. Algorithms that assume the data is iid typically perform poorly when the data is not, which is why these algorithms do relatively poorly at segmentation tasks. This has motivated researchers to apply Markov Random Fields (MRFs; [9]) and Conditional Random Fields (CRFs; [10]) to various segmentation tasks. These techniques are able to represent complex dependencies among data instances, giving them higher accuracy on the segmentation task than iid classifiers [11, 12]. However, these random field approaches are based on computationally intractable formulations. Although there are approximation techniques that can deal with these computational challenges, CRF variants such as Discriminative Random Fields (DRFs) and Support Vector Random Fields (SVRFs) still require computationally expensive learning procedures [11, 13].

In this paper, we present a novel supervised learning system, PCRF, that can efficiently produce high-quality segmenters, incorporating spatial constraints among MR image voxels. PCRF can be viewed as a regularized iid discriminative classifier that is first trained assuming the data is iid; this makes the training computationally efficient. It then relaxes the iid assumption during inference, by including a regularizing term that uses the class labels and feature vectors of the neighbors of a given voxel. We demonstrate that PCRF is robust and efficient by illustrating its performance at segmenting MR images of the brains of tumor patients. We show that PCRF is significantly more accurate than the corresponding base iid classifiers, and is significantly more efficient than other random field methods during training, while producing similar accuracy. Section 2 reviews related work, including random field models. Section 3 introduces our framework and novel PCRF system. Section 4 presents experiments that empirically demonstrate the efficiency and effectiveness of our model.

2 Background

We view brain tumor segmentation on a 2D MR image as classifying each image voxel as either tumor or non-tumor. The challenge is finding the most likely configuration of (tumor vs. non-tumor) labels $Y = (y_1, y_2, \ldots, y_r) \in \{-1, +1\}^r$ for the voxels of a 2D MR image $X = (x_1, x_2, \ldots, x_r)$, where the indices range over the set $S$ of all voxels in the $r = mn$ image, each $y_i \in \{-1, +1\}$ is the label for voxel $i$, and $x_i$ is the feature vector for voxel $i$. A pair-wise MRF is formulated as

$$P(Y \mid X) \propto P(Y, X) = \frac{1}{Z(X)} \exp\Big( \sum_{i \in S} D(x_i, y_i) + \sum_{i \in S} \sum_{j \in N_i} V(y_i, y_j) \Big) \qquad (1)$$

where $D(x_i, y_i)$ corresponds to the local log likelihood $\log P(x_i \mid y_i)$ of $x_i$ given a class label $y_i$; $V(y_i, y_j)$ is a potential function that explicitly encodes the dependencies between the labels at $i$ and its neighbor $j$, based on $N_i$, the set of voxels neighboring $x_i$; and $Z(X)$ is a normalizing factor that makes the formulation a probability distribution. We can read off the MRF assumptions from Equation 1: the voxels are conditionally independent given their class labels, and spatial correlations are modelled based only on the labels of neighboring voxels ($y_i$ and $y_j$), not on the observations ($x_i$ and $x_j$). These factors limit the advantages of using MRFs to model spatial dependencies in MR images [11-13]. CRFs attempt to overcome these disadvantages by relaxing the conditional independence assumption and incorporating observations into the formulation of spatial dependencies:

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{i \in S} A(y_i, X) + \sum_{i \in S} \sum_{j \in N_i} I(y_i, y_j, X) \Big) \qquad (2)$$

where $A(y_i, X)$ corresponds to the conditional probability distribution (while the MRF's $D(x_i, y_i)$ corresponds to the log conditional probability), and the $I(y_i, y_j, X)$ term incorporates observations of data instances (unlike the MRF's $V(y_i, y_j)$, which does not).

The Discriminative Random Field (DRF) is a variant of the CRF that performs robustly in 2D image region classification problems [13]. The Support Vector Random Field (SVRF) is a modification of DRFs that addresses high-dimensional feature vectors and imbalanced datasets effectively [11]. Unfortunately, DRFs and SVRFs are computationally expensive, especially during learning, as their computations are exponential in the number of data points. This is basically due to their need to compute the partition function, corresponding to the $Z(X)$ in Equation 2. (Note that the Gaussian assumption of the MRF makes $Z(X)$ in Equation 1 simpler.) This has led to many approximation methods, such as pseudo-likelihood, contrastive divergence, and pseudo-marginal approximation [10, 13, 14, 12]. Unfortunately these approximations reduce the accuracy of the learned segmenter. This motivated Decoupled Conditional Random Fields (DCRFs; [15]), which speed up the CRF-based computation by approximating a CRF as the combination of two classifiers that are each trained separately. As the DCRF framework searches for the parameter values that optimize each model separately, the combined parameter values are not necessarily globally optimal.

3 Pseudo Conditional Random Fields -- PCRFs

The PCRF framework attempts to obtain the advantages of both the MRF and CRF approaches by relaxing the iid assumption of a simple discriminative classifier through an added regularization term. We want to find the most likely labelling $P(Y \mid X) = \prod_{i \in S} P(y_i \mid X, Y \setminus y_i)$. Given the feature vectors (observations) $x_i$ for each voxel $i$ as well as the class labels $y_{N_i}$ over neighboring voxels $j \in N_i$, the PCRF formulation defines

$$P(y_i \mid x_i, x_{N_i}, y_{N_i}) = \phi(x_i, y_i) \prod_{j \in N_i} \psi_o(x_i, x_j)\, \psi_c(y_i, y_j), \qquad (3)$$

where the potential function $\psi_o(x_i, x_j)$ quantifies the similarity of the feature vectors for voxels $i$ and $j$, and $\psi_c(y_i, y_j)$ models the interactions between the two class labels $y_i$ and $y_j$. We can adjust $\psi_c(\cdot)$ to alter the degree of continuity with respect to class labels expected by the model; e.g., if we set $\psi_c$ to give high weight when neighboring voxels share the same class label, then the resulting PCRF will prefer having the same class labels among neighboring voxels. Alternatively, setting $\psi_o \equiv 1$ and $\psi_c \equiv 1$ would remove all spatial dependencies, leading to an iid classifier. Note we use a fixed pair of potential functions: here we set $\psi_o(x_i, x_j) = x_i^T x_j$ as the similarity measure between neighboring voxels; note this measure attains its maximum value when the two vectors are co-linear. We also set $\psi_c(y_i, y_j) = \gamma$ when $y_i \neq y_j$, and 1 otherwise, where $\gamma$ weighs the continuity of identical class labels; here we used $\gamma = 0.6$. For now, we define $\phi(x_i, y_i) = \sigma(\theta^T x_i) = \frac{1}{1 + \exp(-\theta^T x_i)}$, a simple logistic regression classifier. We chose a discriminative approach rather than a generative one because the former empirically shows better performance than the latter [8].

Learning: Learning the PCRF parameters is more efficient than for other CRF variants, as a PCRF needs to fit only the parameter vector $\theta$ for the local potential function $\phi(\cdot)$, which does not involve any spatial interactions. Here, we use the standard approach of maximizing the conditional log-likelihood:

$$\theta^* = \arg\max_\theta \sum_{i \in S} \Big[ y_i \log \sigma(\theta^T x_i) + (1 - y_i) \log\big(1 - \sigma(\theta^T x_i)\big) \Big].$$

Inference: The PCRF inference process incorporates regularization based on neighbor relationships. In general, the objective of inference is to maximize the log likelihood:

$$Y^* = \arg\max_Y \log P(Y \mid X) = \arg\max_Y \sum_{i \in S} \log \phi(x_i, y_i) + \sum_{i \in S} \sum_{j \in N_i} \big[ \log \psi_c(y_i, y_j) + \log \psi_o(x_i, x_j) \big] \qquad (4)$$

The graph cuts algorithm solves image pixel classification tasks by minimizing an energy function when the spatial correlations among pixels are independent of the observations; this involves using linear programming to find the max-flow/min-cut on a graph whose nodes correspond to voxels and whose edges correspond to connections between neighboring voxels [16]. We reformulate this graph cuts approach to apply to our PCRF framework (Equation 4), where the neighbor relationships depend on both the labels and the observations (feature vectors).
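Only $\psi_c$ depends on the labels in Equation 4, so $\psi_o$ contributes a constant during label search; the paper performs inference with a reformulated graph cut, but as a lighter-weight illustration the sketch below runs a simple iterated-conditional-modes (ICM) sweep over the same objective -- a named substitution for the graph-cut solver, with $\gamma$ interpreted as a penalty on disagreeing neighbor labels:

```python
import numpy as np

GAMMA = 0.6  # label-disagreement weight from the text (reconstructed reading)

def icm_segment(theta: np.ndarray, X: np.ndarray,
                neighbors: list, n_sweeps: int = 5) -> np.ndarray:
    """Approximate argmax_Y of Equation 4 by coordinate-wise updates.
    theta: logistic weights for the local potential phi;
    X: (r, d) voxel feature matrix; neighbors: list of index lists.
    psi_o(x_i, x_j) is label-independent, so it only shifts the
    objective by a constant and is omitted from these updates."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))    # P(y_i = +1 | x_i)
    labels = np.where(p >= 0.5, 1, -1)        # iid initialization
    log_phi = {1: np.log(p), -1: np.log(1.0 - p)}

    for _ in range(n_sweeps):
        for i in range(len(labels)):
            def score(y):
                # log psi_c: log(GAMMA) per disagreeing neighbor, else 0
                penalty = sum(np.log(GAMMA)
                              for j in neighbors[i] if labels[j] != y)
                return log_phi[y][i] + penalty
            labels[i] = max((1, -1), key=score)
    return labels
```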

4 Brain Tumor Segmentation

We applied our PCRF model to the challenging real-world problem of segmenting brain tumors in MR images. Since a PCRF can be viewed as a regularized discriminative iid classifier, we first show the differences between PCRF and its degenerate iid classifier, LR. To quantify the performance of each model, we used the percentage Jaccard score $J = 100 \cdot \frac{TP}{TP + FP + FN}$, where TP denotes the number of true positives, FP the number of false positives, and FN the number of false negatives, taken over the entire image. We used this score for the brain tumor segmentation task as this data is very imbalanced, in that only a small percentage of voxels are in the "tumor" class; hence scores like "accuracy" would be high simply because the "true negative" class is typically huge.

We applied several different models (LR, PCRF, SVRF) to the task of classifying MR image slices, where each slice is defined by 258 by 258 pixels, each of which is described using 33 features [17]. We considered data from 11 patients with brain tumors; for each patient, we annotated each voxel with values based on three different MR imaging modalities: T1, T2, and T1 with gadolinium contrast ("T1c"). We focus on 2D images; this is sufficient to illustrate the challenges, as the neighborhood structure here involves cycles, which makes both the inference and learning procedures computationally challenging. (We are beginning to explore extending this approach to 3D, which involves simply redefining the neighborhood structure.) Testing and training were done in a patient-specific manner: for each patient, each algorithm was trained on a subset of the patient's data, then tested on another (disjoint) subset. This is similar to the approach taken in many other studies of automatic brain tumor segmentation, such as [18-21]. Our systems attempted to segment the "enhancing" tumor area -- the region that appears bright on T1c images. Note that it is not sufficient to simply threshold T1c images by "brightness", because other tissues can have the same range of intensities. In the case of glioblastomas with necrotic cores, which appear dark on T1 images, we defined the enhancing rim of the tumor as well as the dark necrotic core as the target tumor region.

Fig. 1 shows one example of segmentation results. One test slice and its correct label ("ground truth") are shown in the first two columns, respectively. The result from LR, shown in the third column of Fig. 1, indicates that LR correctly classifies the tumor region, but that it also misclassifies several small non-tumor regions as "tumor". PCRF's result, which appears on the far right, is more accurate. (See [22] for the complete set of larger images.)

[Fig. 1: Classification results. (a) Testing Slice; (b) Ground Truth; (c) LR (J = 66.45); (d) PCRF (J = 71.11). The PCRF shows almost a 4% improvement in Jaccard score over LR.]

Fig. 2(a) presents the Jaccard percentage scores from the 11 studies, where points above the diagonal line denote instances in which the PCRF performed better than its degenerate model, LR. Overall, the PCRF's accuracy was statistically significantly higher than LR's at p < 0.005 on a paired-sample t-test. We also compared our PCRF system with the state-of-the-art CRF variant, the Support Vector Random Field (SVRF; [11]), whose potential functions are based on Support Vector Machines (SVMs). Here, we implemented PCRF(SVM), which differed from the PCRF system only by using an SVM to compute the $\phi(x, y)$ (from Equation 4) that models the relationship between a voxel's feature vector and its label. An SVM produces the distance between a hyperplane and a data instance as its decision value $f_{SVM}(x_i) \in (-\infty, +\infty)$. To normalize this unbounded range, we fit this value to a sigmoid function: $g_{\alpha_0, \alpha_1}(x) = P(y = +1 \mid x) = \frac{1}{1 + \exp(\alpha_0 + \alpha_1 f_{SVM}(x))}$, estimating the parameters $\alpha_0$ and $\alpha_1$ from the training data $\{(f_{SVM}(x_i), y_i)\}_i$ [11].

[Fig. 2: Jaccard scores (percentage). (a) PCRF vs. LR; (b) PCRF(SVM) vs. SVRF.]

Figure 2(b) compares the percentage Jaccard scores of PCRF(SVM) vs. SVRF; it is clear that PCRF(SVM) is comparable with SVRF. We next considered the timing. As our PCRF did not need to learn parameters for its spatial correlation model, we anticipated it would be significantly faster during the learning stage. The learning times (averaged across the 11 patients, in seconds) confirm this:

  Model:                DRF    SVRF   DCRF   PCRF
  Tumor segmentation:   1697   1276   63     38

Our PCRF was over 40 times faster than the DRF and over 30 times faster than the SVRF (p < 10^-37 and p < 10^-29, paired-samples t-tests for DRFs and SVRFs, respectively). Even the DCRF, known as the fastest CRF variant, is significantly slower than our PCRF (p < 10^-26).
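The sigmoid fit used for PCRF(SVM) is essentially Platt scaling, and can be sketched with an off-the-shelf logistic regression over the SVM decision values; the helper below is an illustration of that calibration step, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_calibrate(decision_values: np.ndarray, labels: np.ndarray):
    """Fit P(y=+1|x) = 1 / (1 + exp(a0 + a1 * f_svm(x))) from training
    pairs (f_svm(x_i), y_i). Returns (a0, a1)."""
    lr = LogisticRegression().fit(decision_values.reshape(-1, 1), labels)
    # sklearn models P = sigmoid(w*f + b); matching the two forms gives
    # a1 = -w and a0 = -b.
    return -lr.intercept_[0], -lr.coef_[0, 0]

def calibrated_probability(f: float, a0: float, a1: float) -> float:
    return 1.0 / (1.0 + np.exp(a0 + a1 * f))
```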

5 Conclusion

We found that the PCRF(SVM) system, which uses a linear SVM to map from voxel to label, worked effectively. We might be able to obtain further performance improvements by using a non-linear kernel function. In addition, we might be able to produce a more robust model by incorporating a prior $P(\theta)$ over $\theta$ to further reduce the possibility of overfitting. We are extending this work to develop effective systems that overcome the limitations of patient-specific training, by taking advantage of semi-supervised learning principles.

Contributions: This paper has presented the Pseudo Conditional Random Field (PCRF) model, a CRF-inspired formulation that incorporates a specified potential function to model the relationships between neighboring voxels. Our PCRF is fast to train, as it does not need to fit parameters that model the neighbor relationships. It can be viewed as a regularized iid classifier whose classification decisions for each pixel involve the labels and features of neighboring voxels. Thus, during inference, PCRF avoids the iid assumption, which is inappropriate for image segmentation tasks. We demonstrate that PCRF is effective by showing that it can effectively segment brain tumors from MR images, achieving state-of-the-art segmentation results at a small fraction of the training time.

1 Introduction

Magnetic Resonance Imaging (MRI) has become a widely-used method of high-quality medical imaging, especially in brain imaging, where MRI's soft tissue contrast and noninvasiveness are clear advantages. An important use of MRI data is tracking the size of a brain tumor as it responds (or doesn't) to treatment [1, 2]. Therefore, an automatic and reliable method for segmenting tumor would be a useful tool [3]. Currently, however, there is no method widely accepted in clinical practice for quantitating tumor volumes from MR images [4]. The Eastern Cooperative Oncology Group [5] uses an approximation of tumor area in the single MR slice with the largest contiguous, well-defined tumor. Significant variability across observers can be found in these estimations, however, and such an approach can miss tumor growth/shrinkage trends [6, 2].

Computer-based brain tumor segmentation has remained largely experimental work. Many efforts have exploited MRI's multi-dimensional data capability through multi-spectral analysis [7, 8, 9, 10, 11, 12]. Artificial neural networks have also been explored [13, 14, 15]. Others have introduced knowledge-based techniques to make more intelligent classification and segmentation decisions, such as in [16, 17], where fuzzy rules are applied to make initial classification decisions, then clustering (initialized by the fuzzy rules) is used to classify the remaining pixels. More explicit knowledge has been used in the form of frames [18] or tissue models [19, 20]. Our efforts in [21, 22] showed that a combination of knowledge-based techniques and multi-spectral analysis (in the form of unsupervised fuzzy clustering) could effectively detect pathology and label normal transaxial slices intersecting the ventricles. In [23], we expanded this system to detect pathology and label normal brain tissues in partial brain volumes located above the ventricles. Most reports on MR segmentation [24], however, have either dealt with normal data sets, or with neuro-psychiatric disorders with MR distribution characteristics similar to normals.

In this paper, we describe a system that addresses the more difficult task of extracting tumor from transaxial MR images over a period of time during which the tumor is treated. Each slice is classified as abnormal by our system described in [23]. Of the tumor types that are found in the brain, glioblastoma-multiformes (Grade IV Gliomas) are the focus of this work. This tumor type was addressed first because of its relative compactness and tendency to enhance well with paramagnetic substances, such as gadolinium. Using knowledge gained during "pre-processing" by our system in [23], extra-cranial tissues (air, bone, skin, fat, muscles, etc.) are first removed based on the segmentation created by a fuzzy c-means clustering algorithm [25, 26]. The remaining pixels (really voxels, since they have thickness) form an intra-cranial mask. An expert system uses information from multi-spectral and local statistical analysis to first separate suspected tumor from the rest of the intra-cranial mask, then refine the segmentation into a final set of regions containing tumor. A rule-based expert system shell, CLIPS [27, 28], is used to organize the system. Low-level modules for image processing and high-level modules for image analysis are all written in C and called as actions from the right-hand sides of the rules. The system described in this paper provides a completely automatic (no human intervention on a per-volume basis) segmentation and labeling of tumor after a rule set was built from a set of "training images". For the purposes of tumor volume tracking, segmentations from contiguous slices (within the same volume) are merged to calculate total tumor size in 3D. The tumor segmentation matches well with radiologist-labeled "ground truth" images and is comparable to results generated by a supervised segmentation technique.

The remainder of the paper is divided into four sections. Section 2 discusses the slices processed and gives a brief overview of the system. Section 3 details the system's major processing stages and the knowledge used at each stage. The last two sections present the experimental results, an analysis of them, and future directions for this work.

2 Domain Background

2.1 Slices of Interest for the Study

The system described here can process any transaxial slice [29, 30] (intersecting the long axis of the human body) starting from an initial slice 7 to 8 cm from the top of the brain and upward. This range of slices provides a good starting point in tumor segmentation, due to the relatively good signal uniformity within the MR coil used [23]. Each brain slice consists of three feature images: T1-weighted (T1), proton density weighted (PD), and T2-weighted (T2) [3]. (Acquisition details: each slice has a thickness of 5 mm with no gap between consecutive slices, a field of view of 240 mm (pixel size 0.94 mm and image size 256x256 pixels), and T1-weighted TR/TE of 600/11 ms (spin echo), PD-weighted TR/TE of 4000/17 ms (fast spin echo), and T2-weighted TR/TE of 4000/102 ms (fast spin echo). All slices show gadolinium (Magnevist) enhancement, with a concentration of 0.1 mmol/kg, and were acquired using a 1.5 Tesla General Electric imaging coil. Signal uniformity was measured according to AAPM standards [31], with a cylindrical phantom with a diameter of 8 inches, imaged with a field-of-view of 270 mm. To measure the worst-case nonuniformity, no smoothing was applied. Nonuniformity was measured for each transaxial plane, and resulted in values between 89% and 94% for all image sequences. No gradients in signal intensity were observed in the data sets, nor was any within-slice non-uniformity. All imaging was performed post-contrast, avoiding any registration problems. The MR scanner provides 12-bit data, which was used without further scaling.)

An example of a normal slice after segmentation is shown in Figures 1(a) and (b). Figures 1(c) and (d) show an abnormal slice through the ventricles, though pathology may exist within any given slice. The labeled normal intra-cranial tissues of interest are: CSF (dark gray) and the parenchymal tissues, white matter (white) and gray matter (black). In the abnormal slice, pathology (light gray) occupies an area that would otherwise belong to normal tissues. In the approach described here, only part of the pathology (gadolinium-enhanced tumor) is identified and labeled.

[Figure 1: Slices of Interest. (a) raw data from a normal slice (T1-weighted, PD, and T2-weighted images from left to right); (b) after segmentation; (c) raw data from an abnormal slice (T1-weighted, PD, and T2-weighted images from left to right); (d) after segmentation. White = white matter; Black = gray matter; Dark Gray = CSF; Light Gray = Pathology in (b) and (d).]

A total of 120 slices containing radiologist-diagnosed glioblastoma-multiforme tumor were available for processing. Table 1 lists the distribution of these slices across sixteen volumes of seven patients who received varying levels of treatment, including surgery, radiation therapy, and chemo-therapy prior to initial acquisition and between subsequent acquisitions. Using a criterion of tumor size (per slice) and level of gadolinium enhancement to capture the required characteristics of all data sets acquired with this protocol, a training subset of seventeen slices was created. The heuristics discussed in Section 3 were extracted from the training subset through the process of "knowledge engineering". Knowledge engineering is not automated, but human directed. Heuristics are expressed in general terms, such as "higher end of the T1 spectrum" (which does not specify an actual T1 value). This provides knowledge that is more robust across slices, without regard to a slice's particular thickness, scanning protocol, or signal intensity, as was the case in [23]. In contrast, multi-spectral efforts such as [32] tune imaging parameters, which may limit their application to slices with the same parameters. The generality of the system will be discussed in Section 5.

Table 1: MR slice distribution (# slices extracted from each volume). Parentheses indicate the number of slices from that volume that were used as training.

  Patient   Baseline   Repeat 1   Repeat 2   Repeat 3   Repeat 4
  1         9          10         10         9          8
  2         8          9(9)       9          -          -
  3         9          9          -          -          -
  4         6          7          7(2)       -          -
  5         6(6)       -          -          -          -
  6         3          -          -          -          -
  7         1          -          -          -          -

2.2 Knowledge-Based Systems

Knowledge is any chunk of information that effectively discriminates one class type from another [28]. In this case, tumor will have certain properties that other brain tissues will not, and vice-versa. In the domain of MRI volumes, there are two primary sources of knowledge available. The first is pixel intensity in feature space, which describes tissue characteristics within the MR imaging system, summarized in Table 2 (based on a review of literature [33, 34, 35]). The second is image/anatomical space, which includes expected shapes and placements of certain tissues within the MR image, such as the fact that CSF lies within the ventricles, as shown in Figure 1(a). Our previous efforts in [21, 22, 23] exploited both feature-domain and anatomical knowledge, using one source to verify decisions based on the other source. The nature of tumors limits the use of anatomical knowledge, since they can have any shape and occupy any area within the brain. As a result, knowledge contained in feature space must be extracted and utilized in a number of novel ways. As each processing stage is described in Section 3, the specific knowledge extracted and its application will be detailed.

Table 2: A synopsis of T1, PD, and T2 effects on the magnetic resonance image. TR = Repetition Time; TE = Echo Time.

  Pulse Sequence (TR/TE)      Effect (Signal Intensity)        Tissues
  T1-weighted (short/short)   Short T1 relaxation (bright)     Fat, Lipid-Containing Molecules, Proteinaceous Fluid, Paramagnetic Substances (Gadolinium)
                              Long T1 relaxation (dark)        Neoplasm, Edema, CSF, Pure Fluid, Inflammation
  PD-weighted (long/short)    High proton density (bright)     Fat, Fluids
                              Low proton density (dark)        Calcium, Air, Fibrous Tissue, Cortical Bone
  T2-weighted (long/long)     Short T2 relaxation (dark)       Iron-containing substances (blood-breakdown products)
                              Long T2 relaxation (bright)      Neoplasm, Edema, CSF, Pure Fluid, Inflammation

2.3 System Overview

A strength of the knowledge-based (KB) systems in [21, 22, 23] has been their "coarse-to-fine" operation. Instead of attempting to achieve their task in one step, incremental refinement is applied, with easily identifiable tissues located and labeled first. Removing labeled pixels from further consideration allows a "focus" to be placed on the remaining (fewer) pixels, where more subtle trends may become clearer. The tumor segmentation system is similarly designed. To better illustrate the system's organization, we present it at a conceptual level. Figure 2 shows the primary steps in extracting tumor from raw MR data; Section 3 describes these steps in more detail.

[Figure 2: System Overview. Stage 0: raw MR image data (T1, PD, and T2-weighted images); initial segmentation by unsupervised clustering algorithm; Pathology Recognition -- normal tissues are located and tested; slices with abnormalities (such as in the white matter class shown) are segmented for tumor, while slices without abnormalities are not processed further. Stage One: intracranial mask created from the initial segmentation. Stage Two: initial tumor segmentation using adaptive histogram thresholds on the intracranial mask. Stage Three: tumor segmentation refined using density screening. Stage Four: removal of spatial regions that do not contain tumor; remaining regions are labeled tumor and processing halts. Results are compared against radiologist hand-labeled ground truth tumor.]

The system has five primary steps. First, a pre-processing stage developed in previous works [21, 22, 23], called Stage Zero here, is used to detect deviations from expected properties within the slice. Slices that are free of abnormalities are not processed further. Otherwise, Stage One extracts the intra-cranial region from the rest of the MR image based on information provided by pre-processing. This creates an image mask of the brain that limits processing in Stage Two to only those pixels contained by the mask. In fact, a particular stage operates only on the foreground pixels that are contained in the mask produced by the completion of the previous stage. An initial tumor segmentation is produced in Stage Two through a combination of adaptive histogram thresholds in the T1 and PD feature images. The initial tumor segmentation is passed on to Stage Three, where additional non-tumor pixels are removed via a "density screening" operation. Density screening is based on the observation that pixels of normal tissues are grouped more closely together in feature space than tumor pixels. Stage Four completes tumor segmentation by analyzing each spatially disjoint "region" in image space separately. Regions found to be free of tumor are removed, with those regions remaining labeled as tumor. The resulting image is considered the final tumor segmentation and can be compared with a ground truth image.

3 Classification Stages

3.1 Stage Zero: Pathology Detection

All slices processed by the tumor segmentation system have been automatically classified as abnormal. They are known to contain glioblastoma-multiforme tumor based on radiologist pathology reports. Since this work is an extension of previous work, knowledge generated during "pre-processing" is available to the tumor segmentation system. Detailed information can be found in [21, 22, 23], but a brief summary is provided. Slice processing begins by using an unsupervised fuzzy c-means (FCM) clustering algorithm [25, 26] to segment the slice. The initial FCM segmentation is passed to an expert system, which uses a combination of knowledge concerning cluster distribution in feature space and anatomical information to classify the slice as normal or abnormal. Two examples of knowledge (implemented as rules) used in the predecessor system are: (1) in a normal slice, CSF belongs to the cluster center with the highest T2 value in the intracranial region; (2) in image space, all normal tissues are roughly symmetrical along the vertical axis (defined by each tissue having approximately the same number of pixels in each brain hemisphere), while tumors often have poor symmetry.
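The FCM step can be sketched in a few lines; this is a generic fuzzy c-means, not the exact implementation of [25, 26], and the fuzzifier m = 2 and iteration count are conventional defaults rather than the paper's settings:

```python
import numpy as np

def fuzzy_c_means(X: np.ndarray, n_clusters: int,
                  m: float = 2.0, n_iter: int = 100, seed: int = 0):
    """Plain fuzzy c-means on pixel feature vectors X (n_pixels, d).
    Returns (centroids, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-10)                   # avoid division by zero
        U = d ** (-2.0 / (m - 1.0))                # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centroids, U
```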

Abnormal slices are detected by their deviation from "expectations" concerning normal MR slices, such as the one shown in Figure 2, whose white matter class failed to completely enclose the ventricle area. An abnormal slice, with the facts generated in labeling it abnormal, is passed on to the tumor segmentation system. Normal slices have all pixels labeled.

3.2 Stage One: Building the Intra-Cranial Mask

The first step in the system presented here is to isolate the intra-cranial region from the rest of the image. During pre-processing, extra- and intra-cranial pixels were distinguished primarily by separating the clusters from the initial FCM segmentation into two groups: Group 2 for brain tissue clusters, and Group 1 for the remaining extra-cranial clusters. Occasionally, enhancing tumor pixels can be placed into one or more Group 1 clusters with high T1-weighted centroids. In most cases, these pixels can be reclaimed through a series of morphological operations (described below). As shown in Figures 3(b) and (c), however, the tumor loss may be too severe to recover morphologically without distorting the intra-cranial mask.

Group 1 clusters with significant "Lost Tumor" can be located, however. During pre-processing, Group 1 and 2 clusters were separated based on the observation that extra-cranial tissues surround the brain and are not found within the brain itself. A "quadrangle" was developed by Li in [21, 36] to roughly approximate the intra-cranial region. Group 1 and 2 clusters were then discriminated by counting the number of pixels a cluster had within the quadrangle. Clusters consisting of extra-cranial tissues will have very few pixels inside this estimated brain, while clusters of intra-cranial tissues will have a significant number. An example is shown in Figure 4. A Group 1 cluster is considered to have "Lost Tumor" here if more than 1% of its pixels were contained in the approximated intra-cranial region. The value of 1% is used to maximize the recovery of lost tumor pixels, because extra-cranial clusters with no lost tumor will have very few pixels within the quadrangle, if any at all. Pixels belonging to Lost Tumor clusters (Figure 3(b)) are merged with pixels from all Group 2 clusters (Figure 3(c)) and set to foreground (a non-zero value), with all other pixels in the image set to background (value = 0). This produces a new intracranial mask similar to the one shown in Figure 3(d).

[Figure 3: Building the Intra-Cranial Mask. (a) The original FCM-segmented image; (b) pathology captured in Group 1 clusters; (c) intra-cranial mask using only Group 2 clusters; (d) mask after including Group 1 clusters with tumor; (e) mask after extra-cranial regions are removed.]

[Figure 4: (a) Initial segmented image; (b) a quadrangle overlaid on (a); (c) classes that passed the quadrangle test.]

Since a Lost Tumor cluster is primarily extra-cranial, its inclusion in the intra-cranial mask introduces areas of extra-cranial tissues, such as the eyes and skin/fat/muscle. To remove these unwanted extra-cranial regions (and recover the smaller areas of lost tumor mentioned above), a series of morphological operations [37] are applied, which use window sizes that are the smallest possible (to minimize mask distortion) while still producing the desired result. Small regions of extra-cranial pixels are removed, and the separation of the brain from meningial tissues is enhanced, by applying a 5x5 closing operation to the background. Then the brain is extracted by applying an eight-wise connected components operation [37] and keeping only the largest foreground component (the intracranial mask). Finally, "gaps" along the periphery of the intra-cranial mask are filled by first applying a 15x15 closing, then a 3x3 erosion operation. An example of the final intracranial mask can be seen in Figure 3(e).
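Under the assumption of a boolean 2D mask, the morphological cleanup just described maps naturally onto scipy.ndimage; this is a sketch of the stated sequence (5x5 background closing, largest 8-connected component, 15x15 closing, 3x3 erosion), not the original C implementation:

```python
import numpy as np
from scipy import ndimage

def refine_intracranial_mask(mask: np.ndarray) -> np.ndarray:
    """mask: boolean 2D array (foreground = candidate intra-cranial)."""
    # Separate brain from meningial tissue: close the *background*.
    background = ndimage.binary_closing(~mask, structure=np.ones((5, 5)))
    fg = ~background
    # Keep only the largest foreground component (8-connectivity).
    labelled, n = ndimage.label(fg, structure=np.ones((3, 3)))
    if n == 0:
        return fg
    sizes = np.bincount(labelled.ravel())[1:]      # skip background label 0
    fg = labelled == (1 + np.argmax(sizes))
    # Fill gaps along the periphery: close, then erode.
    fg = ndimage.binary_closing(fg, structure=np.ones((15, 15)))
    return ndimage.binary_erosion(fg, structure=np.ones((3, 3)))
```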

3.3 Stage Two: Multi-spectral Histogram Thresholding

Given an intra-cranial mask from Stage One, there are three primary tissue types: pathology (which can include gadolinium-enhanced tumor, edema, and necrosis), the brain parenchyma (white and gray matter), and CSF. We would like to remove as many pixels belonging to normal tissues as possible from the mask.

Each MR voxel of interest has a (T1, PD, T2) location in R^3, forming a feature-space distribution. Based on the knowledge in Table 2, and the fact that pixels belonging to the same tissue type will exhibit similar relaxation behaviors (T1 and T2) and water content (PD), they will then also have approximately the same location in feature space [38]. Figure 5(a) shows the signal-intensity images of a typical slice, while (b) and (c) show histograms for the bivariate features T1/PD and T2/PD, respectively, with approximate tissue labels overlaid. There is some overlap between classes, because the graphs are projections and also due to "partial averaging", where different tissue types are quantized into the same voxel. The typical relationships between enhancing tumor and other brain tissues can also be seen in Figure 6, which shows histograms for each of the three feature images.

[Figure 5: (a) Raw T1, PD, and T2-weighted data. The distribution of intra-cranial pixels is shown in (b) T1-PD and (c) PD-T2 feature space. C = CSF, Pa = Parenchymal Tissues, T = Tumor.]

These distributions were examined, and interviews were conducted with experts concerning the general makeup of tumorous tissue and the behavior of gadolinium enhancement in the three MRI protocols. From these sources, a set of heuristics was extracted that could be included in the system's knowledge base:

1. Gadolinium-enhanced tumor pixels occupy the higher end of the T1 spectrum.
2. Gadolinium-enhanced tumor pixels occupy the higher end of the PD spectrum, though not with the degree of separation found in T1 space [39].
3. Gadolinium-enhanced tumor pixels were generally found in the "middle" of the T2 spectrum, making segmentation based on T2 values difficult.
4. Slices with greater enhancement had better separation between tumor and non-tumor pixels, while less enhancement resulted in more overlap between tissue types.

Analysis of these heuristics revealed that histogram thresholding could provide a simple, yet effective, mechanism for gross separation of tumor from non-tumor pixels (and thereby an implementation for the heuristics). In fact, in the T1 and PD spectrums, the signal intensity having the greatest number of pixels -- that is, the histogram "peak" -- was found to be an effective threshold that works across slices, even those with varying degrees of gadolinium enhancement. An example of this is shown in Figure 6. The T2 image had no such property that was consistent across all training slices, and was excluded. For a pixel to survive thresholding, its signal intensity value in a particular feature had to be greater than the intensity threshold for that feature.

[Figure 6: Histograms for tumor and the intra-cranial region. (a) Raw data; (b) T1-weighted histogram; (c) PD-weighted histogram; (d) T2-weighted histogram. Each histogram shows intracranial pixels and "ground truth" tumor; solid black lines indicate thresholds in T1 and PD-weighted space.]

Figures 7(a) and (b) show the results of applying the T1 and PD histogram "peak" thresholds of Figures 6(b) and (c). In both of these thresholded images a significant number of non-tumor pixels have been removed, but some non-tumor pixels remain in each thresholded image. Since the heuristics listed above state that gadolinium-enhanced tumor has a high signal intensity in both the T1 and PD features, additional non-tumor pixels can be removed by intersecting the two images (where a pixel remains only if it is present in both images). An example is shown in Figure 7(c).

[Figure 7: Multi-spectral histogram thresholding of Figure 6. (a) T1-weighted thresholding; (b) PD-weighted thresholding; (c) intersection of (a) and (b); (d) ground truth.]
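A minimal sketch of this stage, assuming 2D arrays and a 256-bin histogram over the masked intensities (the bin count is an assumption; the data is 12-bit):

```python
import numpy as np

def peak_threshold(channel: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep pixels brighter than the histogram 'peak' of the masked
    region, per the T1/PD heuristics above."""
    values = channel[mask]
    hist, edges = np.histogram(values, bins=256)
    peak_intensity = edges[np.argmax(hist)]     # most-populated bin
    return mask & (channel > peak_intensity)

def initial_tumor_segmentation(t1, pd, intracranial_mask):
    """A pixel survives only if it exceeds the peak threshold in both
    the T1 and PD images (the intersection step)."""
    return (peak_threshold(t1, intracranial_mask) &
            peak_threshold(pd, intracranial_mask))
```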

3.4 Stage Three: "Density Screening" in Feature Space

The thresholding process in Stage Two provides a good initial tumor segmentation, such as the one shown in Figure 7(c). Comparing it with the ground truth image in Figure 7(d), a number of pixels in the initial tumor segmentation are not found in the ground truth image and should be removed. Additional thresholding is difficult to perform, however, without possibly removing tumor as well as non-tumor pixels.

Pixels belonging to the same tissue type will have similar signal intensities in the three feature spectrums. Because normal tissue types have a more or less uniform cellular makeup [33, 34, 35], their distribution in feature space will be relatively concentrated [38]. In contrast, tumor can have significant variance, depending on the local degree of enhancement and the tissue inhomogeneity within the tumor due to the presence of edema, necrosis, and possibly some parenchymal cells captured by the partial-volume effect. Figures 5(b) and (c) show the different spreads in feature space for normal and tumor pixels. Pixels belonging to parenchymal tissues and CSF are grouped more densely by intensity, while pixels belonging to tumor are more widely distributed. By exploiting this "density" property, non-tumor pixels can be removed without affecting the presence of tumor pixels.

Called "density screening", the process begins by creating a 3-dimensional histogram for all pixels remaining in the initial tumor segmentation image after thresholding. The histogram array itself has a T1 range x PD range x T2 range size of 128 x 128 x 128 intensity bins. For each feature, the maximum and minimum signal intensity values in the initial tumor segmentation are found and quantized into the histogram array (i.e., the minimum T1 intensity value occupies T1 Bin 1, the maximum T1 intensity value occupies T1 Bin 128), with all T1 values in between "quantized" into one of the 128 bins. This quantization was done for two reasons. First, the size of a three-dimensional histogram quickly becomes prohibitively large to store and manipulate; even a 256^3 histogram has nearly 17 million elements. Secondly, levels of quantization can make the "dense" nature of normal pixels more apparent while still leaving tumor pixels relatively spread out. For the 12-bit data studied here, after thresholding, slices had a range of approximately 800 intensity values in each feature. The actual value of 128 was empirically selected: using 64 bins blurred the separation of tumor and non-tumor pixels in training slices where the tumor boundary was not as well defined, while values similar to 128, such as 120 or 140, are unlikely to significantly change the "quantization" effect and should yield similar results. The histograms and scatterplots shown in Figure 8 were created using 128 bins.

From the 3D histogram, three 2D projections are generated: T1/PD, T1/T2, and PD/T2. An example 2D projection is shown in Figure 8(a), which was generated from the slice shown in Figure 7(c); a corresponding scatterplot is shown in Figure 8(b). The bins with the most pixels (the highest "peaks" in Figure 8(a)) can be seen in the lowest T1/PD corner and are comprised of non-tumorous pixels that should be removed. In contrast, tumor pixels, while greater in number, are more widespread and have lower peaks in their bins. In each projection, the highest peak is found and designated as the starting point for a region growing [40] process that will "clear" any neighboring bin whose cardinality (number of pixels in that bin) is greater than a set threshold (T1/PD = 3, T1/T2 = 4, PD/T2 = 3). This results in a new scatterplot similar to that shown in Figure 8(c). A pixel is removed from the tumor segmentation if it corresponds to a bin that has been "cleared" in any of the three feature-domain projections. Figures 8(d) and (e) show the tumor segmentation before and after the entire density screening process is completed. Note that the resulting image is closer to ground truth.

[Figure 8: Density screening of the initial tumor segmentation from Figure 7(c). (a) 2D-histogram projection; (b) scatterplot before screening; (c) scatterplot after screening; (d) initial tumor; (e) removed pixels (black); (f) ground truth.]

The thresholds used were determined from training slices by creating a 3D histogram, including 2D projections, using only pixels contained in the initial tumor segmentation. Then the ground truth tumor pixels for each slice were overlaid on the respective projections. So, given a 3D histogram of an initial tumor segmentation, all pixels not in the ground truth image are removed, leaving only tumor behind, without changing the dimensions and quantization levels of the histogram. The respective 2D projections of all training slices were examined. It was found that the smallest bin cardinality bordering a bin occupied by known non-tumor pixels made an accurate threshold for the given projection. It should be noted, however, that the thresholds were based on the 256x256 images used in this research and would need to be scaled to accommodate images of different sizes, such as 512x512.

mentation generated by Stage Three, allows each region to be tested separately for the presence of tumor. An example is shown in Figure 9. After processing the intra-cranial mask shown in Figure 9(a) in Stages Two and Three, a re_ned tumor segmentation (b) is produced. The segmentation shows a number of spatially disjoint areas, but ground 18 (a) (b) (c) Figure 9: Regions in Image Space. After processing the intra-cranial mask (a), (b) is an initial tumor segmentation. Only one region, as shown in the ground-truth image (c) is actual tumor. Region analysis discriminates between tumorous and non-tumorous regions. truth tumor in Figure 9(c) shows that only one region actually contains tumor. Therefore, decisions must be made regarding which regions contain tumor and which do not. 3.5.1 Removing Meningial Regions In addition to tumor, meningial tissues immediately surrounding the brain, such as the dura or pia mater, receive gadolinium infused blood. As a result they can have a high T1 signal intensity that may interfere with the knowledge base's assumption in Section 3.5.2 that regions with the highest T1 value are most likely tumor. These extra-cranial tissues can be identi_ed and removed via anatomical knowledge by noting that since they are thin membranes, meningial regions should lie along the periphery of the brain in a relatively narrow margin.

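To make Stages Two and Three concrete, the sketch below shows one way the threshold intersection and the density screening could be implemented. It is an illustration, not the authors' code: the function and parameter names are invented, the quantization is the linear min-max mapping described above, and the 128-bin size and bin-clearing thresholds (3, 4, 3) are taken from the text.

```python
import numpy as np
from collections import deque

def initial_tumor_mask(t1, pd_, t1_thresh, pd_thresh):
    """Stage Two: keep pixels bright in BOTH T1 and PD (gadolinium enhancement)."""
    return (t1 > t1_thresh) & (pd_ > pd_thresh)

def quantize(feature, mask, bins=128):
    """Linearly map the masked intensity range onto bin indices 0..bins-1."""
    lo, hi = feature[mask].min(), feature[mask].max()
    q = np.zeros(feature.shape, dtype=int)
    scaled = (feature[mask] - lo) / max(hi - lo, 1e-9) * bins
    q[mask] = np.minimum(scaled.astype(int), bins - 1)
    return q

def clear_bins(hist2d, cardinality_thresh):
    """Region-grow from the highest histogram peak, 'clearing' every connected
    neighboring bin whose pixel count exceeds the cardinality threshold."""
    cleared = np.zeros(hist2d.shape, dtype=bool)
    start = np.unravel_index(np.argmax(hist2d), hist2d.shape)
    cleared[start] = True
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < hist2d.shape[0] and 0 <= nj < hist2d.shape[1]
                        and not cleared[ni, nj]
                        and hist2d[ni, nj] > cardinality_thresh):
                    cleared[ni, nj] = True
                    queue.append((ni, nj))
    return cleared

def density_screen(t1, pd_, t2, mask, bins=128, thresholds=(3, 4, 3)):
    """Stage Three: drop pixels that fall in 'cleared' (dense, likely normal)
    bins of any of the three 2D projections (T1/PD, T1/T2, PD/T2)."""
    q = [quantize(f, mask, bins) for f in (t1, pd_, t2)]
    keep = mask.copy()
    for (a, b), th in zip([(0, 1), (0, 2), (1, 2)], thresholds):
        hist = np.zeros((bins, bins), dtype=int)
        np.add.at(hist, (q[a][mask], q[b][mask]), 1)
        cleared = clear_bins(hist, th)
        keep &= ~cleared[q[a], q[b]]   # remove pixels whose bin was cleared
    return keep
```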
Figure 10 shows that an approximation of the brain periphery can be used to detect meningeal tissues. The periphery is created by applying a 7 x 7 erosion operation to the intra-cranial mask and subtracting the resultant image from the original mask, as shown in Figure 10(a-c). Each component or separate region in the refined tumor mask is then intersected with the brain periphery, and any region with more than 50% of its pixels contained in the periphery is marked as meningeal tissue and removed. Figure 10(d) shows a tumor segmentation intersected with the periphery from Figure 10(c); in Figure 10(e), the pixels that will be removed by this operation are shown, and they are indeed meningeal pixels.

Figure 10: Removing meningeal pixels. A "ring" that approximates the brain periphery is created by applying a 7 x 7 erosion operation to the intra-cranial mask (a), resulting in image (b); subtracting (b) from (a) creates the ring shown in (c). By overlaying this ring onto a tumor segmentation (d), small regions of meningeal tissue (e) can be detected and removed. The unusual shape of the intra-cranial region is due to prior resection surgery.

3.5.2 Removing Non-Tumor Regions

Once any extra-cranial regions have been removed, the knowledge base is applied to discriminate between regions with and without tumor based on statistical information about each region. The region mean, standard deviation, and skewness in T1, PD, and T2 feature space are used as features. The concept exploited is that the trends and characteristics described at a pixel level in Table 2 and Section 3.3 are also applicable at a region level. By sorting regions in feature space based upon their mean values, rules based on their relative order can be created:

1. Large regions that contain tumor will likely contain a significant number of pixels that are of highest intensity in T1 and PD space, while regions without tumor will likely contain a significant number of pixels of lowest intensity in T1 and PD space.

2. The means of regions with similar tissue types neighbor one another in feature space.

3. The intra-cranial region with the highest mean T1 value and "high" PD and T2 values is considered "First Tumor," against which all other regions are compared.

Figure 11: Using pixel counts to remove non-tumorous regions. Given a refined tumor segmentation after Stage Three (a), spatial regions with a significant number of pixels highest in T1 space (b) or PD space (c) are likely to contain tumor. Regions with pixels lowest in T1 space (d) are unlikely to contain significant tumor. Ground truth is shown in (e).

4. Other regions that contain tumor are likely to fall within 1 to 1.5 standard deviations (depending on region size) of First Tumor in T1 and PD space.

While most glioblastoma-multiforme cases have only one tumorous, spatially compact region with the highest mean T1 value, in some cases the tumor has grown such that it has branched into both hemispheres of the brain, causing it to appear disjoint in some slices, or it has fragmented as a result of treatment. Also, different tumor regions do not enhance equally. Thus, cases can range from a single well-enhancing tumor to a fragmented tumor with different levels of enhancement. In comparison, the makeup of non-tumor regions is generally more consistent than that of tumorous regions. Therefore, the knowledge base is designed to facilitate removal of non-tumor regions, because their composition can be more reliably modeled and detected. Regions that comply with the first heuristic listed above are the easiest to locate, and their statistics can be used to examine the remaining regions.

To apply the first heuristic, three new image masks are created. The first mask takes the refined tumor segmentation image and keeps only the 20% of pixels with the highest T1 values (i.e., if there were 100 pixels in the refined tumor image, the 20 pixels with the highest T1 values are kept). The second mask keeps the highest 20% in PD space, while the third keeps the 30% lowest in T1 space. Each region is isolated and intersected with each of the three masks; the number of the region's pixels falling in each mask is recorded and compared with the rules listed in Table 3. An example is shown in Figure 11.

Table 3: Region Labeling Rules Based on Pixel Presence.

  Region Size   Pixels in intersections with the 3 masks                  Action
  >= 5          Any bottom-T1 pixels AND fewer than 2 top-T1 pixels       Remove (non-tumor)
  >= 500        More than RegionSize x 0.06 top-T1 pixels                 Label as tumor
  >= 5          No top-T1 pixels AND more than RegionSize x 0.005         Remove
                bottom-T1 pixels AND fewer than RegionSize x 0.01
                top-PD pixels

Regions that do not activate any of the rules in Table 3 remain unlabeled and are analyzed using the last two heuristics. According to the third heuristic, given a region that has been positively labeled tumor as a point of reference, a search can be made in feature space for neighboring tumor regions. Normally, the region with the highest T1 mean value can be selected as this point of reference (called "First Tumor"). To guard against the possibility that an extra-cranial region (usually meningeal tissue at the inter-hemispheric fissure) has been selected instead, the selected region is verified via the heuristic that a tumor region will not only have a very high T1 mean value but will also occupy the highest half of all regions in sorted PD and T2 mean space. For example, if there were 10 regions in total, the region being tested must be among the 5 highest mean values in both PD and T2 space. If the candidate region passes, it is confirmed as First Tumor; otherwise it is discarded, and the region with the next highest T1 mean value is selected for testing as First Tumor.

Once First Tumor has been confirmed, the search for neighboring tumor regions can begin. Although tumorous regions can have between-slice variance, the third and fourth heuristics hold for the purpose of separating tumor from non-tumor regions within a given slice. Furthermore, the standard deviations in T1 and PD space of a known tumor region were found to be a useful and flexible distance measure.

Table 4: Region Labeling Rules Based on Statistical Measurements. Largest is the largest known tumor region.

(a) Rules based on the standard deviation (SD) of First Tumor:

  Region Size                If the region's mean values are:          Action
  <= 10 OR <= Largest/4      more than 1 SD away in T1 space OR        Remove
                             more than 1 SD away in PD space
  >= 10 AND >= Largest/4     more than 1.5 SD away in T1 space AND     Remove
                             more than 1.5 SD away in PD space

(b) Labeling rules based on region statistics:

  >= 100                     region T1 skewness >= 0.75 AND            Remove
                             region PD skewness >= 0.75 AND
                             region T2 skewness >= 0.75

Table 4(a) lists the two rules that use the standard deviation to remove non-tumor regions, based on the size (number of pixels) of the region being tested. The rule in Table 4(b) serves as a tie-breaker for regions that were not labeled before. The term Largest indicates the largest known tumor region. In most cases there was only a single tumor region, so the First Tumor region was also the Largest region. In cases where the tumor was fragmented, however, a larger tumorous region provides a more robust mean and standard deviation for the distance measure; the system therefore finds Largest by searching for the largest region within one standard deviation of the First Tumor region in both T1 and PD space. After the rules in Table 4 are applied, all regions that were not removed are labeled as tumor, and the segmentation process terminates.

4 Results

4.1 Knowledge-Based Vs. Ground Truth

A total of 120 slices, including the 17 training slices described in Section 2.1, were within the slice range of the system and known to contain tumor. After processing by the system, the slices were compared with "ground-truth" tumor segmentations created by radiologist hand labeling [41]. Error was found between the two segmentations as both false positives (where the system indicated tumorous pixels that ground truth did not) and false negatives (where ground truth indicated tumorous pixels that the system did not).

Table 5: Comparison of Knowledge-Based Tumor Segmentation Vs. Hand-Labeled Segmentation Per Volume.

  Patient  Scan  True Pos.  False Pos.  False Neg.  Tumor Size  % Match  Corr. Ratio  "True" False Pos.
  1        Base  6921       2700        234         7155        0.97     0.78         80
  1        R1    7038       3879        196         7234        0.97     0.70         467
  1        R2    7285       4869        176         7461        0.98     0.65         496
  1        R3    6206       3261        166         6372        0.97     0.72         227
  1        R4    5930       3130        48          5978        0.99     0.63         47
  2        Base  7892       5976        408         8300        0.95     0.54         18
  2        R1    10092      3481        1059        11151       0.91     0.75         66
  2        R2    14822      4961        1012        15834       0.94     0.78         219
  3        Base  8917       1635        581         9498        0.94     0.85         47
  3        R1    5003       2619        169         5172        0.97     0.71         89
  4        Base  3054       1536        75          3129        0.98     0.73         124
  4        R1    3627       2082        659         4286        0.85     0.43         1092
  4        R2    2506       1020        1103        3609        0.69     0.46         495
  5        Base  829        573         173         1002        0.83     0.54         161
  6        Base  1425       624         0           1425        0.96     0.78         53
  7        Base  177        175         0           177         1.00     0.51         54

To compare how well (on a pixel level) the KB method corresponded with ground truth, two measures were used. The first, "percent match," is simply the number of true positives divided by the total tumor size. The second, called the "correspondence ratio," was created to account for the presence of false positives:

  Correspondence Ratio = (True Pos. - 0.5 x False Pos.) / (number of pixels in ground-truth tumor).

For comparison on a per-volume basis, the average value of percent match was generated as a weighted average over the slices in the set:

  Average % Match = [sum over slices i of (% match)_i x (ground-truth pixels)_i] / [sum over slices i of (ground-truth pixels)_i].

The average value of the correspondence ratio is generated similarly.
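Both evaluation measures reduce to a few lines of code. The sketch below uses invented function names and is checked against the Patient 1 baseline row of Table 5.

```python
import numpy as np

def percent_match(true_pos, tumor_size):
    """Percent match: true positives over the ground-truth tumor size."""
    return true_pos / tumor_size

def correspondence_ratio(true_pos, false_pos, tumor_size):
    """Correspondence ratio: penalizes false positives at half weight."""
    return (true_pos - 0.5 * false_pos) / tumor_size

def weighted_volume_average(per_slice_scores, gt_pixels_per_slice):
    """Per-volume average of a per-slice score, weighted by ground-truth size."""
    scores = np.asarray(per_slice_scores, dtype=float)
    weights = np.asarray(gt_pixels_per_slice, dtype=float)
    return float((scores * weights).sum() / weights.sum())

# Check against the Patient 1 baseline row of Table 5:
# percent_match(6921, 7155)              -> 0.967 (tabulated as 0.97)
# correspondence_ratio(6921, 2700, 7155) -> 0.779 (tabulated as 0.78)
```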

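Stepping back to the region-level operations of Stage Four (Sections 3.5.1-3.5.2), these combine standard image-processing primitives. The sketch below is an illustration with hypothetical names, using SciPy stand-ins for the paper's connected-components and morphology operators; it shows the periphery "ring" test and the percentile masks behind Table 3.

```python
import numpy as np
from scipy import ndimage

def brain_periphery(icc_mask, size=7):
    """Approximate the brain periphery (Section 3.5.1): erode the intra-cranial
    mask with a size x size element and subtract the result from the mask."""
    eroded = ndimage.binary_erosion(icc_mask, structure=np.ones((size, size)))
    return icc_mask & ~eroded

def remove_meningeal_regions(tumor_mask, icc_mask):
    """Drop 8-connected regions with more than 50% of their pixels on the ring."""
    ring = brain_periphery(icc_mask)
    labels, n_regions = ndimage.label(tumor_mask, structure=np.ones((3, 3)))
    keep = tumor_mask.copy()
    for r in range(1, n_regions + 1):
        region = labels == r
        if (region & ring).sum() > 0.5 * region.sum():
            keep &= ~region          # marked as meningeal tissue and removed
    return keep

def fraction_mask(feature, mask, frac, lowest=False):
    """Keep the `frac` highest- (or lowest-) intensity pixels of `mask`, as
    used for the top-20% T1/PD and bottom-30% T1 masks behind Table 3."""
    vals = np.sort(feature[mask])
    idx = min(int(frac * vals.size), vals.size - 1)
    if lowest:
        return mask & (feature <= vals[idx])
    return mask & (feature >= vals[vals.size - 1 - idx])
```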
Table 5 lists the results of the KB system on a per-volume basis. The results show that the KB system performs well overall: 89 of the 120 slices had a percent match rating of 90% or higher. Slices that showed significant false-negative presence were primarily the result of two situations. First, some tumor could be lost during the intra-cranial extraction stage. One test slice (from Patient 4, Repeat Scan 2) had significant tumor pixels lost during the morphological operations following tumor recovery from the quadrangle test. In the four uppermost test slices (all from Patient 1), part of the tumor had grown beyond the intra-cranial region into an area normally occupied by the surrounding meningeal membranes, which have an increased percentage presence in the uppermost slices; the tumor's location within these membranes, combined with the reduced brain size, complicated extraction. Second, other instances of tumor loss occurred when the system captured the tumor borders but not its interior, possibly due to more subtle gadolinium enhancement (still detected by the radiologist, but not clear enough in feature space) [42], or cases where necrosis prevented circulation of the enhancing agent but the radiologist made a conservative diagnosis and marked the area as tumor.

Overall, the KB approach tended to significantly overestimate the tumor volume. Only one volume in Table 5 shows underestimation (Patient 4, Repeat Scan 2), and that can be traced to one test slice with significant tumor underestimation (described above). The tendency to overestimate is consistent with the system's paradigm: only those pixels positively believed to be non-tumor are removed, defaulting areas of uncertainty to be labeled as tumor. To show the nature of the false positives in the knowledge-based system, an additional measurement, "true" false positives, was added to Table 5 to indicate how many of the false positives were not connected spatially to any ground-truth tumor. This number is less than 15% of the false positives, with two exceptions. An examination of the process of creating ground-truth images revealed a 5% inter-observer variability in tumor volume [41]. We also note that all brain tumors have micro-infiltration beyond the borders defined by gadolinium enhancement. This is especially true of glioblastoma multiforme, the most aggressive grade of primary glioma brain tumors; no one can tell the exact tumor borders without invasive histopathological methods [24, 42, 43], and these were unavailable. As a result, ground-truth images mark the areas of tumor exhibiting the most angiogenesis (formation of blood vessels, resulting in the greatest gadolinium concentration). Therefore, the knowledge-based system may capture tumor boundaries that extend into areas showing lower degrees of angiogenesis (which would still be treated during therapy) [43].

Table 6: Comparison of kNN (k=7) Tumor Segmentation Vs. Hand-Labeled Segmentation Per Volume.

  Patient  Scan  True Pos.  False Neg.  False Pos.  % Match  Corr. Ratio
  1        Base  6430       782         3592        0.89     0.64
  1        R1    6548       781         5410        0.89     0.52
  1        R2    6544       925         5032        0.88     0.54
  1        R3    5643       751         5227        0.88     0.47
  1        R4    5274       935         5500        0.85     0.41
  2        Base  6227       2167        3287        0.74     0.55
  2        R1    5933       5217        6840        0.53     0.23
  2        R2    7905       8199        7498        0.49     0.26
  3        Base  6972       2570        4027        0.73     0.52
  3        R1    3695       1476        2903        0.71     0.43
  4        Base  2191       938         1716        0.70     0.43
  4        R1    2105       2193        3432        0.49     0.09
  4        R2    1988       1614        2869        0.55     0.15
  5        Base  874        144         1490        0.86     0.13
  6        Base  319        116         1085        0.22     -0.16
  7        Base  175        1           1128        0.99     -2.21

4.2 Knowledge-Based Vs. kNN

One of the advantages of this KB approach is that human-selected training regions of interest (ROIs), currently required for supervised techniques [44], are not necessary after rule acquisition. Yet results can be as good as, if not better than, those obtained from supervised methods, without the need for time-consuming ROI selection, which makes such methods impractical for clinical use and does not guarantee satisfactory performance. Table 6 shows how well the supervised k-nearest neighbors (kNN) algorithm (k=7) [45] performed on the same slices processed by the KB system. The kNN method finds the k=7 labeled ROI pixels closest to a test pixel and classifies the test pixel into the majority class of the associated ROIs. The kNN algorithm has been shown to be less sensitive to ROI selection than seed-growing, a commercially available supervised approach (ISG Technologies, Toronto, Canada) [44, 46]. It must be noted that the kNN results include extra-cranial pixels in the tumor class because kNN is applied to the whole image; no extraction of the actual tumor is done, which would require additional supervisor intervention. The kNN numbers shown here were the mean results over multiple trials of ROI selection, meaning that all kNN slice segmentations were effectively training slices. Furthermore, kNN introduces the question of inter- and intra-observer variability, which was rated at approximately 9% and 5%, respectively [47]. In contrast, the KB system was built from a small subset of the available slices and processed 103 slices in unsupervised mode with a static rule set, allowing complete repeatability.

4.3 Evaluation Over Repeat Scans

Examining tumor growth/shrinkage over multiple acquisitions, the total tumor volumes for ground truth, the KB method, and kNN are compared in Table 7 and Figure 12. The kNN volumes shown are means over one or more trials and include the total inter- and intra-observer standard deviation. The KB system is closer to the ground-truth volume in 8 of the 16 cases, though the difference between the KB and kNN methods was less than the kNN standard deviation in 7 of the cases. More importantly, comparing their respective performances in Tables 5 and 6, the KB method has a smaller number of false negatives than the kNN method in all volumes compared, suggesting the KB method more closely matched ground truth than kNN.

Table 7: Tumor Volume Comparison (Pat. = Patient, GT = Ground Truth Volume, KB = Knowledge-Based, kNN SD = kNN Standard Deviation, kNN Trials = Number of Trials, kNN Obs. = Number of kNN Observers).

  Pat.  Scan  GT Volume  KB Volume  kNN Volume  kNN SD  kNN Trials  kNN Obs.
  1     Base  7155       9621       10022       732     5           2
  1     R1    7234       10917      11958       2236    5           2
  1     R2    7461       12154      11576       1615    5           2
  1     R3    6372       9467       10870       4395    5           2
  1     R4    5978       9060       10774       891     5           2
  2     Base  8300       13868      9514        1635    5           2
  2     R1    11151      13573      12773       2375    5           2
  2     R2    15834      19783      15403       1942    5           2
  3     Base  9498       10552      10999       1323    5           3
  3     R1    5172       7622       6598        1830    5           3
  4     Base  3129       4590       3907        643     4           2
  4     R1    4286       5709       5537        592     4           2
  4     R2    3609       3526       4857        727     4           2
  5     Base  1002       1042       2364        N/A     1           1
  6     Base  1425       2049       1404        N/A     1           1
  7     Base  177        352        1303        207     4           2

Both methods showed an instance where the ground-truth volume grew, yet they reported tumor shrinkage. The kNN method failed to correctly predict tumor growth in Patient 1, from Repeat Scan 1 to 2; since the kNN volumes are based on multiple trials, it is difficult to assign a specific cause. The KB method failed to predict tumor growth in Patient 2, from the baseline scan to Repeat Scan 1. According to pathology reports, the baseline scan contained a significant amount of fluid, possibly hemorrhage, which artificially brightened regions surrounding the tumor in the PD scan and made the border between non-tumor and tumor pixels unusually diffuse. This distorted the histogram on which the initial tumor segmentation was based, resulting in significant overestimation of tumor volume. In Repeat Scan 1, however, not only had the fluid disappeared, but pathology reports noted a slight decrease in gadolinium enhancement. Thus, the initial overestimation followed by the decreased gadolinium enhancement made the trend appear to be tumor shrinkage instead of growth. Patient 2 had received significant treatment (surgery and radiation therapy) prior to scanning, making the tumor boundaries particularly difficult to detect. In fact, a review of the pathology reports showed that radiologist estimations of the tumor volume had to be revised.

Figure 12: Tracking tumor growth/shrinkage over repeat scans for (a) Patient 1, (b) Patient 2, (c) Patient 3, and (d) Patient 4. KB = knowledge-based system; kNN = k-nearest neighbors; GT = ground truth.

Finally, Figure 13 shows examples of the KB system's correspondence to hand-labeled tumor in slices. Figures 13(a-c) show a worst-case segmentation, while (d-f) and (g-i) show an average and a best-case segmentation, respectively. All three examples are from the test set.

Figure 13: Comparison of knowledge-based tumor segmentation vs. ground truth: raw image, KB tumor, and GT tumor for the worst case (a-c), average case (d-f), and best case (g-i).

5 Discussion

We have described a knowledge-based multi-spectral analysis tool that segments and labels glioblastoma-multiforme tumor. The guidance of the knowledge base gives this system additional power and flexibility by allowing unsupervised segmentation and classification decisions to be made through iterative/successive refinement. This is in contrast to most other multi-spectral efforts, such as [8, 10, 12], which attempt to segment the entire brain image in one step based on either statistical or (un)supervised classification methods.

The knowledge base was initially built with a general set of heuristics comparing the effects of different pulse sequences on different types of tissues, as shown in Table 2. This process is called "knowledge engineering": we had to decide which knowledge was most useful for the goal of tumor segmentation and then implement that information in a rule-based system. More importantly, the training set used was quite small - seventeen slices over three patients. Yet the system performed well. A larger training set would most likely allow new and more effective trends and characteristics to be revealed, and thresholds used to handle a certain subset of the training set could be better generalized.

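For reference, the supervised kNN baseline of Section 4.2 amounts to a per-pixel classification of the (T1, PD, T2) intensity triple. A minimal sketch, assuming scikit-learn and operator-selected ROI pixel coordinates and labels (all names here are hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_segment_slice(t1, pd_, t2, roi_coords, roi_labels, k=7):
    """Classify every pixel of a slice from operator-selected ROI pixels
    (k = 7, majority vote), applied to the whole image as in Section 4.2,
    so extra-cranial pixels can also end up in the tumor class."""
    features = np.stack([t1, pd_, t2], axis=-1).astype(float)
    rows, cols = np.asarray(roi_coords).T          # ROI pixel coordinates
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(features[rows, cols], roi_labels)      # intensity triples as features
    labels = knn.predict(features.reshape(-1, 3))
    return labels.reshape(t1.shape)                # per-pixel class map
```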
The slices processed had a relatively large thickness of 5 mm; thinner slices would exhibit a reduced partial-volume effect and allow better tissue contrast. While relying on feature-space distributions, the system was developed using general tissue characteristics, such as those listed in Table 2, and relative relationships between tissues, to avoid dependence upon specific feature-domain values. The particular slices were acquired with the same parameters, but gadolinium enhancement has been found to be generally very robust across different protocols and thicknesses [48, 39]. Should acquisition-parameter dependence become an issue, given a large enough training base across multiple parameter sets, the knowledge base could automatically adjust to a slice's specific parameters, since such information is easily included when processing starts.

The patient volumes processed had received various degrees of treatment, including surgery, radiation, and chemotherapy, both before and between scans. Yet, despite the changes these treatments can cause, such as demyelinization of white matter, no modifications to the knowledge-based system were necessary. Other approaches, such as neural networks [49] or any supervised method based on a specific set of training examples, could have difficulties dealing with slightly different imaging protocols and the effects of treatment.

As stated in the introduction, no method of quantitating tumor volumes is widely accepted and used in clinical practice [4]. A method by the Eastern Cooperative Oncology Group [5] approximates tumor area in the single MR slice with the largest contiguous, well-defined tumor evident: the longest tumor diameter is multiplied by its perpendicular to yield an area. Changes greater than 25% in this area over time are used, in conjunction with visual observations, to classify tumor response to treatment into five categories, from complete response (no measurable tumor left) to progression. This approach does not address full tumor volume and depends on the exact boundary choices and the shape of the tumor [2, 5]; by itself, it can lead to inaccurate growth/shrinkage decisions [6].

The promise of the knowledge-based system as a useful tool is demonstrated by its successful performance on the processed slices. The final KB segmentations compare well with radiologist-labeled "ground truth" images. The knowledge-based system also compared well with the supervised kNN method, and was able to segment tumor without the need for (multiple) human-selected ROIs or post-processing, which make kNN clinically impractical. Further, we looked at removing extra-cranial pixels from kNN tumor segmentations and found that kNN then consistently underestimated the tumor size; with the extra-cranial pixels removed, kNN also makes two mistakes in following the trend shown in Figure 12(a).

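For concreteness, the Eastern Cooperative Oncology Group criterion described above is easy to state in code. The sketch below collapses the published five response categories into an illustrative four-way split around the 25% change criterion; names and category strings are assumptions for illustration.

```python
def bidimensional_area(longest_diameter_cm, perpendicular_cm):
    """Area estimate on the single slice with the largest well-defined tumor:
    the longest diameter multiplied by its perpendicular."""
    return longest_diameter_cm * perpendicular_cm

def response_category(area_baseline, area_followup):
    """Classify response from the change in bidimensional area. The published
    scheme has five categories; this is an illustrative four-way split."""
    if area_followup == 0:
        return "complete response"
    change = (area_followup - area_baseline) / area_baseline
    if change <= -0.25:
        return "partial response / regression"
    if change >= 0.25:
        return "progression"
    return "stable disease"

# Example: a 4 cm x 3 cm tumor (12 cm^2) growing to 4.5 cm x 3.5 cm
# (15.75 cm^2) is a +31% change, classified as progression.
```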
Future work includes addressing the problems noted in Section 4 to improve the system's performance. The high number of false positives, which appear to be a matter of tumor boundaries, can be reduced by applying a final threshold in T1 space (the feature image used primarily by radiologists in determining final tumor boundaries); our primary concern was losing as little ground-truth tumor as possible. Expanding the training set to include more patients should improve the generalizability of the knowledge base. The next expected development is to expand the processing range to all slices that intersect the cerebrum. Introducing new tumor types, such as lower-grade gliomas, will also be considered, as will complete labeling of all remaining tissues. Newer MRI systems may provide additional features, such as diffusion images or edge strength for estimating tumor boundaries, which can be readily included in the knowledge base; the knowledge base also allows straightforward expansion as new tools are found effective (perhaps edge detection on the tumor mask).

In conclusion, the knowledge-based system is a multi-spectral tool that shows promise in effectively segmenting glioblastoma-multiforme tumors without the need for human supervision. It has the potential to be a useful tool for segmenting tumor for therapy planning and for tracking tumor response. Lastly, the knowledge-based paradigm allows easy integration of new domain information and processing tools into the existing system when other types of pathology and MR data are considered.

Automated Segmentation of MR Images of Brain Tumors

An automated brain tumor segmentation method was developed and validated against manual segmentation with three-dimensional magnetic resonance images in 20 patients with meningiomas and low-grade gliomas. The automated method (operator time, 5-10 minutes) allowed rapid identification of brain and tumor tissue with an accuracy and reproducibility comparable to those of manual segmentation (operator time, 3-5 hours), making automated segmentation practical for low-grade gliomas and meningiomas.


Computer-assisted surgical planning and advanced image-guided technology have become increasingly used in neurosurgery (1-5). The availability of accurate anatomic three-dimensional (3D) models substantially improves spatial information concerning the relationships of critical structures (eg, functionally significant cortical areas, vascular structures) and disease (3,4,6). In daily clinical practice, however, commercially available intraoperative navigational systems provide the surgeon with only two-dimensional (2D) cross sections of the intensity-value images and a 3D model of the skin. The main limiting factor in the routine use of 3D models to identify (segment) important structures is the amount of time and effort that a trained operator must spend on the preparation of the data (3,6). The development of automated segmentation methods has the potential to substantially reduce the time for this process and to make such methods practical.

Although 2D images accurately depict the size and location of anatomic objects, generating 3D views to visualize structural information and spatial anatomic relationships is a difficult task, usually carried out in the clinician's mind. Image-processing tools provide the surgeon with interactively displayed 3D visual information that is somewhat similar to the surgeon's view during surgery; the use of these tools facilitates comprehension of the entire anatomy. For example, the (mental) 3D visualization of structures that do not readily align with the planes of the images (eg, the vascular tree) is difficult if it is based on 2D images alone.

Image-based modeling requires the use of computerized image-processing methods, which include segmentation, registration, and display. Segmentation with statistical classification techniques (7,8) has been successfully applied to gross tissue type identification. Because the acquisition of tissue parameters is insufficient for successful segmentation due to the lack of contrast between normal and pathologic tissue (9,10), statistical classification may not allow differentiation between nonenhancing tumor and normal tissue (11-13). Explicit anatomic information derived from a digital atlas has been used to identify normal anatomic structures (14-16). We developed an automated segmentation tool that can be used to identify the skin surface, ventricles, brain, and tumor in patients with brain neoplasms (17,18). The purpose of the current study was to compare the accuracy and reproducibility of this automated method with those of manual segmentation carried out by trained personnel.

Materials and Methods

Imaging Protocol

The heads of patients were imaged in the sagittal and transverse planes with a 1.5-T magnetic resonance (MR) imaging system (Signa; GE Medical Systems, Milwaukee, Wis) and a contrast material-enhanced 3D sagittal spoiled gradient-recalled acquisition with contiguous sections (flip angle, 45°; repetition time msec/echo time msec, 35/7; field of view, 240 mm; section thickness, 1.5 mm; matrix, 256 x 256 x 124). The acquired MR images were transferred to a Unix network via an Ethernet connection.

Brain Tumor Patients

Twenty patients were selected from a neurosurgical database of images in approximately 260 patients with brain tumors. The cases of the 260 patients had been postprocessed for image-guided neurosurgery by using a combination of semiautomated techniques and manual outlining of the skin surface, brain, ventricles, vessels, and tumor. Two neurosurgeons (including A.N.) were asked to select 20 cases with meningiomas and low-grade gliomas of different sizes, shapes, and locations to provide a representative selection. These two types were selected because they are relatively homogeneous and have well-defined imaging characteristics. Pathologic diagnoses included six meningiomas (cases 1-3, 11, 12, 16) and 14 low-grade gliomas (cases 4-10, 13-15, 17-20). In this study, six of six meningiomas were well enhancing, and 14 of 14 low-grade gliomas were nonenhancing. Cases 1-10 formed the development database used for the design and validation of the automated segmentation method. To ensure that the method produced correct results when applied to cases other than those of the development database, validation was carried out separately with the validation data sets from cases 11-20 in addition to validation with the 10 development cases.

Automated Segmentation of Brain and Tumor

General segmentation framework. We adopted a general algorithm called adaptive template-moderated classification (see references 17 and 18 and the Appendix for details). The technique involves the iteration of statistical classification to assign labels to tissue types and nonlinear registration to align (register) a digital anatomic atlas (presegmented anatomic map) to the patient data (Fig 1). Statistical classification was used to divide an image into different tissue classes on the basis of the signal intensity value. If different tissue classes have the same or overlapping grey-value distributions (eg, cerebrospinal fluid and fluid within the eyeballs), such methods fail. Therefore, additional information about the spatial location of anatomic structures was derived from a registered anatomic atlas (a manually segmented MR image of a single subject) (6). Objects of interest were identified on the classified images with local segmentation operations (mathematical morphology and region growing) (19).

Application to tumor segmentation. For the task of brain tumor segmentation, the order in which the structures of interest were segmented followed a simple hierarchical model of anatomy (Fig 2). By proceeding hierarchically from the outside to the inside of the head, each segmented structure defined a refined region of interest for the next structure to be segmented. Five different tissue classes were modeled: background, skin (fat and bone), brain, ventricles, and tumor. Because of the homogeneous tissue composition of meningiomas and low-grade gliomas, one tissue class was sufficient for the statistical model.

An atlas of normal anatomy does not include pathologic structures. As a result, templates from the atlas were derived for only the head, brain, and ventricles. First, the whole head was segmented from the background by using thresholding and local segmentation strategies. On the basis of the segmentation of the head, an initial alignment of the atlas to the patient was established. Next, the intracranial cavity (ICC) was segmented from the head in two segmentation iterations (statistical classification, local segmentation strategy, and reregistration of the atlas). At this point, all voxels belonging to the brain, ventricles, and tumor were labeled as ICC. In the first iteration, the ICC was segmented by using the head and ICC template from the initially registered atlas. The atlas was then realigned on the basis of the whole head and ICC of the patient. This step was followed by a second classification and local segmentation step. The ventricles were segmented from the ICC in a third segmentation iteration. At this point, the ICC contained only voxels belonging to the brain and tumor.

Having defined a region of interest for the tumor, which was located inside the brain and outside the ventricles and skin (fat and bone), the tumor was segmented in two iteration cycles. In the first iteration, the tumor was classified by using the anatomic knowledge from only the atlas; this step was followed by application of the local segmentation strategy. Because there was no tumor template in the atlas, a straightforward registration was not possible. Consequently, tumor voxels were relabeled as ICC voxels prior to the registration process. As a result, a spatial correspondence between the atlas and patient data set was established for every voxel, since the patient data set contained no voxels labeled as tumor at the time registration of the atlas was carried out. In the second iteration, the tumor segmentation from the first iteration was used as an anatomic template. Although this template was approximate, the additional information about the location of the tumor prevented misclassification of voxels distant to the atlas template as tumor.

Initialization of the automated segmentation method. To reduce noise on the MR image without blurring object edges, an anisotropic diffusion-filtering method was applied (20). For the initialization of the automated segmentation method, a graphical user interface was developed for the 2D display of MR imaging sections and the selection of example tissue points with use of a mouse (Fig 3). The only interaction required by the operator (see Validation Experiments) was the selection of three to four example points for each tissue class, that is, skin (fat and bone), brain, ventricles, and tumor. The program calculated a statistical model for the distribution of the gray values on the basis of these manually selected tissue prototypes.
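The classify-refine-register loop can be sketched for a single level of the hierarchy. This is a heavily simplified illustration, not the published implementation: nearest-prototype classification stands in for the kNN classifier described in the Appendix, a morphological opening stands in for the local segmentation strategy, and atlas re-registration is reduced to a comment.

```python
import numpy as np
from scipy import ndimage

def classify(image, distance_map, prototypes):
    """Nearest-prototype classification on [intensity, distance-to-template]
    features. `prototypes` has shape (n_classes, 2)."""
    feats = np.stack([image, distance_map], axis=-1)
    d2 = ((feats[:, :, None, :] - prototypes[None, None, :, :]) ** 2).sum(-1)
    return d2.argmin(-1)          # per-voxel class index

def segment_structure(image, template_mask, prototypes, n_iters=2, target=1):
    """One level of the hierarchy (head -> ICC -> ventricles -> tumor):
    iterate classification, local morphological cleanup, and template
    refinement. `target` is the assumed class index of the structure."""
    mask = template_mask
    for _ in range(n_iters):
        dist = ndimage.distance_transform_edt(~mask)     # spatial prior feature
        labels = classify(image, dist, prototypes)
        mask = ndimage.binary_opening(labels == target)  # local segmentation
        # the full method would re-register the atlas to `mask` here
    return mask
```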

Figure 3. Graphical user interface for the automated segmentation method, allowing the 2D display of MR sections and the selection of example tissue points with a mouse.

Manual Segmentation of Brain and Brain Tumor

For manual segmentation of the brain and tumor, an interactive segmentation tool (MRX; GE Medical Systems) was used on an Ultra 10 workstation (Sun Microsystems, Mountain View, Calif). Human operators outlined the structures section by section (see Validation Experiments) by pointing and clicking with a mouse. The program connected consecutive points with lines. An anatomic object was defined by a closed contour, and the program labeled every voxel of the enclosed volume.

Validation Experiments

Because of the lack of an acceptable standard (eg, a realistic phantom) for comparison, our definition of a segmentation standard was based on manual segmentations with interactive computer segmentation tools. However, manual segmentation is subject to interobserver variability and human error (6). To minimize the influence of these factors while maintaining a means of measuring the segmentation accuracy of the individual raters, the standard was defined on the basis of the segmentations of four independent human observers. A single 2D section was randomly selected from the subset of the MR imaging volume that showed the tumor. The four human observers then independently outlined the brain and tumor on this section by hand. The standard segmentation of brain and tumor in each patient data set was defined as the area of those voxels in which at least three of four raters agreed regarding their identification; all other voxels were labeled as background.

To assess accuracy, the automated segmentation tool was trained once with a single MR imaging section containing all tissue types of interest and was executed on the full 3D data set, resulting in segmentation of the entire data set. For each data set, the structures skin (fat and bone), brain, ventricles, and tumor were segmented. The interrater variability of the four independent manual and the four independent automated segmentations was measured on the basis of all 20 cases. For the measurement of intraobserver variability, one of the medical experts also manually segmented the selected 2D section four times during 1 week in each of the 20 cases; training of the automated method was likewise carried out four times during 1 week in all 20 cases. During all experiments, the times for manual outlining, training, and computation for the automated segmentation method were recorded.

Statistical Analysis

Quantitative analysis was carried out on the basis of volume-of-overlap comparison with the standard (accuracy) and overall volume variability (reproducibility) in the 2D section selected.

Segmentation accuracy was defined as the percentage of correctly classified voxels (in object and background) with respect to the total number of voxels V on the image, that is, (TP + TN)/V, where TP is the number of true-positive voxels and TN is the number of true-negative voxels (21). The means and SDs of the accuracy values with respect to the 20 test cases were also calculated (Matlab version 4.1; Mathworks, Cambridge, Mass). To assess the inter- and intrarater variability error, the coefficient of variation was calculated as CV% = 100 x (SD_volume / Mean_volume). The coefficient of variation does not measure the correctness of segmentation, only the change in the volume of objects in segmentations by different raters.

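Both statistics reduce to a few lines; a sketch with hypothetical names (the paper does not state whether the sample or population SD was used, so the sample SD is assumed here):

```python
import numpy as np

def accuracy(seg, standard):
    """Accuracy = (TP + TN) / V over all voxels of the image."""
    tp = np.logical_and(seg, standard).sum()
    tn = np.logical_and(~seg, ~standard).sum()
    return (tp + tn) / seg.size

def cv_percent(volumes):
    """CV% = 100 * SD / mean of repeated segmentation volumes; measures
    reproducibility, not correctness. Sample SD (ddof=1) is an assumption."""
    v = np.asarray(volumes, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()
```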
Figure 4. Example of a spoiled gradient-recalled image. A, Meningioma. B, Manual segmentation. C, Statistical classification. D, Template-moderated segmentation.

Figure 5. Example of manual and automated segmentation of a low-grade glioma. A, Spoiled gradient-recalled image. B, Manual segmentation. C, Template-moderated segmentation.

Results:

Segmentation accuracy with the automated method was high and within the range of accuracy of the manual method. The overall mean accuracy for tumor segmentation for all 20 cases was 99.68% ± 0.29% (SD) with the automated method and 99.68% ± 0.24% with the manual method (Fig 7), while the mean accuracy for brain segmentation for all 20 cases was 98.40% ± 0.57% and 98.81% ± 0.88%, respectively (Fig 6).

Figure 6. Brain segmentation accuracy of the manual (mean, minimum, and maximum) and automated (ATmC) methods in the 20 brain tumor cases (meningioma cases 1-3, 11, 12, 16; low-grade glioma cases 4-10, 13-15, 17-20). Accuracy with the automated method was consistent with that of manual segmentation in most cases.

Figure 7. Tumor segmentation accuracy of the manual (mean, minimum, and maximum) and automated (ATmC) methods in the 20 brain tumor cases (meningioma cases 1-3, 11, 12, 16; low-grade glioma cases 4-10, 13-15, 17-20). Accuracy with the automated method was consistent with that of manual segmentation in most cases.

Intraobserver variability (coefficient of variation) for both the automated and manual methods was low. For brain and tumor segmentation, the mean intraobserver variability for all 20 cases with the automated method was 0.10% ± 3.57% and 0.14% ± 4.70%, respectively, while the manual method had coefficient-of-variation values of 0.24% ± 4.11% and 0.80% ± 3.28%. Interobserver variability was lower with the automated method than with the manual method: the mean interobserver variability for all 20 cases with the automated method was 0.33% ± 4.72% and 0.99% ± 6.11% for brain and tumor segmentation, respectively, while the manual method had coefficient-of-variation values of 2.62% ± 10.51% and 3.58% ± 14.42% (Table). Automated segmentation of a complete 3D image volume required approximately 75 minutes of unsupervised computation time (Sun ES 6000 server with 20 central processing units at 250 MHz and 5 Gbyte of random-access memory; Sun Microsystems). The overall operator time for training of the automated method was approximately 5-10 minutes (selection of example voxels for each of the relevant tissue classes). Manual outlining of brain and tumor required 1-3 minutes per section, and manual segmentation of the 3D volume was on the order of 3-5 hours.

Discussion:
Our findings show that brain, meningiomas, and low-grade gliomas can be accurately and reproducibly segmented by means of automated processing of gradient-echo MR images. We have shown that our algorithm allows complete segmentation of the brain and tumor and requires only the manual selection of a small sample of example voxels (21-28 voxels).

The goals of the development of automated segmentation tools are to make segmentation of MR images more practical by replacing manual outlining, thereby reducing operator time without a measurable effect on the results, and to improve reproducibility. However, the validity of our segmentations is difficult to assess without the availability of a standard. Therefore, our validation study was designed to determine how closely the raters agreed within a single method (automated and manual) and how closely the segmentation results correlated between the two methods.

Segmentation accuracy with the automated method was high and within the accuracy range of the manual method (maximum difference, 0.6%). The errors with automated brain segmentation were in part due to over- and undersegmentation in the area of the tentorium and the lateral sulcus, where vessels are abundant. The algorithm tended to oversegment these areas if parts of the neck near the cerebellum were misclassified as brain and if the ICC template derived from the atlas was misaligned. The size of the structure also affects segmentation accuracy: segmentation errors occur on the boundary of surfaces, so the larger the surface of an object, the more voxels on the entire image can potentially be misclassified. Accuracy is therefore lower with larger objects than with smaller ones.

Reproducibility was higher with the automated method because it requires only the selection of a few example points, not a decision for every voxel on the image as in manual segmentation. The reproducibility of brain and tumor segmentation was high. Nevertheless, the inter- and intraobserver reproducibility of both methods was higher for the brain than for the tumor. Larger objects tend to have a volumetric reproducibility that is higher than the overall segmentation accuracy: because the surface-to-volume ratio behaves approximately like 1/r (where r is the object radius), disagreement about voxel classes on the surface of larger objects is less consequential relative to the overall volume than it is with smaller objects.

Interobserver variability was substantially reduced with the automated method. Manual interobserver variability was particularly high for low-grade gliomas, which were more difficult to segment, causing deviating expert opinions. Automated segmentation is more robust to expert variation because it involves only the selection of typical example points for training the algorithm, while manual segmentation requires a human decision for every boundary voxel, which is difficult due to, for example, partial voluming. However, intraobserver variability was improved only with meningioma segmentation. For low-grade gliomas, manual intraobserver variability is substantially lower than interobserver variability because the execution of manual segmentation varies but the opinion regarding the shape of the tumor does not. Therefore, compared with manual segmentation, automated segmentation does not reduce intraobserver variability substantially.

Reproducibility was higher with meningiomas with both methods. This finding can be explained by comparing the gray-value distributions of the meningiomas or low-grade gliomas with that of the brain. The meningioma tissue class partially overlaps parts of the skin, the fat in the neck, and the straight and superior sagittal sinuses, but it was well distinguishable from brain tissue with the application of a contrast agent. When the region of interest was restricted to the ICC, the tissue that showed signal intensity overlap with the meningioma was excluded, and the meningioma was successfully segmented. In some cases of low-grade glioma, the ICC may not have been a sufficient region of interest for accurate tumor segmentation due to the similar signal intensities of the tumor and surrounding gray matter. False classifications cannot be corrected if the brain misclassified as tumor is adjacent to the tumor boundary (oversegmentation) or vice versa (undersegmentation). The incorporation of T2-weighted images, which clearly distinguish the tumor as hyperintense tissue, may enable precise definition of the tumor boundaries. If the voxels of brain misclassified as tumor are distant from the tumor boundary, if they are connected to the tumor by only thin structures, or if tumor voxels inside the tumor are falsely classified as brain, false classifications can be corrected.

The algorithm developed in this work is based on template-driven segmentation, in which an anatomic atlas is used to guide a statistical classification process (8,14,17,18,23). Clark et al (27) proposed a method for automatic detection and segmentation of glioblastoma multiforme on a combination of T1-, T2-, and intermediate-weighted MR images with use of classification and an anatomic knowledge database; accuracy was greater than 90%. Bonnie et al (24) recently reported results with use of an interactive tumor segmentation method; however, its value is difficult to assess because no detail on the segmentation technique is given. Approaches based on MR imaging data alone with use of active contours (25) or multispectral classification (12,13) work well if the tumor shows sufficient contrast with the brain. However, active contours require good initialization, which is difficult to automate, while multispectral classification reveals problems with overlapping intensity distributions.

The lack of automated segmentation methods results in tedious manual labor, which has been one of the reasons why 3D models have typically been limited to university research settings. The reduction in operator time (3-5 hours to 5-10 minutes) makes it practical to consider the integration of computerized segmentation into daily clinical practice for presurgical 3D planning and intraoperative navigation in routine neurosurgical procedures: a technician carries out the initial work, and a radiologist verifies the result while soft-reading the images. Our software currently runs on a powerful computer, and systems such as ours are becoming increasingly affordable (26).

In conclusion, accurate segmentation is possible for meningiomas and low-grade gliomas with our automated method. Further work is required to extend the tools to a broader range of brain tumors (eg, glioblastoma multiforme). Future clinical studies on the accuracy and reproducibility of our technique in a larger population will be necessary to determine its practical use in a clinical setting.

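Returning briefly to the surface-to-volume argument made in the Discussion: for an idealized spherical object of radius r (an illustrative assumption only), the fraction of voxels exposed to boundary disagreement scales as

  boundary voxels / total voxels ∝ 4πr^2 / ((4/3)πr^3) = 3/r,

so doubling an object's radius roughly halves the fraction of its voxels on the contested boundary, which is why volumetric reproducibility tends to be higher for larger objects.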
APPENDIX:

In the following, we give the parameter settings and features used. (For algorithmic details, see references 17 and 18.) The following parameter settings were used: anisotropic diffusion filtering, two iterations, dt = 0.2, κ = 5.2; kNN classification, k = 5; number of classes C, five; affine registration, nine degrees of freedom; image resolution levels, three; distance transform saturation distance, 100; nonlinear registration, three resolution levels; window size w, 9 x 9 x 9; morphologic operators, spherical element of size 7 x 7 x 7; region growing, connectivity of 18. Four classification-registration iterations were used for ICC segmentation, one iteration for ventricle segmentation, and two iterations for tumor segmentation. The brain and ventricles are also resegmented during tumor segmentation.

For segmentation of normal structures (ie, skin, fat, and bone; brain; ventricles), the pattern used in this work was vi = [v1i, ..., v5i]^T, where i is the index of voxel location xi. The elements vji result from image-processing operations Tj as follows: v1i = T1[I(xi)], where T1 is anisotropic diffusion filtering; v2i = T2[A(xi)], where T2 is the distance transform of skin, fat, and bone; v3i = T3[A(xi)], where T3 is the distance transform of the background of skin, fat, and bone; v4i = T4[A(xi)], where T4 is the distance transform of brain; and v5i = T5[A(xi)], where T5 is the distance transform of the background of the brain. The elements are computed from the MR image I(xi) or the image of the registered anatomic atlas A(xi). While T1 is carried out only during the preprocessing stage, the operators T2 to T5 are applied to the reregistered atlas in every segmentation iteration cycle. For the first tumor segmentation cycle, the patterns are also vi = [v1i, ..., v5i]^T. For the second tumor segmentation cycle, the patterns are vi = [v1i, ..., v6i]^T, where vji for j = 1, ..., 5 are defined as above, with the additional element v6i = T6[Ib(xi)], where T6 is the distance transform of the initial tumor segmentation and Ib is the resultant image of the first tumor segmentation.

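The per-voxel patterns described above combine image intensity with saturated distance transforms. A sketch with hypothetical helper names, using SciPy's Euclidean distance transform as a stand-in and the saturation distance of 100 from the parameter list:

```python
import numpy as np
from scipy import ndimage

def saturated_distance(mask, cap=100):
    """Distance transform of a template, saturated at `cap` (Appendix setting)."""
    return np.minimum(ndimage.distance_transform_edt(~mask), cap)

def build_patterns(img, atlas_skin, atlas_brain, tumor_prev=None):
    """Per-voxel feature vectors v_i: the filtered intensity plus distance
    transforms of the templates and their backgrounds. A sixth channel
    (distance to the first-pass tumor) is added in the second tumor cycle."""
    chans = [
        img,                                   # v1: diffusion-filtered intensity
        saturated_distance(atlas_skin),        # v2: distance to skin/fat/bone
        saturated_distance(~atlas_skin),       # v3: distance to its background
        saturated_distance(atlas_brain),       # v4: distance to brain
        saturated_distance(~atlas_brain),      # v5: distance to its background
    ]
    if tumor_prev is not None:
        chans.append(saturated_distance(tumor_prev))  # v6: first-pass tumor
    return np.stack(chans, axis=-1)
```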
Automated Segmentation of MR Images of Brain Tumors:


An automated brain tumor segmentation method was developed and validated against manual segmentation with three-dimensional magnetic resonance images in 20 patients with meningiomas and low-grade gliomas. The automated method (operator time,

5–10 minutes) allowed rapid identification of brain and tumor tissue with an accuracy and reproducibility comparable to those of manual segmentation (operator time, 3–5 hours), making automated segmentation practical for low-grade gliomas and meningiomas.

Computer-assisted surgical planning and advanced image-guided technology have become increasingly used in neurosurgery (1–5). The availability of accurate anatomic three-dimensional (3D) models substantially improves spatial information concerning the relationships of critical structures (eg, functionally significant cortical areas, vascular structures) and disease (3,4,6). In daily clinical practice, however, commercially available intraoperative navigational systems provide the surgeon with only two-dimensional (2D) cross sections of the intensity-value images and a 3D model of the skin. The main limiting factor in the routine use of 3D models to identify (segment) important structures is the amount of time and effort that a trained operator must spend on the preparation of the data (3,6). The development of automated segmentation methods has the potential to substantially reduce the time required for this process and to make such methods practical.

Although 2D images accurately depict the size and location of anatomic objects, generating 3D views to visualize structural information and spatial anatomic relationships is a difficult task, one usually carried out in the clinician's mind. Image-processing tools provide the surgeon with interactively displayed 3D visual information similar to the surgeon's view during surgery; the use of these tools facilitates comprehension of the entire anatomy. For example, the (mental) 3D visualization of structures that do not readily align with the planes of the images (eg, the vascular tree) is difficult if it is based on 2D images alone.

Image-based modeling requires computerized image-processing methods, including segmentation, registration, and display. Segmentation with statistical classification techniques (7,8) has been successfully applied to gross tissue type identification. Because tissue intensity values alone are insufficient for successful segmentation where contrast between normal and pathologic tissue is lacking (9,10), statistical classification may not allow differentiation between nonenhancing tumor and normal tissue (11–13). Explicit anatomic information derived from a digital atlas has been used to identify normal anatomic structures (14–16). We developed an automated segmentation tool that can be used to identify the skin surface, ventricles, brain, and tumor in patients with brain neoplasms (17,18). The purpose of the current study was to compare the accuracy and reproducibility of this automated method with those of manual segmentation carried out by trained personnel.

Materials and Methods

Imaging Protocol

The heads of patients were imaged in the sagittal and transverse planes with a 1.5-T magnetic resonance (MR) imaging system (Signa; GE Medical Systems, Milwaukee, Wis) and a contrast material-enhanced 3D sagittal spoiled gradient-recalled acquisition with contiguous sections (flip angle, 45°; repetition time msec/echo time msec, 35/7; field of view, 240 mm; section thickness, 1.5 mm; matrix, 256 × 256 × 124). The acquired MR images were transferred to a Unix network via an Ethernet connection.

Brain Tumor Patients

Twenty patients were selected from a neurosurgical database of images in approximately 260 patients with brain tumors. The cases of these 260 patients had been postprocessed for image-guided neurosurgery by using a combination of semiautomated techniques and manual outlining of the skin surface, brain, ventricles, vessels, and tumor. Two neurosurgeons (including A.N.) were asked to select 20 cases with meningiomas and low-grade gliomas of different sizes, shapes, and locations to provide a representative selection. These two tumor types were selected because they are relatively homogeneous and have well-defined imaging characteristics. Pathologic diagnoses included six meningiomas (cases 1–3, 11, 12, 16) and 14 low-grade gliomas (cases 4–10, 13–15, 17–20). In this study, six of six meningiomas were well enhancing, and 14 of 14 low-grade gliomas were nonenhancing. Cases 1–10 formed the development database used for the design and validation of the automated segmentation method. To ensure that the method produced correct results when applied to cases other than those of the development database, validation was carried out separately with the validation data sets from cases 11–20, in addition to validation with the 10 development cases.

Automated Segmentation of Brain and Tumor

General segmentation framework.

We adopted a general algorithm called adaptive template-moderated classification (see references 17 and 18 and the Appendix for details). The technique involves the iteration of statistical classification, to assign labels to tissue types, and nonlinear registration, to align (register) a digital anatomic atlas (a presegmented anatomic map) to the patient data (Fig 1).

Figure 1. Diagram of the tumor segmentation scheme.
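Stated as pseudocode, this iteration is compact. The following is a minimal sketch under our own naming, not the authors' implementation: `classify` and `register` are placeholders for the statistical classifier and the nonlinear atlas registration of references 17 and 18.

```python
def template_moderated_classification(image, atlas, classify, register,
                                      n_iterations=2):
    """Sketch of the classification/registration loop described above.

    classify(image, warped_atlas) -> label volume
    register(atlas, image, labels) -> atlas warped onto the patient
    """
    labels = None
    for _ in range(n_iterations):
        # Align the presegmented atlas to the patient, using the current
        # label estimate (None on the first pass) as spatial context.
        warped_atlas = register(atlas, image, labels)
        # Reclassify each voxel from its intensity plus the spatial
        # priors carried by the registered atlas templates.
        labels = classify(image, warped_atlas)
    return labels
```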

Statistical classification was used to divide an image into different tissue classes on the basis of the signal intensity value. If different tissue classes have the same or overlapping gray-value distributions (eg, cerebrospinal fluid and fluid within the eyeballs), such methods fail. Therefore, additional information about the spatial location of anatomic structures was derived from a registered anatomic atlas (a manually segmented MR image of a single subject) (6). Objects of interest were then identified on the classified images with local segmentation operations (mathematical morphology and region growing) (19).
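To make the interplay between intensity statistics and atlas-derived spatial information concrete, the toy classifier below scores each voxel by a per-class Gaussian intensity likelihood multiplied by a spatial probability from the registered atlas. It is a deliberately simplified sketch, not the authors' code; `prototypes` and `atlas_prior` are assumed inputs.

```python
import numpy as np

def classify_voxels(image, prototypes, atlas_prior):
    """Toy template-moderated voxel classifier (illustrative only).

    image       : 2D or 3D array of gray values
    prototypes  : {class_name: list of example gray values}
    atlas_prior : {class_name: array of spatial probabilities, same shape
                   as image, derived from the registered atlas}
    """
    classes = list(prototypes)
    scores = np.zeros((len(classes),) + image.shape)
    for i, name in enumerate(classes):
        mu = np.mean(prototypes[name])
        sigma = np.std(prototypes[name]) + 1e-6  # guard against zero spread
        # Gaussian likelihood of each voxel's intensity under this class...
        likelihood = np.exp(-0.5 * ((image - mu) / sigma) ** 2) / sigma
        # ...moderated by where the atlas says the class can occur, which
        # separates classes whose gray values overlap.
        scores[i] = likelihood * atlas_prior[name]
    return np.argmax(scores, axis=0)  # per-voxel index into `classes`
```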

Application to tumor segmentation.

For the task of brain tumor segmentation, the order in which the structures of interest were segmented followed a simple hierarchical model of anatomy (Fig 2). By proceeding hierarchically from the outside to the inside of the head, each segmented structure defined a refined region of interest for the next structure to be segmented. Five different tissue classes were modeled: background, skin (fat and bone), brain, ventricles, and tumor. Because of the homogeneous tissue composition of meningiomas and low-grade gliomas, one tissue class was sufficient for the statistical model. An atlas of normal anatomy does not include pathologic structures; as a result, templates from the atlas were derived for only the head, brain, and ventricles.

Figure 2. Diagram of the hierarchical segmentation method, which proceeds from A to D.

First, the whole head was segmented from the background by using thresholding and local segmentation strategies. On the basis of the segmentation of the head, an initial alignment of the atlas to the patient was established. Next, the intracranial cavity (ICC) was segmented from the head in two segmentation iterations (statistical classification, local segmentation strategy, and reregistration of the atlas). At this point, all voxels belonging to the brain, ventricles, and tumor were labeled as ICC. In the first iteration, the ICC was segmented by using the head and ICC templates from the initially registered atlas. The atlas was then realigned on the basis of the whole head and the ICC of the patient. This step was followed by a second classification and local segmentation step. The ventricles were segmented from the ICC in a third segmentation iteration. At this point, the ICC contained only voxels belonging to the brain and tumor. Having thus defined a region of interest for the tumor, located inside the brain and outside the ventricles and skin (fat and bone), the tumor was segmented in two iteration cycles. In the first iteration, the tumor was classified by using anatomic knowledge from the atlas alone; this step was followed by application of the local segmentation strategy. Because there was no tumor template in the atlas, a straightforward registration was not possible. Consequently, tumor voxels were relabeled as ICC voxels prior to the registration process. As a result, a spatial correspondence between the atlas and the patient data set was established for every voxel, since the patient data set contained no voxels labeled as tumor at the time the atlas registration was carried out. In the second iteration, the tumor segmentation from the first iteration was used as an anatomic template. Although this template was approximate, the additional information about the location of the tumor prevented misclassification of voxels distant from the atlas template as tumor. The full sequence is outlined below.
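Under the same caveat as before (placeholder helper names, not a published API), the outside-to-inside hierarchy reads roughly as follows:

```python
def segment_case(volume, atlas, ops):
    """Outline of the hierarchical segmentation above. `ops` bundles
    hypothetical callables (segment_head, register_atlas, segment_icc,
    segment_ventricles, segment_tumor); none of the names are from the
    paper, and each stands for one classification / local-segmentation /
    registration cycle."""
    head = ops.segment_head(volume)                       # thresholding + morphology
    atlas = ops.register_atlas(atlas, head)               # initial alignment
    icc = ops.segment_icc(volume, roi=head, atlas=atlas)  # first ICC pass
    atlas = ops.register_atlas(atlas, head, icc)          # realignment
    icc = ops.segment_icc(volume, roi=head, atlas=atlas)  # second ICC pass
    ventricles = ops.segment_ventricles(volume, roi=icc, atlas=atlas)
    tumor_roi = icc & ~ventricles   # inside the brain, outside ventricles/skin
    # Two tumor cycles: atlas knowledge alone, then the first-pass mask
    # itself as an approximate anatomic template.
    tumor = ops.segment_tumor(volume, roi=tumor_roi, atlas=atlas)
    tumor = ops.segment_tumor(volume, roi=tumor_roi, template=tumor)
    return head, icc, ventricles, tumor
```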

Initialization of the automated segmentation method.

To reduce noise on the MR images without blurring object edges, an anisotropic diffusion filtering method was applied (20); a generic version of such a filter is sketched below.
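Reference 20 specifies the filter actually used; as a stand-in, a minimal 2D Perona-Malik diffusion is sketched here. Parameter values are illustrative, and the boundary handling is simplified.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, kappa=20.0, lam=0.2):
    """Minimal 2D Perona-Malik edge-preserving smoothing (illustrative;
    not the implementation used in the study)."""
    img = img.astype(float).copy()
    for _ in range(n_iter):
        # Differences toward the four neighbors (np.roll wraps at the
        # borders; a production version would replicate edge pixels).
        dn = np.roll(img, -1, axis=0) - img
        ds = np.roll(img, 1, axis=0) - img
        de = np.roll(img, -1, axis=1) - img
        dw = np.roll(img, 1, axis=1) - img
        # Conduction shrinks where gradients are large, so noise is
        # smoothed while object edges are preserved.
        g = lambda d: np.exp(-((d / kappa) ** 2))
        img += lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return img
```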

For the initialization of the automated segmentation method, a graphical user interface was developed for the 2D display of MR imaging sections and the selection of example tissue points with use of a mouse (Fig 3). The only interaction required of the operator (see Validation Experiments) was the selection of three to four example points for each tissue class, that is, skin (fat and bone), brain, ventricles, and tumor. The program then calculated a statistical model of the gray-value distribution of each class on the basis of these manually selected tissue prototypes.

Figure 3. Graphical user interface for the automated segmentation method, allowing the 2D display of MR sections and the selection of example tissue points with a mouse.

Manual Segmentation of Brain and Brain Tumor

For manual segmentation of the brain and tumor, an interactive segmentation tool (MRX; GE Medical Systems) was used on an Ultra 10 workstation (Sun Microsystems, Mountain View, Calif). Human operators outlined the structures section by section (see Validation Experiments) by pointing and clicking with a mouse. The program connected consecutive points with lines. An anatomic object was defined by a closed contour, and the program labeled every voxel of the enclosed volume.
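The fill step (labeling every voxel enclosed by the clicked contour) can be reproduced with standard tools; the sketch below uses matplotlib's point-in-polygon test and only approximates the behavior of the commercial tool.

```python
import numpy as np
from matplotlib.path import Path

def fill_contour(vertices, shape):
    """Label every pixel inside a closed hand-drawn contour.

    vertices : ordered (row, col) points clicked by the operator;
               the polygon is treated as closed.
    shape    : (rows, cols) of the image section.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    pixels = np.column_stack([rows.ravel(), cols.ravel()])
    inside = Path(vertices).contains_points(pixels)
    return inside.reshape(shape)  # boolean mask of the enclosed object
```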

Validation Experiments

Because of the lack of an acceptable standard (eg, a realistic phantom) for comparison, our definition of a segmentation standard was based on manual segmentations made with interactive computer segmentation tools. However, manual segmentation is subject to interobserver variability and human error (6). To minimize the influence of these factors while maintaining a means of measuring the segmentation accuracy of the individual raters, the standard was defined on the basis of the segmentations of four independent human observers. A single 2D section was randomly selected from the subset of the MR imaging volume that showed the tumor. The four human observers then independently outlined the brain and the tumor on this section by hand. The standard segmentation of brain and tumor in each patient data set was defined as the set of voxels in which at least three of the four raters agreed on the identification; all other voxels were labeled as background.
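In code, this three-of-four consensus rule is a one-liner; the sketch below assumes the raters' outlines have already been rasterized to boolean masks of identical shape.

```python
import numpy as np

def consensus_standard(rater_masks):
    """Standard segmentation: voxels on which at least three of the four
    independent raters agree; everything else is background."""
    votes = np.sum(np.asarray(rater_masks, dtype=int), axis=0)
    return votes >= 3  # boolean standard mask
```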

To assess accuracy, the automated segmentation tool was trained once with a single MR imaging section containing all tissue types of interest and was then executed on the full 3D data set. This process resulted in segmentation of the entire data set. For each data set, the structures skin (fat and bone), brain, ventricles, and tumor were segmented. The interrater variability of the four independent manual and the four independent automated segmentations was measured on the basis of all 20 cases. For the measurement of intraobserver variability, one of the medical experts also manually segmented the selected 2D section four times during 1 week in each of the 20 cases. Training of the automated method was also carried out four times during 1 week in all 20 cases. During all experiments, the times for manual outlining, training, and computation for the automated segmentation method were recorded.

Figure 4. Example of a spoiled gradient-recalled image. A, Meningioma. B, Manual segmentation. C, Statistical classification. D, Template-moderated segmentation.

Statistical Analysis

Quantitative analysis was carried out on the basis of volume-of-overlap comparison with the standard (accuracy) and overall volume variability (reproducibility) in the selected 2D section. Segmentation accuracy was defined as the percentage of correctly classified voxels (in object and background) with respect to the total number of voxels V in the image, that is, (TP + TN)/V, where TP is the number of true-positive voxels and TN is the number of true-negative voxels (21). The means and SDs of the accuracy values with respect to the 20 test cases were also calculated (Matlab version 4.1; Mathworks, Cambridge, Mass). To assess the inter- and intrarater variability error, the coefficient of variation was calculated as CV% = 100 × (SD_volume / Mean_volume). The coefficient of variation does not measure the correctness of a segmentation, only the variation in the volume of objects across segmentations by different raters.
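Both definitions translate directly into code; the sketch below mirrors them for binary masks and is not a reproduction of the original Matlab analysis.

```python
import numpy as np

def accuracy(pred, standard):
    """(TP + TN) / V: the fraction of all voxels, object and background,
    classified in agreement with the consensus standard."""
    return np.mean(np.asarray(pred, bool) == np.asarray(standard, bool))

def cv_percent(volumes):
    """CV% = 100 * SD / mean over repeated volume measurements; gauges
    reproducibility of the volume, not correctness of the segmentation."""
    v = np.asarray(volumes, dtype=float)
    return 100.0 * v.std() / v.mean()
```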
