
Statistical Methods in Medical Research 2000; 9: 359–394

Data mining in brain imaging


Vasileios Megalooikonomou, James Ford, Li Shen, Fillia Makedon Department of
Computer Science, Dartmouth Experimental Visualization Laboratory, Dartmouth College,
Hanover, New Hampshire, USA and Andrew Saykin Brain Imaging Laboratory, Departments
of Psychiatry and Radiology, Dartmouth Medical School, Dartmouth Hitchcock Medical Center,
Lebanon, New Hampshire, USA

Data mining in brain imaging is proving to be an effective methodology for disease prognosis and
prevention. This, together with the rapid accumulation of massive heterogeneous data sets, motivates the
need for efficient methods that filter, clarify, assess, correlate and cluster brain-related information. Here,
we present data mining methods that have been or could be employed in the analysis of brain images.
These methods address two types of brain imaging data: structural and functional. We introduce
statistical methods that aid the discovery of interesting associations and patterns between brain images
and other clinical data. We consider several applications of these methods, such as the analysis of task-
activation, lesion-deficit, and structure morphological variability; the development of probabilistic atlases;
and tumour analysis. We include examples of applications to real brain data. Several data mining issues,
such as that of method validation or verification, are also discussed.

1 Introduction

Data mining in brain imaging is an emerging field of high importance for providing
prognosis, treatment, and a deeper understanding of how the brain functions. The
field of data mining addresses the question of how best to use the available data to discover new
knowledge and improve the process of decision making. The discovery of associations
between human brain structures and functions (i.e. human brain mapping) has been
recognized as the main goal of the Human Brain Project,1 which is a high-priority
project funded by several government initiatives. Mining problems can be grouped in
three categories:2 identifying classifications, finding sequential patterns, and
discovering associations. Although data mining is a powerful knowledge discovery
technique, there are constraints in the way it can be applied: it is application-
dependent, different applications usually require different mining techniques, and
data must be of a certain size and format.3 In this paper we survey current mining
methods, critically review the main computational obstacles that limit our
ability to perform automatic data mining on brain images, and propose some
solutions.
There are various problems in mining of brain images that need to be addressed.
The first problem is that most fundamental mining algorithms (rule-based learning
systems, neural networks, decision trees, Bayesian networks, logistic regressions, and
so on), which have been used with great success in medicine, assume that data sets
contain only simple numeric and symbolic entries. It is important, therefore, to

Address for correspondence: V Megalooikonomou, Department of Computer and Information Sciences, Temple
University, 314 Wachman Hall, Philadelphia, PA 19122, USA. E-mail: vasilis@cis.temple.edu

© Arnold 2000 0962-2802(00)SM221RA


360 V Megalooikonomou et al.

consider how to preprocess brain images (multidimensional arrays of data) so that we
can transform them to data representations which are amenable to data mining
techniques. A second problem is that, although there are algorithms for classifying
images, there is a lack of effective algorithms for learning from images directly.4
Again, this implies the use of methods that transform images to a format conducive to
learning algorithms. Most early medical analysis ignored the image or raw sensor
portions of the medical record or summarized them in a very simplified form (e.g.
‘normal’ or ‘abnormal’). A third problem in mining brain images is the heterogeneity
of the brain imaging data: different modalities, formats and resolutions prevent a
common analysis and require integration. Integrating data from different studies often
means integrating different formats, which, in turn, implies imposing several
assumptions on the data representation. Many studies today, especially those using
functional imaging, only focus on a specific clinical question and deal only with a
small set of subjects, mainly due to the high cost of acquiring image data.
Other difficulties inherent in mining of associations in brain images are that: (1)
due to inter-subject variation and noise, a large number of subjects have to be studied;
(2) functions may correspond to more than one location and can relocate in the
presence of structure abnormalities; (3) brain lesions and other abnormalities have a
complex spatial distribution, typically covering multiple brain structures; and (4)
normal brain function can be affected in varying ways due to the complexity of the
functional organization of the human brain.
These obstacles aside, there have been recent technological advances that make
available enormous amounts of data. Imaging studies of the human brain at active
medical institutions today routinely accumulate more than 5 terabytes of clinical data
per year. Data in this domain usually consist of three-dimensional (3-D) images from
different medical imaging modalities that capture structural (e.g. MRI,† CT,‡
histology§) and functional/physiological (e.g. PET,‖ fMRI,¶ SPECT††) information
about the human brain. On the other hand, there is now a wide availability of non-
invasive methods for assessing macroscopic brain structure, particularly magnetic
resonance (MR) techniques that complement clinical functional assessment.5 Also,
there is continued development of improved functional imaging techniques and
normalization methods. Greater computer capabilities are leading to the creation of
large databases of structure/function information, the efficiency of which depends on
interoperable multimedia data representation that is easy to search. This trend is
reflected in the work of Fox et al.,6–9 Evans et al.,10,11 the QBISM database by Arya et
al.,12 the BRAID database by Letovsky et al.,13,14 the BrainMap database of neuro-
imaging data,15 and other neuroimaging databases.16,17
The problem of multidimensional data (e.g. brain images) can be solved with newer
mining methods which are applied directly to the images in order to capture most of
their information content. As was mentioned, mining is heavily dependent on
statistical methods for discovering associations and classifications among disparate
† Magnetic resonance imaging: shows soft-tissue structural information.
‡ Computed tomography: shows hard-tissue structural information.
§ Histology images are acquired by physically slicing and photographing tissue.
‖ Positron emission tomography: shows physiological activity.
¶ Functional magnetic resonance imaging: shows physiological activity.
†† Single photon emission computed tomography: shows physiological activity.

types of data. We exploit this fact and consider methods that combine the information
from image and behavioural data about the brain and present methods for developing
probabilistic brain atlases. The results of these methods are new representations of the
information content of brain images and statistical maps.
The rest of this paper is organized as follows: in Section 2 we present the
preprocessing phase of data mining in brain imaging, including the segmentation and
spatial normalization of images. Although smoothing and reduction of noise can be
considered to be part of preprocessing, here we include them as part of the mining
methods in Section 3. In Section 3 we present mining methods that have been used in
structural and functional imaging. These methods are useful for: (a) the efficient
discovery of associations between structures and functions; (b) the classification of
structural information, including both normal structures and abnormalities such as
tumours; and (c) recently, the discovery of associations between gene expressions,
morphology, and function. In Section 4 we present important issues in mining of brain
images that are common to structural and functional brain data. We also consider the
problem of verification of mining methods. This review concludes with a discussion in
Section 5.

2 Data preprocessing

After image data is collected for each subject and before mining is performed, the data
has to pass through a preprocessing phase. This phase identifies and normalizes the
brain objects to be stored in a database and mined later. In anatomically related
studies, after an anatomical (i.e. structural) image is collected for each subject, each
lesion, area, or structure of interest is delineated (segmented) as a region of interest
(ROI) on each slice automatically, semi-automatically, or manually. Extensive work
has been done on automating image segmentation.18–21 The first segmentation
methods were solely intensity based.22 However, since many structures were not
distinguishable on the basis of signal intensity alone, prior spatial information was
incorporated either in the form of intensity gradients,23 prior spatial probability
distribution over signal intensity,18,23,24 or by registering the image to a segmented
atlas25–27 using a spatial probability map for each voxel.28
Using the slice data, each ROI is reconstructed in three dimensions. Then, in both
functional and anatomical images, normalization or image registration has to be
performed to make image data comparable across subjects without morphological and
acquisition variability. This process maps homologous anatomical regions to the same
location in a stereotaxic space, such as the Talairach anatomical atlas.29 Several linear
and nonlinear spatial transformations have been developed to bring the 3-D atlas and
the subject’s 3-D image into register, i.e. spatial coincidence.27,30–32 As an example, the
effect of registration of an MR image to the Talairach atlas using a nonlinear method
based on a 3-D elastically deformable model32 is presented in Figure 1.
In addition to 3-D image data, modern brain image databases usually contain a set
of generic anatomic atlases of the human brain that model the exact shapes and
positions of anatomical structures. A raw MR or fMR image does not identify the
structure to which each voxel (3-D volume element) belongs, but an anatomical atlas


Figure 1 A slice of (a) original MR image, (b) atlas, and (c) atlas image overlaid on the deformed MR image.
Picture from Megalooikonomou et al.97

can supply this information (with the accuracy of the registration methods) when
overlaid on the image (see Figure 1). A variety of brain structure maps have been
derived, at several spatial scales, from 3-D tomographic images,33 anatomic speci-
mens,29,34,35 and a variety of histologic preparations that reveal regional cyto-
architecture36 and molecular content. Other brain maps have concentrated on
function,37,38 or neuronal connectivity and circuitry.39,40 Here, for clarity of
presentation, we concentrate on the widely used Talairach atlas.

3 Data mining

We first present methods for discovering associations between brain functions and
structures. Traditionally, two approaches have been employed for functional brain
mapping. The first approach seeks associations between lesioned structures and con-
comitant neurological or neuropsychological deficits – for example, between trauma
lesions and deficits like left visual field deficit. The second approach measures brain
activation in subjects as they are asked to perform certain tasks. We present methods
for both approaches. The problem of efficiently finding similar ROIs in brain image
databases is also addressed. We present methods for extracting knowledge about the
morphological variability of brain structures and about abnormalities such as
tumours. Methods that can be potentially applied to both structural and functional
imaging are also presented.

3.1 Functional imaging of the brain


Functional brain imaging uses technologies such as PET, SPECT or fMRI with
specifically designed experiments to identify activated regions of the brain under
different conditions. The cumulative nature of PET greatly limits its spatial and
temporal resolution. fMRI has much better temporal resolution, and a spatial
resolution on the order of 2–4 mm that is limited by the characteristics of the
underlying vascular structure (Ogawa et al.41 present an in-depth review of fMRI).
Recently, diffusion tensor imaging (DTI) has emerged as a new application of MR
technology to the problem of human brain mapping. DTI calculates 3-D diffusion
tensor maps by measuring water proton mobility in diffusion-weighted MR images.42
DTI can be used to image nerve fibre bundles indirectly, especially those in white
matter that link the various (grey matter) brain areas and that have so far been
‘invisible’ to other imaging techniques.
The data from functional imaging scans are typically in the form of measurements at
thousands of voxels. In PET, a voxel measurement reflects the amount of activity in a
brain region. In fMRI, a voxel’s time series of measurements is meaningless by itself,
but becomes useful when compared to another series of measurements at the same
place under different conditions. This is possible after registration and motion cor-
rection.43–46 Since different subjects may have different strategies for accomplishing
the same task, it can be useful to be able to average all subjects in a multi-subject study
to see the common areas of activation.45 Averaging multiple subjects also increases the
statistical power of the analysis, and is usually necessary for fMRI with its low signal-
to-noise ratio. Smoothing is usually then performed by applying a spatial smoothing
filter to reduce the effect of motion that was not completely removed, and other un-
wanted noise.47 The typical choice is a low-pass Gaussian filter in the spatial domain,
which smooths high frequency variation in the data. Several researchers have suggested
that the use of Gaussian filters for denoising in the spatial domain introduces unwanted
biases,48–53 and that the optimal filter width is a function of the size of activation
foci,50,54 which cannot generally be determined a priori except possibly by reference to
underlying neuroanatomy. The low-pass filter approach will also blur and displace
activated areas and remove the areas of least activation, reducing spatial resolution.
There have been suggestions for addressing these limitations of Gaussian filtering
by focusing on signal restoration instead of noise removal, looking at multiple scales to
overcome the problem of scale-specificity, or using approaches that do not impose a
priori models on the data. The scale-space approach50,55,56 builds a multi-resolution
representation of functional data. Voxels are organized in clusters (called ‘blobs’50,55)
that may be of different shapes and sizes at different scales. Blobs themselves are
hierarchically organized, with ‘blob trees’ dividing scales higher in the tree (lower
resolutions) into smaller blobs at lower scales, down to individual voxels at scale 0.
Each scale s > 0 corresponds to the convolution of the original n-D signal with a
Gaussian kernel of width s.50 The multifiltering approach57 is similar in that analysis
is done on smoothed and unsmoothed data (although only one level of smoothing is
used) for the purpose of picking up large regions of relatively weak activations. Other
analysis techniques operate directly on the data, for example in the wavelet
domain,58–60 and do not explicitly smooth the data. In this case, the fact that back-
ground noise is distributed compared to spatially localized signals can aid in useful
extraction of the signals. The complex-denoising noise reduction process can also
increase signal-to-noise by thresholding discrete wavelet transform coefficients.61
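The scale-space construction described above can be illustrated with a minimal one-dimensional sketch. It assumes a sampled Gaussian kernel truncated at three widths and edge-replicated borders; the function names and the toy signal are hypothetical, and this is not the implementation used in the cited work.

```python
import math

def gaussian_kernel(width):
    """Sampled Gaussian of standard deviation `width`, truncated at 3 widths."""
    radius = max(1, int(math.ceil(3 * width)))
    weights = [math.exp(-0.5 * (k / width) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # normalise so the kernel sums to 1

def smooth(signal, width):
    """Convolve a 1-D signal with a Gaussian kernel, replicating edge values."""
    kernel = gaussian_kernel(width)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def scale_space(signal, widths):
    """Scale 0 is the raw signal; each scale s > 0 is the signal smoothed at width s."""
    levels = {0: list(signal)}
    for s in widths:
        levels[s] = smooth(signal, s)
    return levels

# Toy signal (assumed data): a narrow strong focus and a broad weaker focus.
signal = [0.0] * 40
signal[10] = 4.0
for i in range(24, 32):
    signal[i] = 1.0
levels = scale_space(signal, widths=[1, 2, 4])
```

In this toy case the narrow focus dominates at fine scales, while at the coarsest scale the broad focus yields the larger response, which is the scale-specificity that a single filter width cannot capture.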
Temporal smoothing is also often done in fMRI, and the signal series at each voxel is
usually convolved with some approximation of the haemodynamic response function
(HRF) – e.g. a Poisson function, the Gamma function, a linear combination of the
Gamma function and its temporal derivatives, or the Gaussian function – that is
chosen heuristically.62 This approach identifies activated voxels, although particular
analysis methods can take many forms. There has been recent interest in improving
HRFs.62,63
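As a rough illustration of the convolution step, the sketch below convolves a toy block-design series with a Gamma-shaped response. The shape and scale parameters, the repetition time `tr`, and the series itself are assumptions chosen for the example, not values taken from the cited studies.

```python
import math

def gamma_hrf(tr, duration=24.0, shape=6.0, scale=1.0):
    """Gamma-shaped response sampled every `tr` seconds, normalised to unit sum."""
    n = int(duration / tr) + 1
    samples = []
    for k in range(n):
        t = k * tr
        samples.append(t ** (shape - 1) * math.exp(-t / scale))
    total = sum(samples)
    return [s / total for s in samples]

def convolve_causal(series, kernel):
    """Causal discrete convolution, truncated to the length of the series."""
    out = []
    for i in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if i - k >= 0:
                acc += w * series[i - k]
        out.append(acc)
    return out

tr = 2.0                                  # assumed repetition time in seconds
series = ([0.0] * 10 + [1.0] * 10) * 3    # toy off/on blocks of 10 scans each
hrf = gamma_hrf(tr)
smoothed = convolve_causal(series, hrf)
```

The convolved series rises with a delay after each block onset and saturates towards the end of the block, which is the qualitative behaviour expected of a haemodynamic response.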
3.1.1 Statistical parametric mapping (SPM)
One of the most common analysis approaches currently in use, called statistical
parametric mapping (SPM),64,65 analyses each voxel’s changes independently of the
others and builds a map of statistic values for each voxel (see Figure 2). The
significance of each voxel can be ascertained statistically with a Student’s t-test, an F-
test, a correlation coefficient, or any other univariate statistical parametric test43 (more
details about the use of t-test in SPM are in appendix A1). The result of the t-test is a t-
value that, when indexed with the number of degrees of freedom, gives a probability
value indicating how likely it is that this difference could occur by chance. The
significance threshold (alpha value) is typically chosen to be 0.05 or less, indicating a
5% or smaller chance of a ‘false positive’ decision of significance. The t-value can also
be expressed as a Z-score, indicating how many standard deviations the t-value
represents on a Gaussian distribution. Z-scores and alpha values are the most popular
means for reporting the significance of this difference. For two groups, the t-test is
mathematically equivalent to a one-way analysis of variance (ANOVA), and both can be
expressed in the general linear model for regression analysis64
$$Y(t) = X(t)\,\beta + \varepsilon(t)$$
at each voxel, where $\beta$ denotes the regression coefficients and $\varepsilon(t)$ the error term. The general linear model finds (and tests the significance of) linear
regressions between two data sets X and Y for each member of X (including dummy
members added to represent experimental conditions). Methods based on general
linear models have more variables (voxels) than samples (e.g. the number of
examinations) and thus are severely underconstrained.
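The equivalence between the t-test and a one-way ANOVA can be checked numerically for a single voxel: with two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic. The sketch below is a minimal check under that equal-variance assumption; the signal values and function names are hypothetical.

```python
from statistics import mean

def two_sample_t(a, b):
    """Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical signal values at one voxel under two conditions.
task = [102.1, 101.4, 103.0, 102.6, 101.9]
rest = [100.2, 100.9, 99.8, 100.5, 100.1]
t = two_sample_t(task, rest)
f = one_way_f([task, rest])
```

In a voxelwise analysis the same computation would be repeated independently at every voxel to build the statistic map.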
By applying a uniform threshold of statistical significance it is possible to determine
which voxels are likely to have had significant changes between conditions, and with
what likelihood, assuming a null hypothesis of no changes between conditions. Data
from individuals can be combined into groups, in which case statistical inferences can
be drawn either about the significance of differences between subjects, or the
likelihood of consistent activation in the population from which the subjects were
drawn.66 Some early studies have examined the possibility of doing post-analysis on
statistical maps, allowing comparisons of activations between independently analysed

Figure 2 Statistical parametric map. These displays of a statistical parametric map (SPM), produced using the
SPM99 software package, show two views (one overlaid on a surface map, the other an axial slice) of 3D brain
activations for a simple motor task (courtesy Brain Imaging Lab, Dartmouth-Hitchcock Medical Center).

subjects.55,67 One problem here is that when using voxel data, comparison of different
scans may not align activations that are slightly misregistered. Recently, the spatial
extent, i.e. the area of activation, has been used to detect significant regions of
activation. If a spatial extent of activation is reported, however, it is sometimes only
given as a voxel count above a threshold, which has been found to be a very unstable
indicator of activation across trials.68
Another issue of concern in functional imaging is the consistency of observations
across subjects. It is common for some activations during a study to be similar across
subjects (and thus significantly correlated with the task under study), and for others to
be idiosyncratic. There have been reports in the literature of the difficulty of
reproducing voxel-level significance maps in fMRI.68,69 In the case of Tegeler,69 even
the 2% most significant voxels were found to vary considerably across runs, subjects,
and analysis techniques. Variability in activation maps is a concern; however,
reproducibility of fMRI activations at a regional level has been found to be good in
general across sites, subjects, and techniques,70 and comparisons of signal changes
rather than significance values may address problems in voxel-level comparisons.68
A third issue in functional imaging is the choice of analysis methods for generating
activation data. Several reports have substantiated the difference in activations
observed using different analysis techniques.69,71,72 Such differences may be a concern in a large
database system. However, if results are tagged according to the analysis technique
and parameters used to generate them, multiple analysis methods may actually be
beneficial in supporting a ‘pluralistic’ strategy for analysis.72
A fourth issue in functional imaging is the necessity of limiting assumptions in
analysis. For example, the SPM model assumes a uniform distribution of noise with
covariance between voxels estimated by a Gaussian distribution that decays with
increasing distance, and unfortunately not all data conforms well to this noise model
(noise from unmodelled biological variability such as venous activity may not, for
example). It is difficult to assess the extent to which conditions deviate from the model in
practice,52 but tests with synthetic data have demonstrated large biases in the false
positive rate73 as a result of low frequency physiological fluctuations and a variation of
signal-to-noise ratio with imaging rate. Techniques for correcting long-term changes
in mean signal intensity can improve the sensitivity of statistical analysis in this
particular case.74

3.1.2 Other methods


An analysis method similar in concept to statistical parametric mapping is
correlation of observed signal changes with experimental blocks.75–79 This method
can be applied at the voxel level, with a resulting activation map much like an SPM
(but not necessarily statistical in nature). Other variations analyse groups of voxels
with an overall mean change in signal,80 incorporate prior information81 or wavelet
analysis82 into an SPM-type framework, use expectation maximization to estimate
labelling parameters in a Markov random field model,83 or combine tests for
significance with detection of high intensity signals.84
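A voxel-level correlation analysis of this kind can be sketched as follows. The example assumes a simple boxcar reference (0 for rest, 1 for task) and the Pearson coefficient as the measure of correlation; the voxel coordinates and time series are made up for illustration.

```python
def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def correlation_map(voxel_series, reference):
    """Map each voxel coordinate to its correlation with the reference waveform."""
    return {voxel: pearson(series, reference) for voxel, series in voxel_series.items()}

# Boxcar reference for an off/on/off/on block design (assumed design).
reference = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
voxel_series = {
    (10, 12, 5): [10.0, 10.1, 9.9, 11.0, 11.2, 10.9, 10.0, 9.8, 10.1, 11.1, 10.8, 11.0],
    (30, 8, 7): [5.0, 5.3, 4.9, 5.1, 4.8, 5.2, 5.1, 4.9, 5.2, 5.0, 5.1, 4.9],
}
cmap = correlation_map(voxel_series, reference)
```

The first voxel follows the blocks closely and receives a correlation near 1, whereas the second fluctuates independently of the design and receives a value near 0.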
Another approach used in the analysis of brain activations is to analyse relations
between voxels, by using, for example, a six-dimensional correlation map correlating
every voxel with each other voxel in order to detect correlated changes85 or by using
structural equation modelling to relate functional neuroimaging signals to underlying
neurobiological activities.86 Similarly, one can use an empirical assessment of the
relation between data in random pairs of conditions in place of statistical tests of
significance87 although this is computationally intensive. Other approaches are based
on the generation of cross-correlation image maps,88 Fourier-analysis-based time-
series regression models,89 principal component analysis,90 partial least squares,76
independent component analysis,91 or structural equations to model functional
connectivity among ROIs.92 Spatial-lattice models are often applied to image analysis;
these techniques are often based on Markov random fields, with inference techniques
based on various modifications of likelihood-maximization procedures.93
Another alternative is to divide brain images into meshes, treat functions as classes
and meshes as attributes, and find rules like ‘A and B ⇒ positive’, which means if
mesh A and mesh B are active, then some function is positive/on:94 thus the problem
can be reduced to supervised inductive learning. The usual inductive learning
algorithms such as C4.595 do not work for this problem, however, because (1) there are
strong correlations between attributes and (2) there are usually too many attributes
(say 100 × 100 = 10 000) and too few samples (say 100). However, nonparametric
regression can be applied to solve these problems. One algorithm for the discovery of
rules from brain images consists of the following two steps: (1) a nonparametric
regression96 is applied to the training data set. The results are linear formulas of the
form $y = p_1 x_1 + \dots + p_n x_n + p_{n+1}$, where $y$ is a dependent variable (i.e. function), and
$x_1, \dots, x_n$ are Boolean independent variables (i.e. grids). (2) Rules are extracted from
the linear formula $y$ ($y$ is normalized to [0,1]) by converting it to a Boolean function $y'$
using an approximation: if $y \ge 0.5$, then $y' = 1$; otherwise, $y' = 0$. Since the naive
approach to the second step always takes exponential time and becomes unrealistic
in practice, a better algorithm is to generate terms from low order to high
order while applying a pruning strategy. Experiments on artificial data showed:94
(1) when there are no correlations between adjacent attributes, the accuracies are
almost the same as the accuracies of C4.5, and (2) when there are strong correlations
between adjacent attributes, the algorithm works better than C4.5 in terms of the
accuracy of the result.
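The second step, extracting rules from a fitted linear formula, might be sketched as below. This is an illustrative reading rather than the cited algorithm: it assumes the coefficients are already available, accepts a conjunction of literals when the smallest value of the formula over all settings of the remaining variables still reaches 0.5, and prunes any higher-order term that contains an already accepted rule. The coefficients and the `max_order` limit are hypothetical.

```python
from itertools import combinations, product

def extract_rules(weights, bias, max_order=3, threshold=0.5):
    """Return conjunctions of literals that force y >= threshold.

    A term is a tuple of (index, value) literals. It is accepted when the
    smallest y over all settings of the remaining variables reaches the threshold.
    """
    n = len(weights)
    rules = []
    for order in range(1, max_order + 1):          # low order first
        for idx in combinations(range(n), order):
            for vals in product((1, 0), repeat=order):
                term = tuple(zip(idx, vals))
                # Pruning: a term that contains an accepted rule adds nothing new.
                if any(set(r) <= set(term) for r in rules):
                    continue
                fixed = sum(weights[i] * v for i, v in term)
                # Worst case: free variables with negative weight are switched on.
                worst = sum(min(w, 0.0) for i, w in enumerate(weights) if i not in idx)
                if bias + fixed + worst >= threshold:
                    rules.append(term)
    return rules

# Assumed coefficients p_1..p_4 and intercept p_5 of a fitted linear formula.
weights = [0.35, 0.35, -0.05, 0.02]
bias = 0.0
rules = extract_rules(weights, bias)
```

With these coefficients the only rule found is the conjunction of the first two meshes being active, i.e. a rule of the form ‘A and B ⇒ positive’; no single mess of lower order suffices, and every third-order term containing that pair is pruned without being evaluated.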

3.2 Structural imaging of the brain

3.2.1 Lesion-deficit analysis


In principle, lesion-deficit data could be analysed using methods similar to those for
activation studies (e.g. SPM). We now present several additional statistical methods to
determine structure–function relations through the study of lesions and associated
deficits. After the preprocessing, i.e. the segmentation of lesions and registration of the
binary images to a common standard, the binary images consist of ‘normal’ and
‘abnormal’ (lesioned) voxels. This type of structural image data, combined with the
behavioural variables, form the data for each subject. Mining methods for the
discovery of the structure–function associations from this data can operate on a
resolution range from the spatially distinct structures of an anatomical atlas (atlas-
based analysis) to the voxel level (voxel-based analysis).97

3.2.1.1 Atlas-based analysis. In the case where anatomical structures represent
functional units, the atlas-based analysis is more sensitive than voxel-based analysis
since the atlas provides significant prior knowledge. The first step in the atlas-based
analysis is to calculate for each structure $s_i$ and subject $p_j$, the fraction of lesioned
volume, $f_{s_i,p_j}$, which is defined as the volume of the lesioned part of $s_i$ divided by the
volume of $s_i$. These fractions form the continuous structural variables. Here, we
present methods for both categorical and continuous structural variables.13,14,97 In the
case of categorical variables, the lesioned fraction determines if a structure is lesioned
(abnormal) or not. For example, a patient might be treated as having a lesion in a
structure if the intersection of all his/her lesions with the structure is at least one
voxel. To eliminate thresholding effects, the atlas structures can also be analysed as
continuous variables, considering for each one the fraction that is lesioned.
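The computation of the structural variables for one subject can be sketched as follows, assuming the atlas and the lesion mask are registered to the same voxel grid. The representation (a dictionary from voxel to structure label and a set of lesioned voxels), the labels and the function names are hypothetical choices for this example.

```python
def lesioned_fractions(atlas, lesion):
    """atlas: voxel -> structure label; lesion: set of lesioned voxels (same space)."""
    volume = {}
    lesioned = {}
    for voxel, structure in atlas.items():
        volume[structure] = volume.get(structure, 0) + 1
        if voxel in lesion:
            lesioned[structure] = lesioned.get(structure, 0) + 1
    # Continuous variable: lesioned volume divided by structure volume.
    return {s: lesioned.get(s, 0) / volume[s] for s in volume}

def categorical(fractions, min_fraction=0.0):
    """A structure counts as lesioned if its lesioned fraction exceeds min_fraction."""
    return {s: f > min_fraction for s, f in fractions.items()}

# Toy atlas with two structures and a lesion overlapping only structure "A".
atlas = {(x, 0, 0): "A" for x in range(4)}
atlas.update({(x, 1, 0): "B" for x in range(5)})
lesion = {(0, 0, 0), (1, 0, 0), (9, 9, 9)}   # the last voxel lies outside the atlas
fractions = lesioned_fractions(atlas, lesion)
flags = categorical(fractions)
```

With the default `min_fraction` of zero, the categorical variable reproduces the "at least one voxel" criterion mentioned above.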
When the search for the model that explains the data can be directed through
specific hypotheses or prior knowledge, the situation is easier. Hypotheses can be
formed after using explorative visualization or other methods, and can be tested using
statistical analysis. An example where visualization helps to reduce the search space
for a model is shown in Figure 3. If there is little preconception about the
relationships between the variables, all the possibilities may have to be explored. This
exploratory, or data mining, analysis is presented below. There are two analysis
approaches one can follow: bivariate (pairwise) and multivariate.

Bivariate analysis. Let F be the number of functional and S be the number of
structural variables respectively. In the case of categorical structural variables, F × S

[Figure 3 panel labels: ADHD+, ADHD−; slice labels: Tal-113, Tal-116, Tal-119, Tal-124]

Figure 3 Visualization helps reduce the search space for the model that explains the data. Sum of lesions for
the ADHD+ and the ADHD− group of patients (four slices of the Talairach atlas are shown for each group). The
right putamen and left thalamus are highlighted. Picture from Megalooikonomou et al.181

two-way contingency tables are constructed and for each the Fisher exact test98 is
computed. The associations between structures and deficits are sorted in order of the
p-values returned from the exact tests, and the ones with the lowest p-values are
reported. For the same type of analysis with continuous structural variables one can
use the Mann–Whitney test and logistic regression analysis (the Mann–Whitney stati-
stic is appropriate because the distributions of the fractions of lesioned volumes are
not Gaussian). In exploratory analysis of either categorical or continuous structural
variables, computing a statistic for many pairwise tests leads to the multiple com-
parison problem, i.e. the situation where a certain undesirably high number of the
tests are expected to be positive by chance (see Section 4.1 for a more complete
treatment of the multiple comparison problem).
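The pairwise procedure for categorical variables can be sketched in a few lines. The example below implements a two-sided Fisher exact test directly from the hypergeometric distribution (summing the probabilities of all tables no more likely than the observed one) and ranks every structure-deficit pair by p-value. The variable names and the toy subject data are assumptions, and no correction for multiple comparisons is applied here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell given the margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(k) for k in range(lo, hi + 1)]
    p_obs = prob(a)
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def rank_associations(structural, functional):
    """structural, functional: name -> list of 0/1 values, one per subject."""
    results = []
    for s_name, s in structural.items():
        for f_name, f in functional.items():
            a = sum(1 for x, y in zip(s, f) if x and y)
            b = sum(1 for x, y in zip(s, f) if x and not y)
            c = sum(1 for x, y in zip(s, f) if not x and y)
            d = sum(1 for x, y in zip(s, f) if not x and not y)
            results.append((fisher_exact_two_sided(a, b, c, d), s_name, f_name))
    return sorted(results)  # lowest p-values first

# Toy data for six subjects: lesion present (1) or absent (0), deficit yes/no.
structural = {"S1": [1, 1, 1, 0, 0, 0], "S2": [1, 0, 1, 0, 1, 0]}
functional = {"D": [1, 1, 1, 0, 0, 0]}
ranked = rank_associations(structural, functional)
```

The list contains one entry per contingency table, so its length equals the number of structural variables multiplied by the number of functional variables.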

Multivariate analysis. Multivariate analysis may find complex multivariate associations
not found by multiple uses of bivariate statistics. For example, consider a deficit
that is associated with two structures and appears only when both of them are lesioned.
Multivariate analysis is free of the multiple comparison problem, since it evaluates an
entire model with one statistic. One multivariate extension of the chi-square test for
categorical variables is log-linear analysis; logistic regression is another multivariate
method that can be used to relate the log-odds of having a particular deficit to the
fraction of lesioned structures. The stepwise logistic regression has also been used,97
where the algorithm for discovering the model that explains the interactions starts
with no associations and a greedy approach is applied to add (or delete) associations
based on their relative strength.

3.2.1.2 Voxel-based analysis. Atlas-based analysis results are only as good as the atlas
that is being used. Instead of imposing any high level structure on the image data, one
can analyse them on a voxel-by-voxel basis. Voxels are typically labelled as either
normal or abnormal, and thus structural variables are in this case categorical. Given
that the number of voxels that are considered is typically on the order of $10^7$, a like
number (i.e. $10^7$) of Fisher exact tests have to be performed for each of the functional
variables that are examined. This procedure can be seen as clustering the voxels by
functional association.97 The calculation of the contingency table and the Fisher exact
test is computationally intensive, and the multiple-comparison problem is also severe
due to the large number of tests that must be performed. However, in this case it can
be attacked with clustering analysis since false positives will not tend to cluster.
Voxel-based regression analysis can also be used to determine whether voxels in a
certain region are associated with a functional variable. One can construct a regression
equation that relates lesions in a sphere of a given radius and centre to a deficit, and
the ‘causal brain region’ in which lesions are most strongly associated with that deficit
can then be identified.13,97 Let $l$ be a lesion, $o$ a sphere, $v(r)$ the volume of a region $r$,
and $i(r_1, r_2)$ the intersection of two regions $r_1$ and $r_2$. Then the identification of the
causal region is done by calculating the optimal centre and radius for the logistic
regression equation:
$$\operatorname{logit}(d) = \log(\mathrm{odds}_d) = a f_s + b$$
Data mining in brain imaging 369

where oddsd ¼ pd ðÞ=ð1 ÿ pd ðÞÞ, pd ðÞ is the probability of having a certain deficit d,
fs ¼ vðiðl; oÞÞ=vðoÞ is the fraction of the sphere that is lesioned, a=(log odds of d)/
(lesioned fraction of sphere volume), and b is the prior log odds of deficit d.
Given the centre ðx; y; zÞ and the radius r of the sphere, one can find values for the
parameters a and b such that the sum of squares of residuals is minimized. The goal is
to optimize the sphere parameters ðx; y; z; rÞ to obtain the best fit of the data to a
regression line. The solution is the sphere that best discriminates between lesions that
are and are not associated with deficit d. This nonlinear optimization procedure is
computationally intensive, and cannot describe multifocal functional associations.
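The quantities in the regression equation can be illustrated with a short sketch. It assumes that regions are represented as sets of integer voxel coordinates and that volumes are voxel counts; the helper names are hypothetical, and the fitting of a and b and the search over sphere parameters are not shown.

```python
from math import exp

def sphere_voxels(centre, radius):
    """Set of integer voxel coordinates inside a sphere (assumed voxel grid)."""
    cx, cy, cz = centre
    r = int(radius)
    return {(x, y, z)
            for x in range(cx - r, cx + r + 1)
            for y in range(cy - r, cy + r + 1)
            for z in range(cz - r, cz + r + 1)
            if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2}

def lesioned_fraction(lesion, sphere):
    # f_s = v(i(l, o)) / v(o), with volume measured as a voxel count.
    return len(lesion & sphere) / len(sphere)

def deficit_probability(f_s, a, b):
    # Invert logit(d) = a * f_s + b to obtain the probability of deficit d.
    return 1.0 / (1.0 + exp(-(a * f_s + b)))

sphere = sphere_voxels((0, 0, 0), 1)          # centre voxel plus 6 face neighbours
lesion = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}    # illustrative lesion voxels
f_s = lesioned_fraction(lesion, sphere)
```

An outer optimizer would evaluate the goodness of fit of this model for each candidate (x, y, z, r), which is why the full procedure is expensive.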

3.2.1.3 Results from mining lesion-deficit associations. In this section we present results
from the mining process in BRAID.13,14 The Brain Image Database includes images
and clinical information from over 700 subjects from two different studies: the
Cardiovascular Health Study (CHS)99 and the Frontal Lobe Injury in Children
(FLIC) study.100 Visualization applied prior to the analysis procedure can help direct
the analysis by choosing certain structures of the anatomical atlas to examine further
using the statistical tests. Figure 3 shows the sum of lesions over all subjects that did
and did not develop ADHD (Attention-Deficit Hyperactivity Disorder), i.e. ADHD+
and ADHD−, respectively. Based on these images and on previous research
implicating a frontal lobe-basal ganglia-thalamic pathway, the right putamen and
the left thalamus (highlighted in Figure 3)† were chosen for further analysis using the
Fisher exact test for categorical and the Mann–Whitney test for continuous structural
variables. The p-values in Table 1 confirm a strong association between lesions in the
two structures and development of ADHD.
Running an exploratory analysis on the CHS data set (300 subjects) using the chi-
square test to evaluate two-way contingency tables for all pairwise combinations of
atlas structures (90) and functional variables (14) returns a list (sorted by p-value) of
structure–function associations.13,97 The five most significant associations are
presented in Table 2. Highly significant lesion-deficit associations detected by BRAID,
such as visual field deficit and lesions in contralateral orbital or cuneate gyrus, are also
consistent with current clinical knowledge.101 The incorrect association between the
left hippocampus and a right visual field deficit is due to registration error, since the
hippocampus is next to the optic radiations that are very well known to be correlated
with a visual field deficit.
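The exploratory pairwise analysis can be sketched as follows, assuming binary (normal/abnormal) variables stored as lists of 0/1 values per subject. The function names and data layout are illustrative assumptions; the p-value uses the closed form for a chi-square statistic with one degree of freedom, without continuity correction.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return stat, erfc(sqrt(stat / 2.0))  # survival function for 1 df

def rank_associations(structures, functions):
    """Return (p-value, structure, function) triples sorted by p-value."""
    results = []
    for s_name, s in structures.items():
        for f_name, f in functions.items():
            a = sum(1 for x, y in zip(s, f) if x and y)
            b = sum(1 for x, y in zip(s, f) if x and not y)
            c = sum(1 for x, y in zip(s, f) if not x and y)
            d = sum(1 for x, y in zip(s, f) if not x and not y)
            results.append((chi_square_2x2(a, b, c, d)[1], s_name, f_name))
    return sorted(results)

# Toy data for eight hypothetical subjects (1 = abnormal).
structures = {"s1": [1, 1, 1, 1, 0, 0, 0, 0], "s2": [1, 0, 1, 0, 1, 0, 1, 0]}
functions = {"f": [1, 1, 1, 1, 0, 0, 0, 0]}
ranked = rank_associations(structures, functions)
```

With 90 structures and 14 functional variables this loop performs 1260 tests, which is why a multiple-comparison correction is reported alongside the raw p-values.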
Preliminary stepwise logistic regression analysis using continuous structural
variables from the FLIC data set shows similar results for the development of ADHD.
This method identifies the left SupCerebellarA (which is lateral to the left putamen
area) as a strong predictor. Results from a preliminary voxel-based analysis for the
ADHD variable of the FLIC data set are presented in Figure 4(a). Each voxel
represents the p-value for the association between the voxel being lesioned and the
development of ADHD. A 3-D reconstruction is shown in Figure 4(b). These results

†Due to the compromised connections between the frontal lobe and these two structures, it is believed that the frontal
lobe is not able to exert its normal oversight function to suppress impulsive urges and behaviours. A common
behavioural pattern in patients with ADHD is impulsivity and lack of self-control.

Figure 4 Voxel-based analysis for development of ADHD. Six slices (a) of the Talairach atlas and a colour bar
that shows the correspondence between p-values and colour values are shown. (b) Visualization of the voxel-
based analysis p-value volume for development of ADHD. The higher the intensity the lower the p-value. Picture
from Megalooikonomou et al.97

are consistent with those of the atlas-based analysis. Figure 5 shows one representative
slice (119) of the Talairach atlas for the voxel-based regression analysis for ADHD.
These results are consistent with all the previous ones for ADHD.

Table 1 Visualization-directed mining. Statistical analysis of selected Talairach atlas structures for association
with ADHD (FLIC data set)97

Structure      Fisher’s exact p-value    Mann–Whitney p-value
R putamen      0.065                     0.033
L thalamus     0.095                     0.093

Table 2 Exploratory analysis. The five most significant structure-function associations given by the chi-square
analysis on the CHS data set97

Structure           Function          Chi-square p-value    Sequential Bonferroni corrected p-value
R globus pallid.    R hemiparesis     0.00001               0.0039
L hippocampus       R visual defect   0.00001               0.0095
R gyri angular      L pronat. drift   0.00002               0.0195
R gyri orbital      L visual defect   0.00003               0.0225
R gyri cuneus       L visual defect   0.00003               0.0224

3.2.2 Structure morphology analysis


Several methods have been applied in extracting knowledge about the morphological
variability of brain structures. Study of the location, size, surface area, volume, and
shape of specific brain regions is critical for discovering normal brain organization, for
defining anatomically-driven search areas for brain activity in functional imaging
(PET, fMRI) scans, and for investigating pathological changes in the case of diseases
affecting these structures. Some of the same voxel-based analysis techniques described
in relation to functional studies have been applied to anatomy as well; in general,
voxel-based morphometry identifies changes in gray matter on a voxel-by-voxel
basis.102–105,186 This method is used to study the different composition of brain tissue
after macroscopic shape differences are discounted using spatial normalization.
Another common approach warps (deforms) an individual’s brain to an anatomical
template (e.g. the Talairach atlas29) and gathers details about the warping that are
used in the analysis.106,107 A deformation function $d(u, v)$, defined at each point
$(u, v)$ of the atlas structure $S$ of interest, measures the enlargement or
shrinkage associated with the transformation from an infinitesimal region around a
point in the atlas space to its corresponding infinitesimal region in the subject space.
In this method a comparison of two different brains or, more generally, two
populations is achieved by comparing the corresponding deformation fields: regions with
statistically significant differences are regions of morphological differences between
the two populations. Results from applying this methodology106 to a study of the
corpus callosum for a small group of elderly subjects are shown in Figure 6. More
details on the use of a deformation function in the analysis of morphological
variability can be found in Appendix A2.
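One common way to quantify local enlargement or shrinkage under a warp is the determinant of its Jacobian; the sketch below estimates it numerically in two dimensions. Treating the deformation function as a Jacobian determinant is an assumption made here for illustration (the precise definition used in the cited work is given in Appendix A2), and the function name is hypothetical.

```python
def local_area_change(warp, u, v, h=1e-5):
    """Approximate Jacobian determinant of a 2-D warp at (u, v).

    warp maps atlas coordinates (u, v) to subject coordinates (x, y).
    Values above 1 indicate local enlargement, values below 1 shrinkage.
    """
    xu1, yu1 = warp(u + h, v)
    xu0, yu0 = warp(u - h, v)
    xv1, yv1 = warp(u, v + h)
    xv0, yv0 = warp(u, v - h)
    # Central finite differences for the four partial derivatives.
    dx_du, dy_du = (xu1 - xu0) / (2 * h), (yu1 - yu0) / (2 * h)
    dx_dv, dy_dv = (xv1 - xv0) / (2 * h), (yv1 - yv0) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du

# Illustrative warp: stretch by 2 along u and by 3 along v.
stretch = lambda u, v: (2.0 * u, 3.0 * v)
change = local_area_change(stretch, 0.5, 0.5)
```

Evaluating such a measure at every atlas point for every subject yields one scalar map per subject, and group comparison then reduces to point-wise statistical tests on these maps.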
Surface-based mesh modelling is a similar approach.108,109 After minimal
registration a parametric mesh is stretched over the surface contour of a structure or ROI

Figure 5 The optimal regression sphere (c) that best discriminates the two groups, i.e. between lesions that
are (a) and are not (b) associated with the development of ADHD. Picture from Megalooikonomou et al.97

(see Figure 7). It is then compared to an average parametric mesh that is formed by
calculating the mean and variation between corresponding points on the mesh.
Finally, displacement vectors are generated for each individual structure. A local
profile of change in structures in certain conditions can be provided through colour-
coded topographic maps (see Figure 8). This method first aligns each brain volume
using distance scaling to control for head size differences, allowing for inter-individual
and group comparisons. A strategy for creating a population-based brain atlas using

Figure 6 Morphological variability of the corpus callosum between women and men for a group of elderly
subjects. The posterior part (in white) was found to be significantly larger in women than in men (a). The
average shape of the corpus callosum for (b) men and (c) women in a study by Davatzikos et al.106 Pictures
from Davatzikos et al.106

Figure 7 Extracting meshes (a) to create a cortical surface database, to search for differences where the
deformation is regarded as an observation from a random vector field. Variability is calculated based on 3D
displacement maps, which locally encode the amount of deformation required (b) to drive each subject’s gyral
pattern into exact correspondence with the average cortex for the group. Pictures from Thompson et al.184

volumetric warps is shown in Figure 9. The application of these methods in several
studies has already revealed differences in the shape and size of certain structures
related to gender (e.g. corpus callosum106,110), in disorders such as
schizophrenia,107,111,112 in normal aging, and in Alzheimer’s disease.113 Probabilistic atlas
approaches have been used for studying both normal and abnormal brains.114
Another approach attempts to identify and register landmark configurations (defined
as point sets that correspond biologically across images).115 Image deformation
algorithms designed to accomplish these goals are useful for identifying and
measuring variations in structure, although they are not designed for tasks like finding
tumours or activations. The Procrustes distance is one of the core tools of image
deformation algorithms, and is calculated for two landmark configurations with the
same landmarks by centring each configuration on its centroid, normalizing its size,
and then rotating one configuration to minimize the sum of squared distances between
corresponding landmark points. Finally, point-wise t-tests, ANOVAs, and partial
correlations116 as well as eigenvector and related
analysis115,117–119 have been used in computational neuroanatomy to study group
differences in morphology and its associations with cognitive variables.
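The Procrustes distance can be sketched for planar landmark configurations, where the optimal rotation has a closed form. This is a minimal illustration under stated assumptions: landmarks are 2-D points listed in corresponding order, size is normalized by centroid size, and reflections are not allowed; the function names are hypothetical.

```python
from math import atan2, cos, sin, sqrt

def normalise(points):
    """Centre a configuration on its centroid and scale it to unit centroid size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    size = sqrt(sum(x * x + y * y for x, y in centred))
    return [(x / size, y / size) for x, y in centred]

def procrustes_distance(config_a, config_b):
    a, b = normalise(config_a), normalise(config_b)
    # Closed-form rotation angle that best aligns b with a (2-D only).
    num = sum(ay * bx - ax * by for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    theta = atan2(num, den)
    c, s = cos(theta), sin(theta)
    rotated = [(bx * c - by * s, bx * s + by * c) for bx, by in b]
    # Root of the summed squared distances between corresponding landmarks.
    return sqrt(sum((ax - rx) ** 2 + (ay - ry) ** 2
                    for (ax, ay), (rx, ry) in zip(a, rotated)))

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Same triangle rotated by 90 degrees, doubled in size, and shifted.
moved = [(5.0, 5.0), (5.0, 7.0), (3.0, 5.0)]
same_shape = procrustes_distance(triangle, moved)
```

Because translation, scale, and rotation are removed before the comparison, the distance is close to zero for configurations that differ only by those transformations and positive when the shapes themselves differ.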
Other related work is the study of human anatomy,120,121 which presents the most
difficult challenges to the understanding of typicality and variability. While biological
shapes are highly structured, they are not rigid. Miller’s group have been using
Grenander’s deformable anatomical templates for the representation of typicality and
variability. For this, complex anatomical templates (human and macaque brains) are
annotated with coordinate systems defined within them. High-dimensional vector
fields applied to these coordinate systems carry the templates with all of their geometry
into the target. This allows for understanding of anatomy modulo individual variation.

3.2.2.1 Morphological analysis of tree-like structures. Another tool for the analysis of
brain structure and function is the morphological characterization of
neurological brain structures. Tree-like structures, such as nerve-fibre tracing in

Figure 8 Three-dimensional visualizations of structural variability, asymmetry and group-specific differences.
(a) Anatomical variability of the cerebral cortex in male schizophrenia patients and controls. Variability is shown
on an average surface representation of the cortex derived from schizophrenia (left) and normal control (right)
populations. Individual variations in brain structure in frontal association areas are greater in schizophrenia.
Variability is calculated based on 3D displacement maps, which locally encode the amount of deformation
required to drive each subject’s gyral pattern into exact correspondence with the average cortex for the group.
Picture from Narr et al.182 (b) Ventricle variability maps for Alzheimer’s disease. Pictures from Thompson
et al.183

Figure 9 Creating a population-based brain atlas to quantify local structural variations. A family of high-
dimensional volumetric warps relating a new 3D MRI scan to each normal scan in a brain image database is
calculated (I–II, above). The resulting warps encode the distribution in stereotaxic space of anatomic points that
correspond across a normal population (III), and their dispersion is used to determine the likelihood (IV) of local
regions of the new subject’s anatomy being in their actual configuration. Colour-coded topographic maps
highlight regional patterns of deformity in the anatomy of the new subject. Abnormal structural patterns are
quantified locally, and mapped in three dimensions. Pictures from Thompson et al.185

DTI, MR angiography or confocal microscopy, are registered (after segmentation and
skeletonization) with standard structural and functional volumes. In addition,
morphological analysis of these structures using various path analysis tools is
performed. Morphological descriptors such as Sholl analysis,122 moment analysis,123–125
and fractal dimension analysis are used to support content-based retrieval operations
in 3D cell-centred neuronal databases.126,127 Recently, visual data mining techniques
combined with computational neural modelling have provided a very effective means
of detecting morphological influences on neuronal function.128
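As an illustration of one such descriptor, the sketch below computes a simple Sholl profile: the number of skeleton segments that cross concentric spheres centred on the cell body. The representation of the skeleton as a list of straight 3-D segments and the function name are assumptions made for this example; a segment is counted once per sphere if its endpoints lie on opposite sides of it, which is an approximation for long or curved segments.

```python
from math import dist

def sholl_profile(segments, soma, radii):
    """Count segment crossings of spheres of the given radii around soma."""
    counts = []
    for r in radii:
        crossings = 0
        for p, q in segments:
            dp, dq = dist(p, soma), dist(q, soma)
            # The segment crosses the sphere if one end is inside and one outside.
            if min(dp, dq) < r <= max(dp, dq):
                crossings += 1
        counts.append(crossings)
    return counts

# Toy skeleton: one trunk that splits into two branches.
soma = (0.0, 0.0, 0.0)
segments = [
    ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
    ((2.0, 0.0, 0.0), (4.0, 1.0, 0.0)),
    ((2.0, 0.0, 0.0), (4.0, -1.0, 0.0)),
]
profile = sholl_profile(segments, soma, [1.0, 3.0])
```

The resulting vector of counts per radius is a compact feature that can be indexed and compared across cells, which is the role such descriptors play in content-based retrieval.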

3.2.2.2 Brain tumour analysis and classification. Classification is an important problem
in data mining. Classifiers are useful for building taxonomies of images and
subsequently performing image context based searches.129 Methods for finding similar
tumour shapes in structural images130 can also be used for brain tumours. Korn et al.
use concepts from mathematical morphology, namely the ‘pattern spectrum’ of a
shape, to map each shape to a point in n-dimensional space. Starting from a natural
similarity function (the ‘maximum morphological distance’), they first prove a lower
bound for it and then demonstrate how to search efficiently for nearest neighbours in
large collections of tumour-like shapes using R-trees131 and the ‘Feature index’
(F-index) approach.132 The technique was applied to realistic tumour shapes generated
using an established tumour-growth model133 and the results were very encouraging
(see Figure 10). Fractal features and texture analysis have also been used for the
quantitative description and recognition of brain tumours in 3-D MR images.129
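The general idea of using a lower bound to speed up nearest-neighbour search can be sketched with a generic filter-and-refine scan. This is not the method of Korn et al.: the feature vectors, the maximum-difference distance, and the bound computed from the first k coordinates are stand-ins chosen for illustration, and a linear scan replaces the R-tree index.

```python
def max_distance(x, y):
    """Stand-in 'expensive' distance: largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def lower_bound(x, y, k):
    # The maximum over the first k coordinates never exceeds the full maximum,
    # so this cheap value is a valid lower bound on max_distance.
    return max(abs(a - b) for a, b in zip(x[:k], y[:k]))

def nearest_neighbour(query, database, k=2):
    """Return (index, distance, number of full distance evaluations)."""
    best_idx, best_dist, full_evals = None, float("inf"), 0
    for idx, item in enumerate(database):
        if lower_bound(query, item, k) >= best_dist:
            continue  # cannot beat the current best, so skip the full distance
        full_evals += 1
        d = max_distance(query, item)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, full_evals

query = [0.0, 0.0, 0.0, 0.0]
database = [[1.0, 1.0, 1.0, 1.0], [5.0, 0.0, 0.0, 0.0],
            [0.5, 0.0, 0.0, 0.2], [0.0, 9.0, 0.0, 0.0]]
result = nearest_neighbour(query, database)
```

Since the bound never overestimates the true distance, pruning cannot discard the true nearest neighbour; the saving comes from evaluating the full distance on fewer candidates.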

3.3 Combined structural and functional imaging of the brain


Structural and functional imaging are often combined. It is common to restrict
activation studies to a certain area of interest that corresponds to an anatomical
structure. Here, we present a new area of research where both structural and
functional images have to be mined together, and methods that can be potentially
applied to both.

3.3.1 Gene expression, morphology and function


Discovering patterns of gene expression and their complex interaction with brain
morphology and function is a fundamental goal in recent molecular biology and
neurobiology studies. In situ hybridization and MRI have provided very high
resolution images of gene expressions in animal models. In addition, gene expression
brain atlases for the mouse and the rat have started to appear.134–136 After the
registration of anatomic and gene expression images across modalities and subjects
through more involved methods137–141 than those presented in Section 2, spatial
statistics methods have to be applied to find associations between anatomic, genetic,
and nonimage variables such as behavioural measures, response to drugs, or onset of
disease. The main challenge in finding associations among patterns of gene
expressions and phenotype is the synthesis of temporal information, spatial
information, and static data. Similar work has been done in the analysis of functional
images (as described earlier) where changes in signal intensity occur in response to

Figure 10 Query tumour images (left column) and their nearest neighbours, with respect to morphological
distance. Picture from Korn et al.130

processing different kinds of stimuli. However, the fact that multiple genes can be
expressed in the same brain location, and that the time sequence of gene expression
may also be important, makes the problem even more challenging.

3.3.2 Bayesian networks


Multivariate analysis methods like log-linear analysis and logistic regression
provide relatively simple methods for generating candidate models, usually relying on
modifications of greedy search and making assumptions about cell frequencies or total
number of samples that may not hold for rare cases. A more promising approach
generates models called Bayesian networks142 that consist of graphical structures
along with statistical independence models. This method scores each model M, and
returns the most probable model that could have generated the data D at hand (i.e. the
multivariate multinomial distribution that generated D).143,144
Briefly, a Bayesian network is a directed acyclic graph in which nodes represent
variables of interest, such as structures or functions, and edges represent associations
among these variables. Each node has a conditional-probability table that quantifies
the strength of the associations between that node and its parents. Given the prior
probabilities for the root nodes and conditional probabilities for other nodes, we can
derive all joint probabilities145 over these variables. An approach for generating a
Bayesian network from data is described in Appendix A3.
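The way joint probabilities follow from priors and conditional-probability tables can be sketched for a very small network. The two-structure, one-function graph and all probability values below are hypothetical and serve only to show the factorization.

```python
from itertools import product

# Hypothetical network: s1 -> f <- s2 (two structures, one function).
priors = {"s1": 0.3, "s2": 0.1}   # P(structure abnormal), assumed values
cpt_f = {                          # P(f abnormal | s1, s2), assumed values
    (False, False): 0.05,
    (True, False): 0.60,
    (False, True): 0.70,
    (True, True): 0.90,
}

def joint(s1, s2, f):
    """P(s1, s2, f) as the product of root priors and the child's CPT entry."""
    p = priors["s1"] if s1 else 1 - priors["s1"]
    p *= priors["s2"] if s2 else 1 - priors["s2"]
    p_f = cpt_f[(s1, s2)]
    return p * (p_f if f else 1 - p_f)

def marginal_f():
    """P(f abnormal), obtained by summing the joint over both structures."""
    return sum(joint(s1, s2, True)
               for s1, s2 in product([False, True], repeat=2))

total = sum(joint(s1, s2, f)
            for s1, s2, f in product([False, True], repeat=3))
```

Any marginal or conditional query over the variables can be answered from this joint distribution, although practical inference algorithms avoid enumerating it explicitly.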
Recently the Minimum Description Length (MDL) principle has been applied to
Bayesian network learning.146,147 The principle states that the best model of a
collection of data is the one that minimizes the sum of the encoding lengths of the data
and the model itself.148 The MDL metric is defined to measure the total description
length DL of a network structure G, which is the sum of description lengths of each
node.147,149 The description length of each node is defined from two components, the
network description length and the data description length. The first is the description
length for encoding the network structure, which measures the simplicity of the
network. The second is the description length for encoding the data, which measures
the accuracy of the network.

3.3.3 Behavioural imaging
Another approach for modelling structure–function relationships is to transform
neuropsychological test scores that assess cognitive functions to a 3-D spatial
representation of the predicted sites of regional dysfunction. Gur et al.150 presented
such an algorithm for display and analysis of neuropsychological test scores that
produces regional values from standardized (z-transformed) neuropsychological test
scores using the formula:
\[
B_j = \frac{\sum_i W(i, j)\, S_i}{\sum_i W(i, j)}
\]
where $B_j$ is the index of behavioural functioning for a given region, $W(i, j)$ is the
weight assigned to the $j$th brain region for the $i$th behavioural score, and $S_i$ is the test
score. The method was demonstrated on a sample of hemi-Parkinson patients151 and
later used to examine the sensitivity of cognitive test scores to lesions in specific ROIs,
inter-expert agreement, and intra-expert reliability.152 The method can be used to
relate cognitive test scores to the results of structural and functional imaging, and has
great potential for integrative data mining. Turkheimer et al.187 also quantitatively
examined the relationship between neuropsychological test scores and lesion locations
on structural neuroimaging.
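The formula amounts to a weighted average of test scores for each region, as the following sketch shows. The matrix layout (one row per test, one column per region) and the numerical values are assumptions for illustration; regions whose weights sum to zero are not handled.

```python
def behavioural_index(weights, scores):
    """Compute B_j = sum_i W(i, j) * S_i / sum_i W(i, j) for every region j.

    weights[i][j]: weight of region j for test i (assumed layout).
    scores[i]: standardized (z-transformed) score of test i.
    """
    n_tests, n_regions = len(scores), len(weights[0])
    result = []
    for j in range(n_regions):
        total_weight = sum(weights[i][j] for i in range(n_tests))
        weighted_sum = sum(weights[i][j] * scores[i] for i in range(n_tests))
        result.append(weighted_sum / total_weight)
    return result

# Three hypothetical tests and two hypothetical regions.
weights = [[1, 0], [1, 2], [0, 2]]
scores = [-1.0, 0.0, 2.0]
regional = behavioural_index(weights, scores)
```

Normalizing by the sum of weights keeps each regional value on the same z-score scale as the input scores, so regions informed by many tests and regions informed by few remain comparable.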

4 Important issues in mining of brain images


4.1 The multiple comparisons problem
Using an exploratory analysis and computing a statistic for many tests (as in the case
of pairwise tests) leads to the multiple comparisons problem, i.e. the situation where an
undesirably high number of the tests are expected to be positive by chance. A
standard Bonferroni correction98,153 suggests that one divide the significance threshold
by the number of independent tests performed. This typically overestimates the
number of independent tests performed, since test results are often correlated for
neighbouring structures (activations or lesions often extend over neighbouring
structures), and leads to loss of sensitivity. A heuristic modification of the Bonferroni
correction, the sequential Bonferroni correction,98 can be used to get less pessimistic
results. To do this, one sequentially increases the value of the significance threshold as
hypotheses are evaluated. In task-activation studies, increasing the threshold for
statistical significance increases the number of false positive activations that are
detected.
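The two corrections can be compared in a short sketch. The sequential version below follows the Holm step-down scheme, which is assumed here to be the form of sequential Bonferroni correction meant in the text; the function names and the example p-values are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def sequential_bonferroni(p_values, alpha=0.05):
    """Holm-style step-down procedure (assumed form of the sequential correction).

    P-values are tested in increasing order against alpha / m, alpha / (m - 1),
    and so on; testing stops at the first hypothesis that cannot be rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break
    return rejected

p_values = [0.01, 0.015, 0.03, 0.5]
standard = bonferroni(p_values)
sequential = sequential_bonferroni(p_values)
```

In this example the standard correction rejects only the smallest p-value, whereas the sequential procedure also rejects the second, because the threshold is relaxed after each rejection.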
The Bonferroni correction only applies in the case where a null hypothesis is to be
rejected (i.e. any lesions or activations at all are treated as unexpected). The correction
is not necessary in the case where a hypothesized region of lesion/activation or a
structure is chosen,154 since in this case the null hypothesis of no changes is
necessarily relaxed. If the cross-correlation between adjoining voxels is considered, a
higher threshold can be used more safely. Single voxels that have signal changes of low
significance may be the result of noise, but several voxels together that have a
correlated change have a higher likelihood of representing a true lesion or
activation.155 One heuristic alternative to the Bonferroni correction is cluster filtering;
in this method, clusters smaller than a certain size (number of voxels) are simply
discounted. Taking advantage of this kind of approach increases sensitivity, but it also
adds risk of error at the cluster level (rather than only at the voxel level).84 The overall
error rate can still be controlled, so the net effect is to reduce errors.
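Cluster filtering can be sketched as connected-component labelling followed by a size threshold. The sketch assumes that significant voxels are given as a set of integer (x, y, z) coordinates and that voxels sharing a face are neighbours; the function name and the choice of connectivity are assumptions.

```python
from collections import deque

def filter_clusters(significant, min_size):
    """Keep only voxels belonging to face-connected clusters of >= min_size voxels."""
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    unvisited = set(significant)
    kept = set()
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = {seed}, deque([seed])
        # Breadth-first search collects every voxel connected to the seed.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                nb = (x + dx, y + dy, z + dz)
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_size:
            kept |= cluster
    return kept

voxels = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)}
surviving = filter_clusters(voxels, min_size=2)
```

The isolated voxel is discarded while the three-voxel cluster survives, which reflects the reasoning that spatially correlated changes are less likely to be noise.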

4.2 Clustering voxels


Clustering is the process of finding, in a contiguous spatial region, voxels with
similar significance in a voxel-based activation or lesion-deficit study. Clustering can
be done after independent significance tests are performed by grouping adjacent
significant voxels, or clusters can be calculated directly from the data by detecting
correlated changes.85,156–158 In the latter case, clustering can be viewed as a method for
generating hypotheses that statistical testing can evaluate.159
Clustering has been used to differentiate functional activations from other activity
in the brain using statistical methods160–165 and neural networks.166,167 Clustering has
been claimed to be a higher-quality analysis tool than correlation analysis because of its
ability to detect unanticipated differences in response, such as differing levels of
activation168 or similarities in the time-course of fMRI signal changes and stimuli.163
The cluster filtering approach mentioned earlier depends on having an estimate of the
likelihood of each cluster size, which depends on the noise distribution. Images of the
noise distribution can be obtained by, for example, subtracting the results of one
condition from a repetition of that condition. Using simulated images derived with the
same spatial correlation as these images, one can estimate the probability of observing
clusters above a given size and thus the probability of each cluster in the original data.164
Then, clusters below a desired probability threshold can be discarded as too uncertain.

4.3 Verification of mining and power considerations


In previous sections we discussed methods for finding associations between tasks
and activations, or between lesions and deficits. However, the evaluation of the
discovered knowledge for the structure–function analysis methods is not usually
addressed. Several researchers have studied the correspondence of sample size to
power for statistical tests such as the chi-square and Fisher exact tests of
independence,169 and compared the relative power of different statistical tests of
independence.170–173 In addition, simulations have studied the power of chi-square
analysis in sample spaces of much higher dimensionality, as one would expect to find
in many epidemiological studies.173–176 However, no closed-form power analyses exist
that can account for the simultaneous effects of image noise and registration error, in
addition to the characteristics of the statistical methods being employed.
One can use a simulator177 to not only test the scalability of mining methods, but
also evaluate different methods as a function of the number of samples needed, the
strength and complexity of associations, the spatial distribution of ROIs, and the
registration method used. A simulator can generate a large number of artificial
subjects and construct a probabilistic model of lesion-deficit or task-activation
associations. One can then model the error of a given registration method, apply it to
the image data, perform mining, and compare the generated associations with those
detected by the mining methods. The number of subjects required to recover the
known associations reflects the statistical power of the particular combination of
image-processing and statistical methods being evaluated.

4.3.1 Using a simulator
As a case study, we show results from the evaluation of the Fisher exact test for the
detection of lesion-deficit associations.177 The results quantify the sensitivity and
accuracy of the mining method as a function of the number of subjects in the sample,
the strength and complexity of the associations, and the errors that arise due to
imperfect registration.
Comparing the results of simulated analysis to known associations allows one to
quantify the performance of a mining method. For this study, the simulation
parameters for the distributions were obtained from data collected as part of the Frontal
Lobe Injury in Childhood (FLIC)100 lesion-deficit study. Simulated lesions were
generated using distributions for the number, size, and location of lesions. Because
misregistration introduces noise in the form of false-negative and false-positive
associations, this source of error was modelled by assuming that it follows a 3-D
nonstationary Gaussian distribution. Registration error was estimated by measuring
the error on distinct anatomical landmarks on a number of subjects and then
interpolating the error in the rest of the brain. The lesion-deficit-association model,
with its conditional-probability tables and prior probabilities, describes the
relationships between structures and functions. In the case where structure and function
variables are categorical (normal vs abnormal), these associations can be modelled
using Bayesian networks (BNs)145 as covered in Section 3.3.2. To examine the effect of
the strength of the lesion-deficit associations on the ability of the mining methods to
detect them, Table 3 presents three cases corresponding to strong, moderate, and weak
associations. Thus, a strong association between a structure $s_i$ and a function $f_j$ is
denoted by conditional probabilities $p(f_j = A \mid s_i = N) = 0$, $p(f_j = A \mid s_i = A) = 1$,
$p(f_j = N \mid s_i = N) = 1$ and $p(f_j = N \mid s_i = A) = 0$, where A means abnormal and N
normal. Moderate and weak associations were defined similarly. Nondeterministic
disjunctive interactions between more than one structure and a function were
modelled using a noisy-OR model.142
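A noisy-OR combination can be sketched as follows: each abnormal structure independently fails to cause the deficit with probability one minus its link probability, and the deficit is absent only if every abnormal structure fails. The function name, the optional leak term, and the numerical values are illustrative assumptions rather than the simulator's actual parameters.

```python
def noisy_or(link_probs, parent_states, leak=0.0):
    """P(function abnormal | parent structures) under a noisy-OR model.

    link_probs[k]: probability that structure k alone, when abnormal,
                   produces the deficit.
    parent_states[k]: True if structure k is abnormal.
    leak: probability of the deficit when no structure is abnormal (assumed 0).
    """
    p_no_deficit = 1.0 - leak
    for p, abnormal in zip(link_probs, parent_states):
        if abnormal:
            p_no_deficit *= 1.0 - p
    return 1.0 - p_no_deficit

# Two structures, each with a moderate link probability of 0.75.
both = noisy_or([0.75, 0.75], [True, True])
one = noisy_or([0.75, 0.75], [True, False])
none = noisy_or([0.75, 0.75], [False, False])
```

This model needs only one parameter per structure-function edge, so the conditional-probability table for a function with many parent structures does not have to be specified entry by entry.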
The prior probability of structure abnormality for each structure $s_i$, in each subject
$p_j$, was calculated from $f_{s_i,p_j}$: the fraction of the volume of $s_i$ that overlapped with lesions
for $p_j$. The conditional probability $p(s_i \mid f_{s_i,p_j})$ is expected to be a sigmoid function,
although a step function with an appropriate threshold is used for simplicity. Each
structure with at least 1% of its volume overlapping with lesions was labelled as
abnormal for that subject. For each pair of simulated subject and structure, the
prior-probability distribution was sampled and a binary vector for the structures was
generated. By instantiating the states of all structure variables of the BN, the
conditional probability for each function variable was determined by table lookup.
This probability was then used to generate the binary vector for the function variables,
and Fisher’s exact test of independence was applied to each structure-function pair.

Table 3 Three cases of BNs considered in a simulator177

Case    Association    Conditional probabilities for functions
1       Strong         0/1
2       Moderate       0.25/0.75
3       Weak           0.49/0.51
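The sampling procedure of Section 4.3.1 can be sketched for a single structure-function pair. The sketch assumes that the two values in each row of Table 3 are the probabilities of an abnormal function given a normal and an abnormal structure, respectively; the function name, the seed, and the table layout are hypothetical, and lesion generation and registration error are omitted.

```python
import random

def simulate_table(n_subjects, p_given_normal, p_given_abnormal,
                   prior=0.5, seed=0):
    """Simulate subjects and return a 2x2 contingency table.

    Rows: structure abnormal / normal. Columns: function abnormal / normal.
    """
    rng = random.Random(seed)
    table = [[0, 0], [0, 0]]
    for _ in range(n_subjects):
        # Sample the structure state from its prior probability.
        s_abnormal = rng.random() < prior
        # Look up the conditional probability and sample the function state.
        p_f = p_given_abnormal if s_abnormal else p_given_normal
        f_abnormal = rng.random() < p_f
        table[0 if s_abnormal else 1][0 if f_abnormal else 1] += 1
    return table

strong = simulate_table(100, 0.0, 1.0)      # case 1 of Table 3
moderate = simulate_table(100, 0.25, 0.75)  # case 2 of Table 3
```

Each simulated table would then be passed to a test of independence such as the Fisher exact test, and repeating the simulation for increasing numbers of subjects gives an empirical estimate of how many are needed to detect an association of a given strength.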

4.3.2 Results from the evaluation of a mining system


In this section, we describe how a lesion-deficit simulator can be used to determine
how the number of subjects needed to discover the simulated lesion-deficit associations
represented by a Bayesian network depends on the strengths of associations, the number of
associations, the degree of the network (i.e. the number of structures related to a
particular function), and the prior probabilities for structural abnormalities. A
Bayesian network with sufficient complexity was used177 to demonstrate the use of a
simulator in reaching meaningful results regarding the performance of the Fisher
exact test and the effects of misregistration. Since the performance of any method for
detecting associations depends on the characteristics of the conditional-probability
tables, three cases (see Table 3) were examined to study this effect. The prior
probability of abnormality for each structure was set to 0.5 to allow testing the
behaviour of the Fisher exact test for the optimal value of the prior probability. To
generate the conditional-probability table for those function variables that were
related to more than one structure, a noisy-OR model was used. The threshold 0.001
was used for the p-value, since this gives a good trade-off between the number of
simulated associations and the number of false positives detected. Figure 11
demonstrates the dramatic effects of the different conditional-probability distributions
on the power of lesion-deficit analysis. As expected, more samples are required to
detect weaker associations.
The degree of the associations of the Bayesian network was found to have a much
greater effect on the performance of the Fisher exact test than the total number of
associations. This result implies that, for functions that are associated with many
structures, identification of structure–function associations is difficult and requires a
larger sample size. Figure 12(a) shows the performance of the Fisher exact test for
three networks of 20, 40, and 80 edges and of the same degree (4) for the moderate case
(i.e. case 2) of the conditional-probability tables. Figure 12(b) shows the effect of
increasing the degree of the network (the number of structures affecting a particular
function) while fixing the total number of edges using the moderate case of the
conditional-probability tables.
Figure 11(b) demonstrates the performance of the Fisher exact test for the three
cases of Bayesian network conditional probabilities (see Table 3) when the prior
probability of a given structure being abnormal is obtained from the simulated data
set. The number of edges that could actually be discovered is 55 (80%), since there
were 14 edges from structures that did not intersect any lesions. Comparing this figure
with Figure 11(a), in which uniform prior probabilities were used, more subjects are
required to recover all associations when data-derived prior probabilities are used
instead of uniform prior probabilities, as expected. Also as expected, the number of
subjects needed is inversely proportional to the smallest prior probability. The
detection of false-positive associations is due to the existence of associations among
neighbouring structures due to lesions that intersect more than one structure.
Additional false positives can be observed in cases where associations occur between

Figure 11 Evaluating a mining method. Performance of the Fisher exact test (p < 0.001) for (a) uniform (0.5)
prior probabilities and (b) data-derived prior probabilities of structure abnormality, for the three strengths of
lesion-deficit associations from Table 3 that correspond to strong (case 1), moderate (case 2) and weak (case 3)
associations. The difference between the total number of associations detected and the number of true
associations detected is the number of false-positive associations detected for each case. The horizontal line in
(a) represents the total number of simulated edges (69) and in (b) represents the total number of simulated
edges that can be detected (55). Graphs from Megalooikonomou et al.177
382 V Megalooikonomou et al.

(a)

(b)

Figure 12 (a) Evaluating a mining method. Performance of the Fisher exact test (p 0.001) for BNs with degree
4 with 20, 40 and 80 edges. (b) Performance of the Fisher exact test (p 0.001) for BNs with 48 edges, and
with degree 4, 6, and 8. Graphs from Megalooikonomou et al.177
Data mining in brain imaging 383

behavioural variables. On average the specific registration method used reduces the
number of associations discovered by 13% for the same number of subjects when
compared with perfect registration.

5 Concluding remarks

In this review we have presented data mining methods that have been or could be used
for knowledge discovery from brain images of different modalities along with other
clinical data. We have focused on the problems of: (1) finding associations between
structures and functions through task-activation and lesion-deficit studies, (2)
studying the morphological variability of brain structures and finding associations
with certain conditions, (3) classifying shapes of brain structures, including tree-like
structures such as nerve fibres and abnormalities such as tumours, and searching for
similarity, and (4) finding associations between gene expressions, morphology, and
function. We have presented results of applying mining methods to epidemiological
data that demonstrate detection of several clinically meaningful associations in
different studies. These methods can lead to interesting conclusions about the
functional mapping of the human brain, the effect of lesions or other abnormalities on
the development of neurological and neuropsychological deficits, and the effect of certain
diseases and gene expressions on structural morphology and function.
Visualization can help reduce the inherently enormous search space in statistical
analysis. Exploratory analysis that applies a single statistic across many tests produces
reasonable results, although one has to deal with the multiple-comparison problem.
Voxel-based approaches show encouraging results, but are computationally intensive
and even more severely affected by the multiple-comparison problem (although the
latter can be addressed with clustering analysis). Statistical simulations show that more
advanced mining methods and large sample sizes are required to determine lesion-
deficit associations accurately, with a reduced number of false-positive associations.
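
The scale of the multiple-comparison problem can be illustrated with a short Python sketch. It assumes independent tests and uses the Bonferroni correction, a standard remedy chosen here for illustration rather than a method proposed in this review; the function names and the example numbers are likewise assumptions.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests,
    each carried out at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def bonferroni_reject(p_values, alpha=0.05):
    """Flag the tests that remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical example: 1000 structure-function pairs, each tested at 0.001.
m = 1000
print(f"family-wise error rate without correction: {familywise_error_rate(0.001, m):.3f}")
print(f"per-test threshold for an overall level of 0.05: {bonferroni_threshold(0.05, m):.6f}")
```

Because neighbouring voxels are correlated, the independence assumption tends to make such a correction conservative for voxel-based analyses, which is one reason cluster-level approaches are attractive.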
Simulators can be used for verifying and comparing mining methods in brain
imaging. Their use is especially important in determining the number of subjects
needed to detect all associations while reducing false positives. In particular, in lesion-
deficit analysis, simulators have shown that the number of subjects required to detect
all and only those associations in the underlying model (i.e. the ground truth) may be
in the thousands, even for strong associations, particularly if the spatial distribution of
lesions does not extend to all structures. The further the prior probabilities fall below
0.5, the more difficult it becomes to discover associations. These results
underline the necessity of developing large image databases for the purpose of meta-
analysis of data pooled from multiple studies, so that more meaningful results can be
obtained. The testing procedure framework is very important, since it can be used to
characterize the power of methods for detecting multivariate associations while taking
into account the effects of registration and noise. Simulators can also be used in the
evaluation of new analysis methods, as well as in the study of the effect of different
registration and segmentation algorithms.
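
The role of a simulator in sample-size planning can be sketched in a few lines of Python. The example below is a deliberately simplified stand-in for the simulators discussed above: it models a single structure and a single function, draws subjects from assumed probabilities, and estimates how often a one-sided Fisher exact test detects the association at p < 0.001. All names and parameter values (for instance the conditional probabilities 0.8 and 0.2) are hypothetical and are not taken from Table 3.

```python
import random
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for [[a, b], [c, d]]: the probability,
    with margins fixed, of at least a subjects being both abnormal and impaired."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    highest = min(row1, col1)
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, highest + 1))
    return tail / denom


def estimate_power(n_subjects, prior, p_def_abn, p_def_norm,
                   alpha=0.001, n_trials=200, seed=0):
    """Fraction of simulated studies in which the association is detected.

    prior      : assumed probability that the structure is abnormal
    p_def_abn  : assumed P(deficit | structure abnormal)
    p_def_norm : assumed P(deficit | structure normal)
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_trials):
        a = b = c = d = 0
        for _ in range(n_subjects):
            abnormal = rng.random() < prior
            deficit = rng.random() < (p_def_abn if abnormal else p_def_norm)
            if abnormal and deficit:
                a += 1
            elif abnormal:
                b += 1
            elif deficit:
                c += 1
            else:
                d += 1
        if fisher_one_sided(a, b, c, d) < alpha:
            detected += 1
    return detected / n_trials


if __name__ == "__main__":
    # Hypothetical association strength; compare a uniform and a low prior.
    for prior in (0.5, 0.1):
        for n in (50, 200, 400):
            power = estimate_power(n, prior, p_def_abn=0.8, p_def_norm=0.2, n_trials=100)
            print(f"prior={prior:.1f}  n={n:4d}  estimated power={power:.2f}")
```

In a sketch of this kind the detection rate typically falls as the prior probability of abnormality moves away from 0.5, and a structure that is never abnormal in the sample cannot be detected at all, in line with the behaviour described above.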
Existing mining algorithms are limited in that they typically assume data will
consist of individual numeric and symbolic features. We still lack effective algorithms
for learning from data that is represented as a combination of various types (i.e.
multimedia data). Predictions based on the full medical record could potentially
achieve much greater accuracy than those that are limited to one data type. In
addition, prediction accuracy can be improved by inventing more appropriate features
to describe the brain data. We need new methods that actively generate optimal
experiments to collect the most informative data. Another obstacle is integrating data
from different investigators and analysing them jointly. Brain imaging data are
usually collected in a single database for a specific study and with a specific data
mining task in mind, so an additional important issue is interoperability and the
ability to learn from multiple databases.178 Also, the mining algorithms developed so
far tend to be fully automated and therefore do not allow active experimentation, i.e.
guidance from experts at key stages in the search for brain data regularities. Ideally,
human experts should be able to collaborate closely with a mining algorithm to form
hypotheses and test them against the data. In addition, mining methods need to be
able to scale to extremely large data sets. Research during the past few years has
already produced more efficient algorithms for such problems as learning association
rules2 and efficient visualization of large data sets.179 A closer integration of machine
learning algorithms into database management systems is also needed.

Acknowledgements
The authors wish to thank Christos Davatzikos, Eddie Herskovits, Christos
Faloutsos, Paul Thompson, David Isecke, Ling Cheng, and Tilmann Steinberg for
providing pictures, comments, and other helpful information. This work was
supported in part by the Ira DeCamp Foundation, NARSAD and New Hampshire
Hospital. Support was also provided by the Dartmouth Experimental Visualization
Laboratory (DEVLAB).

References
1 Koslow SH, Huerta MF eds. Neuroinformatics: an overview of the Human Brain Project. Mahwah, NJ: Lawrence Erlbaum, 1997.
2 Agrawal R, Imielinski T, Swami A. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering 1993; 5: 914–25.
3 Huerta MF, Koslow SH, Leshner AI. The Human Brain Project: an international resource. Trends in Neuroscience 1993; 16: 436–38.
4 Mitchell TM. Machine learning and data mining. Communications of the ACM 1999; 42: 31–36.
5 Anderson S, Damasio H. Neuropsychological impairments associated with lesions caused by tumor or stroke. Archives in Neurology 1990; 47: 397–405.
6 Fox P, Mintun M, Reiman E, Raichle M. Enhanced detection of brain responses using intersubject averaging and change-distribution analysis of subtracted PET images. Journal of Cerebral Blood Flow Metabolism 1988; 8: 642–53.
7 Fox P. Functional brain mapping with positron emission tomography. Seminars in Neurology 1989; 9: 323–9.
8 Fox P, Mintun M. Noninvasive functional brain mapping by change-distribution analysis of averaged PET images of H2(15)O tissue activity. Journal of Nuclear Medicine 1989; 30: 141–49.
9 Fox P. Physiological ROI definition by image subtraction. Journal of Cerebral Blood Flow Metabolism 1991; 11: A79–82.
10 Evans A, Beil C, Marrett S, Thompson C, Hakim A. Anatomical–function correlation using an adjustable MRI-based region of interest atlas with positron emission tomography. Journal of Cerebral Blood Flow Metabolism 1988; 8: 513–30.
11 Evans A, Marrett S, Torrescorzo J, Ku S, Collins L. MRI–PET correlation in three dimensions using a volume-of-interest (VOI) atlas. Journal of Cerebral Blood Flow Metabolism 1991; 11: A69–78.
12 Arya M, Cody W, Faloutsos C, Richardson J, Toga A. A 3D medical image database management system. International Journal of Computerized Medical Imaging and Graphics 1996; 20: 269–84.
13 Letovsky SI, Whitehead SH, Paik CH et al. A brain image database for structure–function analysis. American Journal of Neuroradiology 1998; 19: 1869–77.
14 Herskovits EH, Megalooikonomou V, Davatzikos C, Chen A, Bryan RN, Gerring JP. Is the spatial distribution of brain lesions associated with closed-head injury predictive of subsequent development of attention-deficit hyperactivity disorder?: Analysis with brain-image database. Radiology 1999; 213: 389–94.
15 Nielsen FA, Hansen LK. Modeling of brainmap data. In: NIPS ‘99. Denver, Colorado; 1999.
16 Nowinski WL, Fang A, Nguyen BT et al. Multiple brain atlas database and atlas-based neuroimaging system. Computer Aided Surgery 1997; 2: 42–66.
17 Levrier O, Poline J, Tzourio N, Mazoyer B, Salamon G. Individual functional neuroanatomy using PET–MRI integration. In: 31st Annual Meeting of the American Society of Neuroradiology. Vancouver, BC, 1993.
18 Rajapakse J, Giedd J, Rapoport J. Statistical approach to segmentation of single-channel cerebral MR images. IEEE Transactions on Medical Imaging 1997; 16: 176–86.
19 Pal N, Pal S. A review on image segmentation techniques. Pattern Recognition 1993; 26: 1277–94.
20 Zhang Y. A survey on evaluation methods for image segmentation. Pattern Recognition 1996; 29: 1335–46.
21 Worth A, Makris N, Caviness V, Kennedy D. Neuroanatomical segmentation in MRI: Technological objectives. International Journal of Pattern Recognition and Artificial Intelligence 1997; 11: 116–87.
22 Vannier M, Butterfield R, Rickman D, Jordan D, Murphy W, Biondetti P. Multispectral magnetic resonance image analysis. CRC Critical Reviews in Biomedical Engineering 1987; 15: 117–44.
23 Held K, Korps E, Krause B, Wells W, Kikinis R, Muller-Gartner H. Markov random field segmentation of brain MR images. IEEE Transactions on Medical Imaging 1997; 16: 878–86.
24 Chang M, Sezan M, Tekalp A, Berg M. Bayesian segmentation of multislice brain magnetic resonance imaging using three-dimensional Gibbsian priors. Optical Engineering 1996; 35: 3206–21.
25 Collins D, Evans A. ANIMAL: validation and applications of nonlinear registration-based segmentation. International Journal of Pattern Recognition and Artificial Intelligence 1997; 11: 1271–94.
26 Gee J, Reivich M, Bajcsy R. Elastically deforming 3D atlas to match anatomical brain images. Journal of Computer Assisted Tomography 1993; 17: 225–36.
27 Miller M, Christensen G, Amit Y, Grenander U. Mathematical textbook of deformable neuroanatomies. Proceedings of the National Academy of Sciences 1993; 90: 11944–48.
28 Kamber M, Shingal R, Collins D, Francis G, Evans A. Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images. IEEE Transactions on Medical Imaging 1995; 14: 442–53.
29 Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain. Stuttgart: Thieme, 1988.
30 Bookstein F. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989; 11: 567–85.
31 Collins D, Neelin P, Peters T, Evans A. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography 1994; 18: 192–205.
32 Davatzikos C. Spatial transformation and registration of brain images using elastically deformable models. Computer Vision and Image Understanding 1997; 66: 207–22.
33 Damasio H. Human brain anatomy in computerized images. Oxford: Oxford University Press, 1995.
34 Talairach J, Szikla G. Atlas d’anatomie stereotaxique du telencephale: etudes anatomo-radiologiques. Paris: Masson, 1967.
35 Ono M, Kubik S, Abernathey C. Atlas of the cerebral sulci. Stuttgart: Thieme, 1990.
36 Brodmann K. Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Principien dargestellt auf Grund des Zellenbaues, Barth, Leipzig. In: Some Papers on the Cerebral Cortex. Springfield, IL: Thomas, 1960: 201–30.
37 Minoshima S, Koeppe R, Frey K, Ishihara M, Kuhl D. Stereotactic PET atlas of the human brain: aid for visual interpretation of functional brain images. Journal of Nuclear Medicine 1994; 35: 949–54.
38 Bihan DL. Functional MRI of the brain: principles, applications and limitations. Neuroradiology 1996; 23: 1–5.
39 Essen DV, Maunsell J. Hierarchical organization of functional streams in the visual cortex. Trends in Neurological Sciences 1983; 6: 370–75.
40 Wible CG, Shenton ME, Fischer IA et al. Parcellation of the human prefrontal cortex using MRI. Psychiatry Research 1997; 76: 29–40.
41 Ogawa S, Menon RS, Kim SG, Ugurbil K. On the characteristics of functional magnetic resonance imaging of the brain. Annual Review of Biophysical and Biomolecular Structures 1998; 27: 447–74.
42 Bahn MM. A linear relationship exists among brain diffusion eigenvalues measured by diffusion tensor magnetic resonance imaging. Journal of Magnetic Resonance 1999; 137: 33–38.
43 Friston KJ, Williams S, Howard R, Frackowiak RSJ, Turner R. Movement related effects in fMRI time series. Magnetic Resonance in Medicine 1996; 35: 346–55.
44 Thacker NA, Burton E, Lacey AJ, Jackson A. The effects of motion on parametric fMRI analysis techniques. Physiological Measurement 1999; 20: 251–63.
45 Ashburner J, Friston KJ. The role of registration and spatial normalization in detecting activations in functional imaging. Clinical MRI/Developments in MR 1997; 7: 26–8.
46 Kim B, Boes JL, Bland PH, Chenevert TL, Meyer CR. Motion correction in fMRI via registration of individual slices into an anatomical volume. Magnetic Resonance in Medicine 1999; 41: 964–72.
47 Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC. A unified statistical approach to determining significant signals in images of cerebral activation. Human Brain Mapping 1996; 4: 58–73.
48 Descombes X, Kruggel F, Cramon DYv. fMRI signal restoration using a spatio-temporal Markov random field preserving transitions. NeuroImage 1998; 8: 340–49.
49 Descombes X, Kruggel F, Cramon DYv. Spatio-temporal fMRI analysis using Markov random fields. IEEE Transactions on Medical Imaging 1998; 17: 1028–39.
50 Lindeberg T, Lidberg Par, Roland PE. Analysis of brain activation patterns using a 3-D scale-space primal sketch. Human Brain Mapping 1999; 7: 166–94.
51 Lowe MJ, Sorenson JA. Spatially filtering functional magnetic resonance imaging data. Magnetic Resonance in Medicine 1997; 37: 723–29.
52 Raz J, Turetsky B. Wavelet ANOVA and fMRI. In: Wavelet Applications in Signal and Image Processing VIII. Denver, Colorado, 1999.
53 Sijbers J, Dekker AJd, Van der Linden A, Verhoye TM, Van Dyck D. Adaptive anisotropic noise filtering for magnitude MR data. Magnetic Resonance Imaging 1999; 17: 1533–39.
54 Skudlarski P, Constable RT, Gore JC. ROC analysis of statistical methods used in functional MRI: individual subjects. NeuroImage 1999; 9: 311–32.
55 Coulon O, Mangin J-F, Poline J-B, Frouin V, Bloch I. Structural group analysis of functional maps. In: Kuba MSaAT-P A ed. 16th International Conference on Information Processing in Medical Imaging. Visegrad, 1999: 448–53.
56 Worsley KJ, Marrett S, Neelin P, Evans AC. Searching scale space for activation in PET images. Human Brain Mapping 1996; 4: 74–90.
57 Poline J-B, Mazoyer BM. Enhanced detection in brain activation maps using a multifiltering approach. Journal of Cerebral Blood Flow Metabolism 1994; 14: 639–42.
58 Brammer MJ. Multidimensional wavelet analysis of functional magnetic resonance images. Human Brain Mapping 1998; 6: 378–82.
59 Jansen M, Uytterhoeven G, Bultheel A. Image de-noising by integer wavelet transforms and generalized cross validation. Medical Physics 1999; 26: 622–30.
60 Ruttimann UE, Unser M, Rawlings RR et al. Statistical analysis of functional MRI data in the wavelet domain. IEEE Transactions on Medical Imaging 1998; 17: 142–54.
61 Zaroubi S, Goelman G. Complex denoising of MR data via wavelet analysis: application for functional MRI. Magnetic Resonance Imaging 2000; 18: 59–68.
62 Kruggel F, Cramon DYv. Physiologically oriented models of the hemodynamic response in functional MRI. Lecture Notes in Computer Science. Berlin: Springer, 1999.
63 Kruggel F, Cramon DYv. Modeling the hemodynamic response in single-trial functional MRI experiments. Magnetic Resonance in Medicine 1999; 42: 787–97.
64 Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith CD, Frackowiak RSJ. Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 1995; 2: 189–210.
65 Friston K. Statistical parametric mapping and other analyses of functional imaging data. In Toga A, Mazziotta J eds. Brain mapping: the methods. San Diego, CA: Academic Press, 1996.
66 Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ. Multisubject fMRI studies and conjunction analyses. NeuroImage 1999; 10: 385–96.
67 Bosch V. Statistical analysis of multi-subject fMRI data: assessment of focal activations. Journal of Magnetic Resonance Imaging 2000; 11: 61–64.
68 Cohen MS, DuBois RM. Stability, repeatability, and the expression of signal magnitude in functional magnetic resonance imaging. Journal of Magnetic Resonance Imaging 1999; 10: 33–40.
69 Tegeler C, Strother SC, Anderson JR, Kim SG. Reproducibility of BOLD-based functional MRI obtained at 4 T. Human Brain Mapping 1999; 7: 267–83.
70 Casey BJ, Cohen JD, O’Craven K et al. Reproducibility of fMRI results across four institutions using a spatial working memory task. NeuroImage 1998; 8: 249–61.
71 Constable RT, Skudlarski P, Mencl E et al. Quantifying and comparing region-of-interest activation patterns in functional brain MR imaging: methodology considerations. Magnetic Resonance Imaging 1998; 16: 289–300.
72 Lange N, Strother SC, Anderson JR et al. Plurality and resemblance in fMRI data analysis. NeuroImage 1999; 10: 282–303.
73 Purdon PL, Weisskoff RM. Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI. Human Brain Mapping 1998; 6: 239–49.
74 Lowe MJ, Russell DP. Treatment of baseline drifts in fMRI time series analysis. Journal of Computer Assisted Tomography 1999; 23: 463–73.
75 Maas LC, Frederick Bd, Yurgelun-Todd DA, Renshaw PF. Autocovariance based analysis of functional MRI data. Biological Psychiatry 1996; 39: 640–41.
76 McIntosh AR, Bookstein FL, Haxby JV, Grady CL. Spatial pattern analysis of functional brain images using partial least squares. NeuroImage 1996; 3: 143–57.
77 Owen CB. Multiple media correlation: theory and applications. Hanover: Dartmouth College, 1998.
78 Owen CB, Makedon F. Computed synchronization for multimedia applications. Dordrecht: Kluwer, 1999.
79 Worsley KJ, Poline JB, Friston KJ, Evans AC. Characterizing the response of PET and fMRI data using multivariate linear models. NeuroImage 1997; 6: 305–19.
80 Worsley KJ, Poline J-B, Vandal AC, Friston KJ. Tests for distributed, non-focal brain activations. NeuroImage 1995; 2: 183–94.
81 Frank LR, Buxton RB, Wong EC. Probabilistic analysis of functional magnetic resonance imaging data. Magnetic Resonance in Medicine 1998; 39: 132–48.
82 Fu Z, Hui Y, Liang Z-P. Joint spatiotemporal statistical analysis of functional MRI data. In: Proceedings of IPCIP’98 International Conference on Image Processing, 1998: 709–13.
83 Svensen M, Kruggel F, Von Cramon DY. Markov random field modelling of fMRI data using a mean field EM-algorithm. In Pelillo ERHaM ed. Second International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition. York. London: Springer, 1999: 317–30.
84 Poline J-B, Worsley KJ, Evans AC, Friston KJ. Combining spatial extent and peak intensity to test for activations in functional imaging. NeuroImage 1997; 5: 83–96.
85 Worsley KJ, Cao J, Paus T, Petrides M, Evans AC. Applications of random field theory to functional connectivity. Human Brain Mapping 1998; 6: 364–67.
86 Horwitz B. Modeling of functional brain imaging data. In: SPIE Ninth Workshop on Virtual Intelligence/Dynamic Neural Networks. Stockholm, Sweden, 1999.
87 Holmes AP, Blair RC, Watson JD, Ford I. Nonparametric analysis of statistic images from functional mapping experiments. Journal of Cerebral Blood Flow Metabolism 1996; 16: 7–22.
88 Bandettini PA, Jesmanowicz A, Wong EC, Hyde JS. Processing strategies for time-course data sets in functional MRI of the human brain. Magnetic Resonance in Medicine 1993; 30: 161–73.
89 Bullmore E, Brammer M, Williams S et al. Statistical methods of estimation and inference for functional MR image analysis. Magnetic Resonance in Medicine 1996; 35: 261–77.
90 Strother S, Kanno I, Rottenberg D. Principal component analysis, variance partitioning, and functional connectivity. Journal of Cerebral Blood Flow and Metabolism 1995; 15: 355–60.
91 McKeown MJ, Makeig S, Brown CG et al. Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping 1998; 6: 160–88.
92 Buchel C, Coull J, Friston K. The predictive value of changes in effective connectivity for human learning. Science 1999; 283: 1538–41.
93 Cressie N. Statistics for spatial data. New York: John Wiley, 1993.
94 Tsukimoto H, Morita C. The discovery of rules from brain images. In: Discovery Science – First International Conference, 1998: 198–209.
95 Quinlan J. Induction of decision trees. Machine Learning 1986; 1: 81–106.
96 Eubank R. Spline smoothing and nonparametric regression. New York: Marcel Dekker, 1988.
97 Megalooikonomou V, Davatzikos C, Herskovits E. Mining lesion-deficit associations in a brain image database. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA, 1999: 347–51.
98 Andersen E. Introduction to the statistical analysis of categorical data. Berlin: Springer, 1997.
99 Bryan R, Manolio T. A method for using MR to evaluate the effects of cardiovascular disease of the brain: the cardiovascular health study. American Journal of Neuroradiology 1994; 15: 1625–33.
100 Gerring J, Brady K, Chen A et al. Neuroimaging variables related to the development of secondary attention deficit hyperactivity disorder in children who have moderate and severe closed head injury. Journal of the American Academy of Child and Adolescent Psychiatry 1998; 37: 647–54.
101 Bryan R, Wells S, Miller T et al. Infarctlike lesions in the brain: prevalence and anatomic characteristics at MR imaging of the elderly – data from cardiovascular health study. Radiology 1997; 202: 47–54.
102 Mummery CJ, Patterson K, Price CJ, Ashburner J, Frackowiak RS, Hodges JR. A voxel-based morphometry study of semantic dementia: relationship between temporal lobe atrophy and semantic memory. Annals in Neurology 2000; 47: 36–45.
103 Burgel U, Schormann T, Schleicher A, Zilles K. Mapping of histologically identified long fiber tracts in human cerebral hemispheres to the MRI volume of a reference brain: position and spatial variability of the optic radiation. NeuroImage 1999; 10: 489–99.
104 May A, Ashburner J, Buchel C et al. Correlation between structural and functional changes in brain in an idiopathic headache syndrome [see comments]. Nature Medicine 1999; 5: 836–38.
105 Mummery CJ, Patterson K, Wise RJ, Vandenbergh R, Price CJ, Hodges JR. Disrupted temporal lobe connections in semantic dementia. Brain 1999; 122: 61–73.
106 Davatzikos C, Vaillant M, Resnick SM, Prince JL, Letovsky S, Bryan RN. A computerized approach for morphological analysis of the corpus callosum. Journal of Computer Assisted Tomography 1996; 20: 88–97.
107 Csernansky J, Joshi S, Wang L et al. Hippocampal morphometry in schizophrenia by high dimensional brain mapping. Proceedings of the National Academy of Sciences of the United States of America 1998; 95: 11406–11.
108 Thompson PM, Schwartz C, Toga AW. High-resolution random mesh algorithms for creating a probabilistic 3D surface atlas of the human brain. NeuroImage 1996; 3: 19–34.
109 Thompson PM, Schwartz C, Lin RT, Khan AA, Toga AW. Three-dimensional statistical analysis of sulcal variability in the human brain. Journal of Neuroscience 1996; 16: 4261–74.
110 Allen L, Richey M, Chai Y, Gorski R. Sex differences in the corpus callosum of the living human being. Journal of Neuroscience 1991; 11: 933–42.
111 Delisi L. Brain imaging studies of cerebral morphology and activation in schizophrenia. In: Steinhauer SR, Gruzelier H eds. Neuropsychology, psychophysiology, and information processing. Amsterdam: Elsevier, 1991: 147–60.
112 Nelson MD, Saykin AJ, Flashman LA, Riordan HJ. Hippocampal volume reduction in schizophrenia as assessed by magnetic resonance imaging: a meta-analytic study. Archives in General Psychiatry 1998; 55: 433–40.
113 Thompson PM, Moussai J, Zohoori S et al. Cortical variability and asymmetry in normal aging and Alzheimer’s disease. Cerebral Cortex 1998; 8: 492–509.
114 Mazziotta JC, Toga AW, Evans A, Fox P, Lancaster J. A probabilistic atlas of the human brain: theory and rationale for its development. The International Consortium for Brain Mapping (ICBM). NeuroImage 1995; 2: 89–101.
115 Bookstein F. Biometrics, biomathematics, and the morphometric synthesis. Bulletin of Mathematical Biology 1996; 58: 313–65.
116 Davatzikos C, Resnick S. Sex differences in anatomic measures of interhemispheric connectivity: correlations with cognition in men but not in women. Cerebral Cortex 1998; 8: 635–40.
117 Miller M, Banerjee A, Christensen G et al. Statistical methods in computational anatomy. Statistical Methods in Medical Research 1997; 6: 267–99.
118 Gee J, LeBriquer L, Barillot C. Probabilistic matching of brain images. In: Bizais Y, Barillot C, Di Paola R eds. Information processing in medical imaging. Dordrecht: Kluwer, 1995: 113–25.
119 Haller J, Christensen G, Joshi S et al. Hippocampal MR imaging morphometry by means of general pattern matching. Radiology 1996; 199: 787–91.
120 Grenander U, Miller MI. Computational anatomy: an emerging discipline. Statistical Computing and Graphics Newsletter 1996; 7: 3–8.
121 Grenander U, Miller MI. Computational anatomy: an emerging discipline. Quarterly of Applied Mathematics 1998; LVI: 617–94.
122 Sholl D. Dendritic organization in the neurons of the visual and motor cortices of the cat. Journal of Anatomy 1953; 87: 387–406.
123 Hu MK. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory 1962; 8: 179–87.
124 Sadjadi F, Hall E. Three-dimensional moment invariants. IEEE Transactions on Pattern Analysis and Machine Intelligence 1980; 2: 127–36.
125 Teh C-H, Chin R. On image analysis by the methods of moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 1988; 10: 496–513.
126 Karten HJ, Kelly P. Content-based query and retrieval in neuroscience databases, 1999.
127 Jacobs GA, Theunissen FE. Extraction of sensory parameters from a neural map by primary sensory interneurons. Journal of Neuroscience 2000; 20: 2934–43.
128 Symanzik J, Ascoli GA, Washington SS, Krichmar JL. Visual data mining of brain cells. Computing Science and Statistics 1999; 31: 445–49.
129 Rossmanith C, Handels H, Poppl S, Rinast E, Weiss D. Characterisation and classification of brain tumors in three-dimensional MR image sequences. In: Fourth International Conference on Visualization in Biomedical Computing, 1996: 429–38.
130 Korn F, Sidiropoulos N, Faloutsos C, Siegel E, Protopapas Z. Fast and effective similarity search in medical tumor databases using morphology. In: SPIE Proceedings. Boston, MA, 1996.
131 Guttman A. R-trees: a dynamic index structure for spatial searching. ACM SIGMOD 1984; 47–57.
132 Agrawal R, Faloutsos C, Swami A. Efficient similarity search in sequence databases. In: Foundations of Data Organization and Algorithms (FODO) Conference. Evanston, IL, 1993.
133 Eden M. A two-dimensional growth process. In: Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press, 1961.
134 Ringwald M, Baldock R, Bard J et al. A database for mouse development. Science 1994; 265: 2033–34.
135 Williams B, Doyle M. An internet atlas of mouse development. Computerized Medical Imaging and Graphics 1996; 20: 433–47.
136 Toga A, Santori E, Hazani R, Ambach K. A 3D digital map of the rat brain. Brain Research Bulletin 1995; 38: 77–85.
137 Cohen F, Yang Z, Huang Z, Nissanov J. Automatic matching of homologous histological sections. IEEE Transactions on Biomedical Engineering 1998; 45: 642–50.
138 Ali W, Cohen F. Registering coronal histological 2-D sections of a rat brain with coronal sections of a 3-D atlas using geometric curve invariants and B-spline representation. IEEE Transactions on Medical Imaging 1998; 17: 957–66.
139 Rangarajan A, Chui H, Mjolsness E et al. A robust point matching algorithm for autoradiograph alignment. Medical Image Analysis 1997; 4: 379–98.
140 Rangarajan A, Chui H, Bookstein F. The softassign Procrustes matching algorithm. Information Processing in Medical Imaging. London: Springer, 1997: 29–42.
141 Besl P, McKay N. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 1992; 14: 239–56.
142 Pearl J. Fusion, propagation and structuring in belief networks. Artificial Intelligence 1986; 29: 241–88.
143 Herskovits E. Computer-based probabilistic network construction. PhD thesis. Stanford University, CA, 1991.
144 Cooper G, Herskovits E. A Bayesian method 157 Baumgartner R, Ryner L, Richter W,
for the induction of probabilistic networks Summers R, Jarmasz M, Somarjai R.
from data. Machine Learning 1992; 9: 309–47. Comparison of two exploratory data analysis
methods for fMRI: fuzzy clustering vs. principal component analysis. Magnetic
Resonance Imaging 2000; 18: 89–94.
145 Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible
inference. San Mateo, CA: Morgan Kaufmann, 1988.
146 Bouckaert R. Properties of measures for Bayesian belief network learning. In: Tenth
Conference on Uncertainty in Artificial Intelligence. Seattle, WA, 1994.
147 Lam W, Bacchus F. Learning Bayesian belief networks – an approach based on the MDL
principle. Computational Intelligence 1994; 10: 269–93.
148 Rissanen J. Modeling by shortest data description. Automatica 1978; 14: 465–71.
149 Lam W. Bayesian network refinement via machine learning approach. IEEE
Transactions on Pattern Analysis and Machine Intelligence 1998; 20: 240–51.
150 Gur RC, Trivedi SS, Saykin AJ, Gur RE. ‘Behavioral imaging’ – a procedure for
analysis and display of neuropsychological test scores: I. Construction of algorithm and
initial clinical application. Neuropsychiatry, Neuropsychology, and Behavioral Neurology
1988; 1: 53–60.
151 Gur RC, Saykin AJ, Blonder LX, Gur RE. ‘Behavioral imaging’: II. Application of the
quantitative algorithm to hypothesis testing in the population of hemiparkinson patients.
Neuropsychiatry, Neuropsychology, and Behavioral Neurology 1988; 1: 87–96.
152 Gur RC, Saykin AJ, Benton A et al. ‘Behavioral imaging’ – III. Interexpert agreement
and reliability of weightings. Neuropsychiatry, Neuropsychology, and Behavioral Neurology
1990; 3: 113–24.
153 Fisher LD, Belle GV. Biostatistics: a methodology for the health sciences. New York:
John Wiley, 1993.
154 Friston KJ. Testing for anatomically specified regional effects. Human Brain Mapping
1997; 5: 133–36.
155 Forman SD, Cohen JD, Fitzgerald M, Eddy WF, Mintun MA, Noll DC. Improved
assessment of significant activation in functional magnetic resonance imaging (fMRI): use
of a cluster-size threshold. Magnetic Resonance in Medicine 1995; 33: 636–47.
156 Baumgartner R, Somorjai R, Summers R, Richter W. Assessment of cluster homogeneity
in fMRI data using Kendall’s coefficient of concordance. Magnetic Resonance Imaging 1999;
17: 1525–32.
158 Golay X, Kollias S, Stoll G, Meier D, Valavanis A, Boesiger P. A new correlation-based
fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine 1998; 40: 249–60.
159 Baumgartner R, Somorjai R, Summers R, Richter W, Ryner L, Jarmasz M. Resampling
as a cluster validation technique in fMRI. Journal of Magnetic Resonance Imaging 2000; 11:
228–31.
160 Baune A, Sommer FT, Erb M et al. Dynamical cluster analysis of cortical fMRI
activation. NeuroImage 1999; 9: 477–89.
161 Fadili MJ, Ruan S, Bloyet D, Mazoyer B. Unsupervised fuzzy clustering analysis of
fMRI series. In: Proceedings of the 20th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, Vol. 20, Biomedical Engineering Towards the
Year 2000 and Beyond. Hong Kong, 1998: 696–99.
162 Filzmoser P, Baumgartner R, Moser E. A hierarchical clustering method for analyzing
functional MR images. Magnetic Resonance Imaging 1999; 17: 817–26.
163 Goutte C, Toft P, Rostrup E, Nielsen F, Hansen L. On clustering fMRI time series.
NeuroImage 1999; 9: 298–310.
164 Ledberg A, Akerman S, Roland PE. Estimation of the probabilities of 3D clusters
in functional brain images. NeuroImage 1998; 8: 113–28.
165 Moser E, Baumgartner R, Barth M, Windischberger C. Explorative signal processing in
functional MR imaging. International Journal of Imaging Systems and Technology 1999; 10:
166–76.
166 Fischer H, Hennig J. Neural network-based analysis of MR time series. Magnetic
Resonance in Medicine 1999; 41: 124–31.
167 Horwitz B, Tagamets MA. Predicting human functional maps with neural net modeling.
Human Brain Mapping 1999; 8: 137–42.
168 Baumgartner R, Windischberger C, Moser E. Quantification in functional magnetic
resonance imaging: fuzzy clustering vs. correlation analysis. Magnetic Resonance Imaging
1998; 16: 115–25.
169 Larntz K. Small-sample comparisons of exact levels for chi-squared goodness-of-fit
statistics. Journal of the American Statistical Association 1978; 73: 253–63.
170 Lee C, Shen S. Convergence-rates and power of 6 power-divergence statistics for
testing independence in 2 by 2 contingency table. Communications in Statistics – Theory
and Methods 1994; 23: 2113–26.
171 Oluyede B. A modified chi-square test of independence against a class of ordered
alternatives in a RxC contingency table. Canadian Journal of Statistics 1994; 22: 75–87.
172 Harwell M, Serlin R. An empirical study of five multivariate tests for the
single-factor repeated measures model. Communications in Statistics – Simulation and
Computation 1997; 26: 605–18.
173 Tanizaki H. Power comparison of non-parametric tests: small-sample properties
from Monte Carlo experiments. Journal of Applied Statistics 1997; 24: 603–32.
174 Osius G, Rojek D. Normal goodness-of-fit tests for multinomial models with large
degrees of freedom. Journal of the American Statistical Association 1992; 87: 1145–52.
175 Thomas R, Conlon M. Sample-size determination based on Fisher exact test for
use in 2×2 comparative trials with low event rates. Controlled Clinical Trials 1992; 13:
134–47.
176 Mannan M, Nassar R. Size and power of test statistics for gene correlation in 2×2
contingency-tables. Biometrical Journal 1995; 37: 409–33.
177 Megalooikonomou V, Davatzikos C, Herskovits E. A simulator for evaluating
methods for the detection of lesion-deficit associations. Human Brain Mapping in press,
2000.
178 Shu Y, Liaw J-S, Berger T, Shahabi C. Data mining for neuroscience databases. In: 29th
Annual Meeting of Society for Neuroscience. Miami Beach, FL, 1999.
179 Faloutsos C, Lin K-I. FastMap: a fast algorithm for indexing, data mining and
visualization of traditional and multimedia datasets. In: ACM SIGMOD Conference on
Management of Data. San Jose, CA, 1995.
180 Cohen J. Statistical power analysis for the behavioral sciences. Mahwah, NJ: Lawrence
Erlbaum, 1987.
181 Megalooikonomou V, Herskovits E. Mining structure-function associations in a brain
image database. In: Cios K ed. Medical data mining and knowledge discovery. Berlin:
Springer, 2000.
182 Narr K, Sharma T, Moussai J et al. 3D maps of cortical surface variability and sulcal
asymmetries in schizophrenia and normal populations. In: 5th International Conference on
Functional Mapping of the Human Brain. Dusseldorf, 1999.
183 Thompson P, Mega M, Toga A. Disease-specific brain atlases. In: Toga A, Mazziotta J,
Frackowiak R eds. Brain mapping: the disorders. San Diego, CA: Academic Press, 2000.
184 Thompson P, Woods R, Mega M, Toga A. Mathematical/computational challenges in
creating deformable and probabilistic atlases of the human brain. Human Brain Mapping
2000; 9: 81–92.
185 Thompson P, Toga A. Detection, visualization and animation of abnormal anatomic
structure with a deformable probabilistic brain atlas based on random vector field
transformations. Medical Image Analysis 1997; 1: 271–94.
186 Ashburner J, Friston KJ. Voxel-based morphometry – the methods. NeuroImage
2000; 11: 805–21.
187 Turkheimer E, Yeo RA, Jones C, Bigler ED. Quantitative assessment of covariation
between neuropsychological function and location of naturally occurring lesions in
humans. Journal of Clinical and Experimental Neuropsychology 1990; 12: 549–65.

Appendix A1: application of t-test in SPM

The t-test compares two groups of samples and determines whether there is a
significant difference between them; here the groups are MR signal readings during two
different states (conditions): one R state (rest or control) and one T state (test or task).
Let $xyz$ denote a voxel (volume element) of a 3-D image and let $N$ be the total number of
voxels. Then the 3-D images $[r_{xyz,R}]$ and $[r_{xyz,T}]$ exist for each subject $k$,
$k = 1, \ldots, n$. Let $[\Delta r_{xyz,k}]$, where

$$\Delta r_{xyz,k} = r_{xyz,k,T} - r_{xyz,k,R},$$

be the voxel-by-voxel subtraction picture of the differences in $r$ between R and T. If all
$\Delta r_{xyz}$ representing the brain in anatomically standardized pictures can be
regarded as normally distributed, it is possible to calculate a descriptive t-test:

$$t_{\Delta r_{xyz}} = \frac{E_{\Delta r_{xyz}}}{\sqrt{\mathrm{Var}_{\Delta r_{xyz}} / n}}$$

where $E_{\Delta r_{xyz}}$ and $\mathrm{Var}_{\Delta r_{xyz}}$ are the mean and variance of
$\Delta r_{xyz}$ over the $n$ subjects, respectively:

$$E_{\Delta r_{xyz}} = \frac{1}{n} \sum_{k=1}^{n} \Delta r_{xyz,k}$$

and

$$\mathrm{Var}_{\Delta r_{xyz}} = \frac{1}{n-1} \sum_{k=1}^{n} \left( \Delta r_{xyz,k} - E_{\Delta r_{xyz}} \right)^2$$
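
The voxel-wise statistic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPM implementation: the function name `voxelwise_t` and the data layout (one flattened list of voxel readings per subject and state) are hypothetical, and only the standard library is used.

```python
import math
from statistics import mean, variance


def voxelwise_t(task, rest):
    """Return one paired t value per voxel.

    task[k][v] and rest[k][v] hold the reading of subject k at voxel v in
    the T and R state (images flattened to 1-D lists; assumed layout).
    """
    n = len(task)
    if n < 2 or n != len(rest):
        raise ValueError("need the same n >= 2 subjects in both states")
    t_map = []
    for v in range(len(task[0])):
        # Difference image value of every subject at this voxel.
        diffs = [task[k][v] - rest[k][v] for k in range(n)]
        e = mean(diffs)        # mean of the differences
        var = variance(diffs)  # sample variance, divisor n - 1
        if var == 0:
            # t is undefined when all differences are identical.
            t_map.append(math.nan)
        else:
            t_map.append(e / math.sqrt(var / n))
    return t_map
```

Calling `voxelwise_t(task, rest)` returns a list with one t value per voxel, which can then be thresholded or reshaped back into a 3-D image.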

Appendix A2: use of the deformation function in analysis of morphological variability

A deformation function $d(u, v)$ is defined at each point $(u, v)$ of the atlas structure,
$S$, of interest. It measures the enlargement or shrinkage associated with the
transformation from an infinitesimal region around a point in the atlas space to its
corresponding infinitesimal region in the subject space. If the point $(u, v)$ of the atlas
space is mapped to the point $[U(u, v), V(u, v)]$ in the subject space, then $d(u, v)$ is
defined by

$$d(u, v) = \det\{\nabla [U(u, v), V(u, v)]\},$$

where $\nabla$ denotes the gradient of a vector function and $\det(\cdot)$ denotes the
determinant of a matrix. Intersubject comparisons are made by comparing deformation
functions. Specifically, let $[U_1(u, v), V_1(u, v)], \ldots, [U_N(u, v), V_N(u, v)]$ be the
maps from the structure $S$ of the atlas to the structure $S$ of each of the $N$ subjects of
a population. Let also $U_p(u, v)$ and $V_p(u, v)$ be the average of the $N$ functions
$U_1, \ldots, U_N$ and $V_1, \ldots, V_N$, respectively:

$$U_p(u, v) = \frac{1}{N} \sum_{i=1}^{N} U_i(u, v), \qquad V_p(u, v) = \frac{1}{N} \sum_{i=1}^{N} V_i(u, v)$$

The average structure $S_p$ of that population is defined as the collection of the points
where the atlas structure points are mapped:

$$S_p = \bigcup_{(u, v) \in S_a} [U_p(u, v), V_p(u, v)]$$

where $S_a$ is the collection of points belonging to structure $S$ of the atlas. Let
$\mu_p(u, v)$ be the point-wise mean of the deformation function of the population:

$$\mu_p(u, v) = \frac{1}{N} \sum_{i=1}^{N} d_i(u, v)$$

where $d_1(u, v), \ldots, d_N(u, v)$ are the deformation functions of the $N$ subjects. Then
the difference between the two populations, denoted with subscripts 1 and 2, can be
measured for each structural region as an effect size180 defined as

$$e(u, v) = \frac{\mu_{p1}(u, v) - \mu_{p2}(u, v)}{\sigma(u, v)}$$

where $\sigma(u, v)$ is the point-wise standard deviation of the two populations combined.
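
As a rough illustration of the two quantities defined in this appendix, the sketch below approximates the deformation function on a regular grid and computes the effect size at a single point. Several choices here are assumptions rather than the authors' method: the maps are sampled on a grid with spacing `h`, derivatives are taken by central finite differences, and the "combined" standard deviation is taken as the sample standard deviation of both populations pooled into one sample. The function names are hypothetical.

```python
from statistics import mean, stdev


def jacobian_determinant(U, V, h=1.0):
    """Approximate d(u, v) = det(grad [U, V]) at interior grid points.

    U[i][j] and V[i][j] give the subject-space coordinates of atlas point
    (u_i, v_j) on a regular grid with spacing h (assumed sampling).
    """
    rows, cols = len(U), len(U[0])
    d = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Central differences for the four partial derivatives.
            dU_du = (U[i + 1][j] - U[i - 1][j]) / (2 * h)
            dU_dv = (U[i][j + 1] - U[i][j - 1]) / (2 * h)
            dV_du = (V[i + 1][j] - V[i - 1][j]) / (2 * h)
            dV_dv = (V[i][j + 1] - V[i][j - 1]) / (2 * h)
            d[i - 1][j - 1] = dU_du * dV_dv - dU_dv * dV_du
    return d


def effect_size(d_pop1, d_pop2):
    """Effect size at one point from the d values of two populations.

    Assumption: sigma is the sample standard deviation of both populations
    pooled into a single sample. Raises ZeroDivisionError if sigma is 0.
    """
    sigma = stdev(list(d_pop1) + list(d_pop2))
    return (mean(d_pop1) - mean(d_pop2)) / sigma
```

A value of `d` above 1 indicates local enlargement relative to the atlas and a value below 1 indicates shrinkage; `effect_size` would be called once per grid point with the `d` values of each population at that point.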

Appendix A3: generation of a Bayesian network from a database

Without loss of generality, we assume in this section that each variable is Boolean, i.e.
it represents a logical statement that is either true (T) or false (F). The problem of
generating the Bayesian-network structure that is most likely to have generated the
cases in the database $D$ can be restated as

$$B_{S_{\max}} = \arg\max_{B_S} P(B_S \mid D)$$

where $B_{S_{\max}}$ is the network structure (i.e. set of associations) we seek. Using Bayes’
theorem, we obtain

$$B_{S_{\max}} = \arg\max_{B_S} \frac{P(D \mid B_S) P(B_S)}{P(D)}$$

Since the prior probability of observing the data is constant for all models, the
problem reduces to solving

$$B_{S_{\max}} = \arg\max_{B_S} P(D \mid B_S) P(B_S)$$

Here, we describe an approach for generating a Bayesian network from a database,
based on the above equation.143 Although we will discuss the application of this
method to a lesion-deficit study, the method can be applied to the more general
problem of finding associations. Let $D$ be a database of lesion-deficit cases, let $Z$ be
the set of discrete variables represented by $D$, and let $B_S$ represent an arbitrary
Bayesian-network structure containing just the variables in $Z$. In this section, we shall
write as though database $D$ were generated by Monte Carlo sampling of a Bayesian network
with structure $B_S$ that is hidden from us, or, equivalently, from a multivariate
distribution with conditional independence among variables determined by $B_S$. The
primary goal is to use $D$ to discover $B_S$. The assumptions that explicitly delineate the
problem are: (1) the process that generated $D$ is modelled as a Bayesian network
containing just the variables in $Z$; (2) cases occur independently, given a Bayesian-
network model; (3) cases are complete (there are no missing data); (4) the distributions
$f(B_P \mid B_S)$ are independent of each other; and (5) the second-order probabilities are
uniform. The application of assumption 1 yields

$$P(B_S, D) = \int_{B_P} P(D \mid B_S, B_P) \, f(B_P \mid B_S) \, P(B_S) \, dB_P$$
where $B_P$ is a vector whose values denote the conditional-probability assignments
associated with Bayesian-network structure $B_S$, and $f$ is the second-order conditional-
probability density function over $B_P$ given $B_S$. The integral is over all possible value
assignments to $B_P$. Thus, the integration is over all possible Bayesian networks that
can have structure $B_S$. The integral in the above equation is a multiple integral in
which the variables of integration are the conditional probabilities, $B_P$, associated with
structure $B_S$. Note that this formulation explicitly admits the concept of a prior-
probability distribution, $P(B_S)$, over the possible Bayesian-network structures. One
can assume uniform priors over structures, or calculate these prior
probabilities.143,144,177
From these five assumptions Cooper and Herskovits derive an equation for
computing $P(B_S, D)$:

$$P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

where $n$ is the number of variables in $D$ (or nodes in $B_S$), $r_i$ is the number of values
that node $i$ can assume, $q_i$ is the number of different instantiations of the parents of
node $i$ found in $D$, $N_{ij}$ is the number of cases in $D$ with the parents of node $i$
assuming the $j$th value in the list i of their instantiations that are found in $D$, and
$N_{ijk}$ is the number of the $N_{ij}$ cases in $D$ with node $i$ assuming value $k$. This
equation allows the calculation of $P(B_S, D)$ using $P(B_S)$ combined with enumeration over
the cases in the database. Maximizing $P(B_S, D)$ over all possible $B_S$ is equivalent to
maximizing $P(B_S \mid D)$ over all possible $B_S$, which is the goal of this approach. Cooper
and Herskovits144 prove that this metric has polynomial computational complexity, and is
asymptotically optimal in terms of delineating all and only associations among the
variables in the underlying distribution. Because the number of possible associations
among the variables is more than exponential in the number of variables under
consideration, it is critical that this metric be computationally efficient, and that
heuristic search (e.g. simulated annealing or greedy search) be used.
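
The scoring equation can be evaluated by simple counting over the cases. The sketch below is a minimal illustration, assuming cases are stored as dictionaries and values are coded as integers 0 to r_i − 1; the function name `log_k2_score` and its parameters are hypothetical, and the score is computed in log space (using `math.lgamma`, where lgamma(m + 1) = log m!) to avoid overflow of the factorials. It scores one given structure and does not perform the heuristic search.

```python
import math
from collections import defaultdict


def log_k2_score(cases, parents, arity, log_prior=0.0):
    """Return log P(B_S, D) for one candidate structure.

    cases     : list of dicts, variable name -> value in 0..r_i - 1
    parents   : dict, variable name -> tuple of parent names (this is B_S)
    arity     : dict, variable name -> r_i
    log_prior : log P(B_S); 0.0 stands for an unnormalised uniform prior
    """
    score = log_prior
    for node, pa in parents.items():
        r = arity[node]
        # counts[j][k] = N_ijk for each parent instantiation j found in D.
        counts = defaultdict(lambda: [0] * r)
        for case in cases:
            j = tuple(case[p] for p in pa)
            counts[j][case[node]] += 1
        for n_ijk in counts.values():
            n_ij = sum(n_ijk)
            # log[(r_i - 1)! / (N_ij + r_i - 1)!]
            score += math.lgamma(r) - math.lgamma(n_ij + r)
            # log of the product over k of N_ijk!
            score += sum(math.lgamma(n + 1) for n in n_ijk)
    return score
```

As a small worked case, take four cases over two Boolean variables X and Y in which the two always agree (two cases with both false, two with both true). With no edge, each node contributes 1!/5! × 2! 2! = 1/30, giving P(B_S, D) = 1/900 under a unit prior; with the edge X → Y, node Y contributes 1/3 for each value of X, giving 1/30 × 1/9 = 1/270, so the structure containing the association scores higher.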
