Neuron
 Article
Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders
Yoichi Miyawaki,1,2,6 Hajime Uchida,2,3,6 Okito Yamashita,2 Masa-aki Sato,2 Yusuke Morito, Hiroki C. Tanabe,4,5 Norihiro Sadato,4,5 and Yukiyasu Kamitani2,3,*
1 National Institute of Information and Communications Technology, Kyoto, Japan
2 ATR Computational Neuroscience Laboratories, Kyoto, Japan
3 Nara Institute of Science and Technology, Nara, Japan
4 The Graduate University for Advanced Studies, Kanagawa, Japan
5 National Institute for Physiological Sciences, Aichi, Japan
6 These authors contributed equally to this work
*Correspondence: kmtn@atr.jp
DOI 10.1016/j.neuron.2008.11.004
SUMMARY 
Perceptual experience consists of an enormous number of possible states. Previous fMRI studies have predicted a perceptual state by classifying brain activity into prespecified categories. Constraint-free visual image reconstruction is more challenging, as it is impractical to specify brain activity for all possible images. In this study, we reconstructed visual images by combining local image bases of multiple scales, whose contrasts were independently decoded from fMRI activity by automatically selecting relevant voxels and exploiting their correlated patterns. Binary-contrast, 10 × 10-patch images ($2^{100}$ possible states) were accurately reconstructed without any image prior on a single trial or volume basis by measuring brain activity only for several hundred random images. Reconstruction was also used to identify the presented image among millions of candidates. The results suggest that our approach provides an effective means to read out complex perceptual states from brain activity while discovering information representation in multivoxel patterns.
INTRODUCTION
Objective assessment of perceptual experience in terms of brain activity represents a major challenge in neuroscience. Previous fMRI studies have shown that visual features, such as orientation and motion direction (Kamitani and Tong, 2005, 2006), and visual object categories (Cox and Savoy, 2003; Haxby et al., 2001) can be decoded from fMRI activity patterns by a statistical “decoder,” which learns the mapping between a brain activity pattern and a stimulus category from a training data set. Furthermore, a primitive form of “mind-reading” has been demonstrated by predicting a subjective state under the presentation of an ambiguous stimulus using a decoder trained with unambiguous stimuli (Kamitani and Tong, 2005, 2006; Haynes and Rees, 2005). However, such a simple classification approach is insufficient to capture the complexity of perceptual experience, since our perception consists of numerous possible states, and it is impractical to measure brain activity for all the states. A recent study (Kay et al., 2008) has demonstrated that a presented image can be identified among a large number of candidate images using a receptive field model that predicts fMRI activity for visual images (see also Mitchell et al., 2008, for a related approach). But the image identification was still constrained by the candidate image set. Even more challenging is visual image reconstruction, which decodes visual perception into an image, free from the constraint of categories (see Stanley et al., 1999, for reconstruction using LGN spikes).

A possible approach is to utilize the retinotopy in the early visual cortex. The retinotopy associates the specific visual field location to the active cortical location, or voxel, providing a mapping from the visual field to the cortical voxels (Engel et al., 1994; Sereno et al., 1995). Thus, one may predict local contrast information by monitoring the fMRI signals corresponding to the retinotopy map of the target visual field location.
The retinotopy can be further elaborated using a voxel receptive-field model. By inverting the receptive-field model, a presented image can be inferred given the brain activity consistent with the retinotopy (Thirion et al., 2006).

However, it may not be optimal to use the retinotopy or the inverse of the receptive field model to predict local contrast in an image. These methods are based on the model of individual voxel responses given a visual stimulus, and multivoxel patterns are not taken into account for the prediction of the visual stimulus. Recent studies have demonstrated the importance of the activity pattern, in particular the correlation among neurons or cortical locations in the decoding of a stimulus (Averbeck et al., 2006; Chen et al., 2006). Since even a localized small visual stimulus elicits spatially spread activity over multiple cortical voxels (Engel et al., 1997; Shmuel et al., 2007), multivoxel patterns may contain information useful for predicting the presented stimulus.

In addition, a visual image is thought to be represented at multiple spatial scales in the visual cortex, which may serve to retain the visual sensitivity to fine-to-coarse patterns at a single visual field location (Campbell and Robson, 1968; De Valois et al., 1982). The conventional retinotopy, by contrast, does not imply such multiscale representation, as it simply posits a location-to-location mapping. It may be possible to extract multiscale information from fMRI signals and use it to achieve better reconstruction.

Neuron 60, 915–929, December 11, 2008 © 2008 Elsevier Inc.

Here, we present an approach to visual image reconstruction using multivoxel patterns of fMRI signals and multiscale visual representation (Figure 1A). We assume that an image is represented by a linear combination of local image elements of multiple scales (colored rectangles). The stimulus state at each local element ($C_i$, $C_j$, …) is predicted by a decoder using multivoxel patterns (weight set for each decoder, $w_i$, $w_j$, …), and then the outputs of all the local decoders are combined in a statistically optimal way (combination coefficient, $\lambda_i$, $\lambda_j$, …) to reconstruct the presented image. As each local element has fewer possible states than the entire image, the training of local decoders requires only a small number of training samples. Hence, each local decoder serves as a “module” for a simple image component, and the combination of the modular decoders allows us to represent numerous variations of complex images. The decoder uses all the voxels from the early visual areas as the input, while automatically pruning irrelevant voxels. Thus, the decoder is not explicitly informed about the retinotopy mapping.

We applied this approach to the reconstruction of contrast-defined images consisting of 10 × 10 binary patches (Figure 1B). We show that once our model is trained with several hundred random images, it can accurately reconstruct arbitrary images ($2^{100}$ possible images), including geometric and alphabet shapes, on a single trial (6 s/12 s block) or volume (2 s) basis, without any prior information about the image. The reconstruction accuracy is quantified by image identification performance, revealing the ability to identify the presented image among a set of millions of candidate images. Analyses provide evidence that the multivoxel pattern decoder, which exploits voxel correlations especially in V1, and the multiscale reconstruction model both significantly contribute to the high quality of reconstruction.
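The idea of a decoder that receives all voxels and prunes the irrelevant ones can be illustrated with a minimal sketch. It does not reproduce the sparse logistic regression used in this study; it swaps in an L1-penalized logistic regression fitted by proximal gradient descent, which also drives uninformative weights to exactly zero. The function name `fit_l1_logistic`, the penalty `lam`, and the synthetic "voxels" are all assumptions made for illustration.

```python
import numpy as np

def fit_l1_logistic(X, y, lam=0.15, n_iter=500):
    """L1-penalized logistic regression via proximal gradient (ISTA).

    Stand-in for sparse logistic regression; NOT the algorithm of the paper.
    X: (n_samples, n_voxels) signals, y: (n_samples,) binary contrast labels.
    Returns voxel weights w and an unpenalized bias b.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])           # append bias column
    step = 4.0 * n / np.linalg.norm(Xb, 2) ** 2    # 1 / Lipschitz constant
    wb = np.zeros(d + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ wb))         # predicted probability
        wb = wb - step * (Xb.T @ (p - y)) / n      # gradient step on log loss
        # Soft-threshold the voxel weights only; small ones become exactly 0.
        w = wb[:d]
        wb[:d] = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return wb[:d], wb[d]

# Synthetic data (assumption): 50 voxels, only the first 3 carry information.
rng = np.random.default_rng(0)
n_samples, n_voxels, n_informative = 200, 50, 3
y = rng.integers(0, 2, size=n_samples).astype(float)
X = rng.normal(size=(n_samples, n_voxels))
X[:, :n_informative] += 2.0 * (2.0 * y - 1.0)[:, None]

w, b = fit_l1_logistic(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
n_selected = np.count_nonzero(w)
print(f"voxels kept: {n_selected} of {n_voxels}, training accuracy: {accuracy:.2f}")
```

The decoder is never told which voxels are informative; the penalty alone removes most of the others, which is the property the text refers to as automatic pruning.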
RESULTS
In the present study, we attempted to reconstruct visual images defined by binary contrast patterns consisting of 10 × 10 square patches (Figure 1). Given fMRI signals $r$, we modeled a reconstruction image $\hat{I}(x \mid r)$ by a linear combination of local image bases (elements) $\phi_m(x)$:

$$\hat{I}(x \mid r) = \sum_m \lambda_m C_m(r) \phi_m(x),$$

where $x$ represents a spatial position in the image, $C_m(r)$ is the contrast of each local image basis predicted from fMRI signals, and $\lambda_m$ is the combination coefficient of each local image basis.

The local image bases, $\phi_m(x)$, were fixed in advance such that they redundantly covered the whole image with multiple spatial scales. We used local image bases of four scales: 1 × 1, 1 × 2, 2 × 1, and 2 × 2 patch areas. They were placed at every location in the image with overlaps. Thus, $\phi_m(x)$ served as overcomplete basis functions, the number of which was larger than that of all the patches. Although image elements larger than 2 × 2 or those with nonrectangular shapes could be used, the addition of such elements did not improve the reconstruction performance.

For each local image basis, we trained a “local decoder” that predicted the corresponding contrast using a linearly weighted sum of fMRI signals. The weights of voxels, $w$, were optimized using a statistical learning algorithm described in Experimental Procedures (“sparse logistic regression,” Yamashita et al., 2008) to best predict the contrast of the local image element with a training data set. Note that our algorithm automatically selected the relevant voxels for decoding without explicit information about the retinotopy mapping measured by the conventional method.

The combination coefficient, $\lambda_m$, was optimized to minimize the errors between presented and reconstructed images in a training data set. This coefficient was necessary because the local image bases were overcomplete and not independent of each other. Trained local decoders and their combination coefficients constituted a reconstruction model.

fMRI signals were measured while subjects viewed a sequence of visual images consisting of binary contrast patches on a 10 × 10 grid. In the “random image session,” a random pattern was presented for 6 s followed by a 6 s rest period (Figure 1B). A total of 440 different random images were shown (each presented once). In the “figure image session,” an image forming a geometric or alphabet shape was presented for 12 s followed by a 12 s rest period. Five alphabet letters and five geometric shapes were shown six or eight times. We used fMRI signals from areas V1 and V2 for the analysis (unless otherwise stated). The data from the random image session were analyzed by a cross-validation procedure for quantitative evaluation. They were also used to train a model to reconstruct the images presented in the figure image session.
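The reconstruction model described above can be sketched in a few lines. The sketch enumerates the local image bases on a 10 × 10 grid, forms the reconstruction as the sum over $m$ of $\lambda_m C_m(r) \phi_m(x)$, and fits $\lambda_m$ by ordinary least squares. Several parts are assumptions rather than details given in this section: each basis is taken to be a 0/1 indicator of its patch area, the contrast of an element is taken to be the mean contrast of its patches, the decoder outputs are replaced by the true contrasts instead of fMRI predictions, and plain least squares stands in for the optimization of the combination coefficients.

```python
import numpy as np

GRID = 10
SCALES = [(1, 1), (1, 2), (2, 1), (2, 2)]  # element height x width, in patches

def build_bases(grid=GRID, scales=SCALES):
    """Return an (n_bases, grid*grid) array; row m is phi_m(x) as a 0/1 mask."""
    bases = []
    for h, w in scales:
        for r in range(grid - h + 1):
            for c in range(grid - w + 1):
                mask = np.zeros((grid, grid))
                mask[r:r + h, c:c + w] = 1.0
                bases.append(mask.ravel())
    return np.array(bases)

def element_contrasts(images, bases):
    """Mean contrast of each element (assumed definition of C_m)."""
    return images @ bases.T / bases.sum(axis=1)

def fit_combination(images, contrasts, bases):
    """Least-squares lambda_m minimizing presented-vs-reconstructed error."""
    n, n_pix = images.shape
    # Row (image n, position x), column m holds C_m * phi_m(x).
    design = (contrasts[:, None, :] * bases.T[None, :, :]).reshape(n * n_pix, -1)
    lam, *_ = np.linalg.lstsq(design, images.ravel(), rcond=None)
    return lam

def reconstruct(contrasts, lam, bases):
    """I_hat(x|r) = sum_m lambda_m * C_m(r) * phi_m(x)."""
    return (contrasts * lam) @ bases

rng = np.random.default_rng(0)
bases = build_bases()
train = rng.integers(0, 2, size=(50, GRID * GRID)).astype(float)
lam = fit_combination(train, element_contrasts(train, bases), bases)

test = rng.integers(0, 2, size=(5, GRID * GRID)).astype(float)
recon = reconstruct(element_contrasts(test, bases), lam, bases)
print(bases.shape[0], "bases;", "max abs error:", np.abs(recon - test).max())
```

Under these assumptions the enumeration gives 100 + 90 + 90 + 81 = 361 bases for 100 patches, which is the sense in which the basis set is overcomplete. With noise-free contrasts the fit reproduces held-out images, because weighting the 1 × 1 bases alone already solves the problem; the coarser scales would be expected to matter only once the decoded contrasts are noisy.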
Reconstructed Visual Images
Reconstructed images from all trials of the figure image session are illustrated in Figure 2A. They were reconstructed using the model trained with all data from the random image session. Reconstruction was performed on single-trial, block-averaged data (average of 12 s or six-volume fMRI signals). Note that no postprocessing was applied. Even though the geometric and alphabet shapes were not used for the training of the reconstruction model, the reconstructed images reveal essential features of the original shapes. The spatial correlation between the presented and reconstructed images was 0.68 ± 0.16 (mean ± s.d.) for subject S1 and 0.62 ± 0.09 for S2.

We also found that reconstruction was possible even from 2 s single-volume data without block averaging (Figure 2B). The results show the temporal evolution of volume-by-volume reconstruction including the rest periods. All reconstruction sequences are presented in Movie S1.
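The spatial correlation reported above can be computed as in the following sketch. This section does not define the measure, so the sketch assumes it is the Pearson correlation across the 100 patches of the presented and reconstructed images. The function name `spatial_correlation` and the example images are hypothetical.

```python
import numpy as np

def spatial_correlation(presented, reconstructed):
    """Pearson correlation across patches of two images of equal shape."""
    a = np.asarray(presented, dtype=float).ravel()
    b = np.asarray(reconstructed, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical example: a thick square frame and a noisy version of it.
rng = np.random.default_rng(0)
presented = np.zeros((10, 10))
presented[2:8, 2:8] = 1.0
presented[4:6, 4:6] = 0.0
reconstructed = presented + rng.normal(scale=0.3, size=presented.shape)
print(round(spatial_correlation(presented, reconstructed), 2))
```

The measure is undefined for an image with uniform contrast, since its variance across patches is zero.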
Image Identification via Reconstruction
To further quantify reconstruction performance, we conducted image identification analysis (Kay et al., 2008; Thirion et al., 2006) in which the presented image was identified among a number of candidate images using an fMRI activity pattern (Figure 3A). We generated a candidate image set consisting of an image presented in the random image session and a specified number of random images selected from $2^{100}$ possible images (combinations of 10 × 10 binary contrasts). Given an fMRI activity pattern, image identification was performed by selecting the image with the smallest mean square difference from the reconstructed image. The rate of correct identification was calculated for a varied number of candidate random images.

In both subjects, image identification performance was far above the chance level, even up to an image set size of 10 million (Figure 3B). The identification performance can be further
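[Figure 1 panel labels. (A) Presented image (10 × 10 patches); fMRI signals; multivoxel pattern decoders ($w_i$, $w_j$, $w_k$, $w_l$); predicted contrasts ($C_i$, $C_j$, $C_k$, $C_l$); local image bases (elements) at 1 × 1, 1 × 2, 2 × 1, and 2 × 2 scales; combination coefficients ($\lambda_i$, $\lambda_j$, $\lambda_k$, $\lambda_l$); multiscale image representation; reconstructed contrast pattern. (B) Time.]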
Figure 1. Visual Image Reconstruction
(A) Reconstruction procedure. fMRI activity is measured while a contrast-defined 10 × 10 patch image is presented. “Local decoders” use linearly weighted multivoxel fMRI signals (voxel weights, $w_i$, $w_j$, …) to predict the contrasts (contrast values, $C_i$, $C_j$, …) of “local image bases” (or elements) of multiple scales (1 × 1, 1 × 2, 2 × 1, and 2 × 2 patch areas, defined by colored rectangles). Local image bases are multiplied by the predicted contrasts and linearly combined using “combination coefficients” ($\lambda_i$, $\lambda_j$, …) to reconstruct the image. Contrast patterns of the reconstructed images are depicted by a gray scale. Image bases of the same scale (except the 1 × 1 scale) partially overlapped with each other, though the figure displays only nonoverlapping bases for the purpose of illustration.
(B) Sequence of visual stimuli. Stimulus images were composed of 10 × 10 checkerboard patches flickering at 6 Hz (patch size, 1.15° × 1.15°; spatial frequency, 1.74 cycle/°; contrast, 60%). Checkerboard patches constituted random, geometric, or alphabet-letter patterns. Each stimulus block was 6 s (random image) or 12 s (geometric or alphabet shapes) long followed by a rest period (6 or 12 s).
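The image identification analysis described under Image Identification via Reconstruction can be sketched as follows. The candidate set holds the presented image plus a number of random binary images, and the candidate with the smallest mean square difference from the reconstruction is selected. The reconstruction here is simulated as the presented image plus Gaussian noise, which is an assumption standing in for a reconstruction from fMRI signals. The function name `identify`, the noise level, and the set size are hypothetical.

```python
import numpy as np

def identify(reconstructed, candidates):
    """Index of the candidate with the smallest mean square difference."""
    diffs = candidates - reconstructed.ravel()[None, :]
    return int(np.argmin(np.mean(diffs ** 2, axis=1)))

rng = np.random.default_rng(0)
n_patches, n_random = 100, 1000

presented = rng.integers(0, 2, size=n_patches).astype(float)
# Simulated reconstruction (assumption): presented image plus Gaussian noise.
reconstructed = presented + rng.normal(scale=0.3, size=n_patches)

# Candidate set: the presented image (index 0) plus random binary images.
randoms = rng.integers(0, 2, size=(n_random, n_patches)).astype(float)
candidates = np.vstack([presented, randoms])

chosen = identify(reconstructed, candidates)
chance = 1.0 / len(candidates)
print("correct" if chosen == 0 else "incorrect", f"(chance level {chance:.4f})")
```

Repeating this over many presented images and counting how often index 0 is chosen gives a rate of correct identification, and the chance level falls as one over the size of the candidate set.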