

Daniel M. Drucker

A DISSERTATION in Psychology

Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


___________________________________ Geoffrey K. Aguirre, Dissertation Supervisor

___________________________________ Michael J. Kahana, Graduate Group Chairperson

Chapter 2 was previously published in Cerebral Cortex under doi:10.1093/cercor/bhn244 and is Copyright 2009 Daniel M. Drucker and Geoffrey K. Aguirre, used by permission in accordance with the rules of the journal.

Chapter 3 was previously published in The Journal of Neurophysiology under doi:10.1152/jn.91306.2008 and is Copyright 2009 The American Physiological Society, used by permission in accordance with the rules of the journal.

For Sarah, who always believed this day would come.

Je ne peins pas les choses, je ne peins que les différences entre les choses. (I do not paint things; I paint only the differences between things.) Henri Matisse

ACKNOWLEDGEMENTS

Thanks, in order of appearance, to:

My mother, Dede Drucker. My education has been many years in the making, spanning an absurd number of detours and false starts, and she has supported me, financially and emotionally, every step of the way. I am indebted to her for tolerating me with love and stoicism for so long.

My father, David Drucker, for encouraging my early interest in science and computing, and for his unfailing good humor.

Jon Mischo, who is solely responsible for one of the most fruitful of the aforementioned educational detours: he introduced me to Simon's Rock College.

Slavko Milekic, who first showed me that psychology, as a field, could be rigorous and useful.

Amy Janes, my oldest friend, who has served alternately as my conscience, taskmaster, and inspiration. Without her support I would not have found the courage or motivation to attempt graduate school.


Irene Khavin (née Karel), my soul-sister and confidante, for seeing the best in me and cheerfully tolerating the worst.

Ben Karel, who absorbed everything I could teach him, and challenged me in return. The student has become the master.

The denizens of #mefi, for invaluable philosophical, moral, and anatomical discussion.

Sharon Thompson-Schill, who somehow saw promise in me despite all historical evidence to the contrary. I am forever grateful to her for giving me this incredible opportunity.

Sarah Drucker (née Johnstone), my wife, best friend, teacher, editor, chef de cuisine, ... too many roles to list. Without her love, none of this would mean anything.

Geoffrey K. Aguirre, my advisor, mentor, and friend. He has an uncanny ability to see the sparks in my dullest ideas, and help me see them too. His enthusiasm and intellectual rigor are contagious, and his help and support have been unwavering and superhuman.

The rest of the Aguirre lab: Alison Harris, Wesley Kerr, and Amy Thomas, who made life in the lab not just tolerable but enjoyable, in spite of the whales trapped in the ducts. Their commentary has improved both my work and my sense of humor.

David Brainard and Dan Swingley, my committee, who have consistently held my thinking and my work to a higher standard, improving both tremendously thereby.

My work has been supported by an NIMH predoctoral fellowship, Behavioral and Cognitive Neuroscience Training Grant (T. Abel, PI, T32-MH 017168).



ABSTRACT

A central focus of cognitive neuroscience is the identification of the neural codes that represent stimulus dimensions. This dissertation investigates the relationship between stimulus similarity for several sets of parameterized shapes and the evoked patterns of activity in the brain. First, it is shown that patterns of neural activity associated with these parameterized shapes at both focal and distributed scales in object-responsive regions of cortex are highly isomorphic with behavioral measures of their stimulus similarity. Lateral and ventral portions of the lateral occipital complex are found to differ in their tuning and in the spatial scales of their patterns of neural representation, based on the observed results of within-voxel adaptation and across-voxel distributed pattern analyses. Second, a new methodology is developed to distinguish between independent and conjoint neural representation of dimensions by examining the metric of two-dimension additivity. The assumptions of the method are examined, as are optimizations. Finally, it is shown that the method produces the expected result for fMRI data collected from ventral occipito-temporal cortex while subjects viewed sets of shapes predicted to be represented by conjoint or independent neural tuning.


TABLE OF CONTENTS

COPYRIGHT NOTICE
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
Chapter 1: Introduction
Chapter 2: Different spatial scales of shape similarity representation in lateral and ventral LOC
  Abstract
  2.1 Introduction
  2.2 Materials and Methods
  2.3 Results
  2.4 Discussion
Chapter 3: Distinguishing conjoint and independent neural tuning for stimulus features using fMRI adaptation
  Abstract
  3.1 Introduction
  3.2 Theory
  3.3 Example Experiment
    3.3.2 Materials and Methods
    3.3.3 Results and Discussion
  3.4 General Discussion
Chapter 4: General Discussion
References
Appendix A: Relationship to multi-voxel distributed pattern analysis
Appendix B: Proof of additivity and sub-additivity of fMRI adaptation in neural populations with independent or conjoint tuning
Appendix C: Measurement of discretized recovery from adaptation



LIST OF TABLES

Table 3-1: Average loading on the Euclidean contraction covariate, scaled by the loading on the city-block model covariates, for a simulation of independent and conjointly tuned neural populations


LIST OF FIGURES

Figure 2-1: Stimuli
Figure 2-2: Stimulus presentation
Figure 2-3: Focal pattern similarity
Figure 2-4: Distributed pattern similarity
Figure 2-5: Similarity by stimulus axis
Figure 2-6: Voxel tuning and proximate vs. distant adaptation
Figure 3-1: Conjoint and independent population codes
Figure 3-2: Construction of covariates
Figure 3-3: Simulation of non-linearities and distortions
Figure 3-4: Effect of model rotation
Figure 3-5: Effect of Minkowski index
Figure 3-6: The di-octagon space
Figure 3-7: Efficiency over sequence permutations
Figure 3-8: Stimuli
Figure 3-9: Garner speeded-classification
Figure 3-10: Stimulus presentation
Figure 3-11: Neuroimaging results
Figure A-1: Comparison of within- and across-voxel tunings
Figure C-1: Linearity of fMRI response


Chapter 1: Introduction
The input to the visual system is, at root, William James's "blooming, buzzing confusion". Each of more than a hundred million rods and cones provides constantly varying input to the brain, whose task it is to reduce that massively multi-dimensional information into a form that allows the organism to extract principled regularities between similar inputs, so that they can be acted on appropriately. If this input were pure noise, this task would be impossible; fortunately, the world is mostly orderly, and we can perceive structure, and thus objects and other visual categories. The brain takes the changing pattern of light, which in and of itself has no meaning, and extracts perceptual dimensions from it. Some basic dimensions such as color, shape, size and orientation are computationally simple to derive; others, like object identity or even aesthetic beauty, may require more extensive processing. In this chapter, I first present an introduction to how these perceptual dimensions can be used to define similarity spaces, including behavioral methods that have historically been used to describe such spaces and the mathematical properties that can be used to make inferences about their representation. Subsequently, I discuss the motivations for relating these perceptual spaces to potentially isomorphic neural representations, a topic which forms the core of this dissertation.

Perceptual similarity spaces

Any percept or stimulus in a set may be described by its values along some collection of dimensions. For example, an apple might have some value of color, roundness, size, and scores of other dimensions, while an orange would have differing values. Each stimulus thus exists as a point within this multi-dimensional perceptual space. Of course, natural images tend to be highly complex, varying on uncountable and even uncharacterizable dimensions. In order to study any system, it is generally desirable to simplify things such that one has control over the changing variables. In the case of vision, one way of effecting this simplification is by studying not natural images but synthetic ones. To examine how the brain treats various perceptual dimensions, the experimenter can create specific stimuli which vary in a controlled, parameterized fashion along explicitly chosen dimensions. By studying the behavior elicited by these stimuli, various properties of that perceptual space may be inferred.

For any characterization method, the end goal is to specify the position of each stimulus in the space, either in absolute terms or (more commonly) by specifying all of the distances between stimuli as a two-dimensional matrix. If people were perfectly rational and had perfect introspection, it would be sufficient to sit them down with a piece of paper and have them write down the coordinates in the multi-dimensional space for each item, or arrange items on a table so that their distances in real space match their internal idea of the space (Kriegeskorte et al., 2008). However, it is often not obvious what the dimensions or the metric of the desired space are; this may in fact be the very thing one wants to discover. Therefore, rather than measuring a particular stimulus's value on particular dimensions, it is often more feasible and useful to measure pairwise similarities. For example, if one specified a space of colors as a function of wavelength, the circular structure of hues as perceived by humans would not be apparent; yet this structure would be readily seen in an analysis of pairwise distances (using multidimensional scaling, as discussed below).

Much of this dissertation is concerned with the characterization of these multidimensional similarity spaces: spaces in which the similarity (or dissimilarity) of two objects along some perceptual dimension is represented in terms of their proximity (or distance). This allows us to bring to bear the rich host of mathematical techniques that operate on and describe the properties of multi-dimensional spaces. For example, spaces may be described in terms of their metric along one or several dimensions, and inferences can be made about unmeasured points in the space: e.g., given what is known about the distances between measured points, what properties might a stimulus associated with a particular unmeasured point have, and what would be the distances between it and the existing points?

Similarity structure is representation

A number of methods have been described for the characterization of these similarity spaces, beginning with Attneave's (1950) seminal work on the subject, which describes the concept of similarity as it applies to a set of objects or concepts. Attneave recognized that when two things are considered similar, they are considered so with respect to something: some dimension or property on which the two things are more like each other than like something else. For any set of objects, there will be any number of dimensions that can be asserted for these comparisons, and judgments of similarity will differ depending on which dimensions are being measured. It can be argued that it is in fact these similarity measures themselves that define the perception of an object.

Edelman (1998) pointed out that while nobody thinks that the representation of a cat in the brain is literally cat-shaped, or fluffy, many theories of shape representation have historically still been based on direct correspondence with properties in the world (e.g., Biederman, 1987). If one expects the visual system to do a good job at representing the world, that kind of implementation seems to fall short. Edelman proposed that a better method of representation is to encode relationships between distal stimuli, rather than to map them onto pre-existing proximal components.

Methods of characterizing perceptual similarity spaces

Given that we wish to measure pairwise distances, how might we go about this? A simple way of obtaining similarity structure from subjects without defining axes is to present pairs of stimuli and ask for an explicit similarity judgment on, e.g., a numerical scale. These pairwise responses can then be used directly to form the similarity matrix. While this method has the advantage of simplicity and transparency, it can be unreliable; we might often prefer a method with less explicit decision making on the part of the subject.

One such method is the "odd one out" paradigm: the subject is presented with triads of stimuli, and asked to choose the two that go together better. Each time a pair is chosen, that pair's similarity score is incremented in a matrix. Since every item pair is presented in the context of every other item, context effects are balanced; and subjects are asked to make a trinary choice rather than choose from a continuum, so a scale doesn't have to be kept in mind. This method proves impractical for large sets, however, as the number of trials required grows combinatorially with set size (every triad must be presented). Further, while useful for items that are well described by a collection of properties, this method is much less appropriate for stimuli that are better described by continuous variables. Finally, consider the case of five stimuli: apple, orange, grape, the author, and the author's cat. This method would correctly group the former three and latter two, but would poorly estimate the difference in between- and within-group distances.

All of the measures thus far have depended on explicitly asking the subject to make some kind of similarity judgment between two clearly different stimuli. Another way to determine pairwise similarity is to assume that similarity is akin to confusability. Subjects are presented with pairs of stimuli; the task is to respond "same" or "different". Average response time is taken as the measure of dissimilarity. This procedure works only if a sufficient number of stimuli are available, and those stimuli approximate something like a continuum; i.e., there must exist a range of values of confusability to measure, and there must exist items that are perceived as very similar but which are not identical. The time pressure creates a bias, however: features that are more easily distinguished earlier in time are given more weight than features that might be equally important (to some later part of the visual system) but which take more time to process. Further, a large number of "catch" trials with identical stimuli are needed, which lengthen the experiment but provide no useful data. In spite of these limitations, response time (RT) measures tend to prove useful, as they provide a measure of similarity that is relatively independent of explicit decisional processes.
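As a minimal illustrative sketch (the stimulus names and the deterministic "judge" below are hypothetical toys, not stimuli or data from any experiment in this dissertation), the odd-one-out tally can be implemented as a symmetric count matrix accumulated over all triads:

```python
import numpy as np
from itertools import combinations

def tally_triads(items, choose_pair):
    """Accumulate a similarity matrix from odd-one-out judgments.
    choose_pair(a, b, c) returns the two items judged to 'go together'."""
    index = {item: i for i, item in enumerate(items)}
    sim = np.zeros((len(items), len(items)))
    for a, b, c in combinations(items, 3):   # every triad presented once
        x, y = choose_pair(a, b, c)
        sim[index[x], index[y]] += 1
        sim[index[y], index[x]] += 1         # keep the matrix symmetric
    return sim

# Toy judge: fruits group with fruits; non-fruits group with each other.
fruits = {"apple", "orange", "grape"}

def judge(a, b, c):
    triad = [a, b, c]
    in_fruits = [t for t in triad if t in fruits]
    group = in_fruits if len(in_fruits) >= 2 else [t for t in triad if t not in fruits]
    return group[0], group[1]

items = ["apple", "orange", "grape", "author", "cat"]
sim = tally_triads(items, judge)
print(sim[0, 1])  # 3.0: apple and orange are paired in all three triads containing both
```

Note how the five-item example from the text plays out: the within-group tallies are high, but the tally says nothing about how much farther "author" is from "apple" than from "cat", which is exactly the between-group distance problem described above.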

Computational methods of characterizing perceptual similarity spaces

In multidimensional scaling (MDS), similarities or dissimilarities between pairs of items are treated as distances (Kruskal, 1964). The MDS algorithm then tries to find coordinates for all the items in an n-dimensional space that minimize "stress": the difference between the distances given and the distances between the computed points. If n is greater than the number of points, stress will be zero. The goal of MDS is to reduce the dimensionality of the stimulus set; it is a mathematical technique for discovering the principal dimensions on which the items vary. Thus, one typically chooses a very small n, usually 2 or 3, such that the points may be easily visualized. MDS is ideally suited for the elucidation of similarity spaces formed from stimuli that vary continuously in their perceptual properties; other methods, such as additive clustering (Sattath & Tversky, 1977; Shepard & Arabie, 1979), are more appropriate when stimuli are defined by category or discrete sets of properties. (The latter will not be discussed further here.)

In studies like those described in Chapters 2 and 3 of this dissertation, it is useful not only to characterize the perceptual-space distribution of a particular stimulus set, but also to modify stimulus parameters in order to create a stimulus space with specific properties (for example, to normalize the perceptual salience of two or more dimensions). The use of generated stimuli with adjustable parameters makes this possible: one iteratively collects RT data on stimulus pairs and adjusts parameters until the desired results are acquired (a very tedious process).
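The hue-circle point can be made concrete. The sketch below is my own illustration, not code from the dissertation, and it uses classical (Torgerson) MDS, a linear-algebra relative of Kruskal's stress-minimizing procedure rather than that algorithm itself: given nothing but the pairwise distances among eight points on a circle, a 2-dimensional solution recovers the circular configuration.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed an N x N dissimilarity matrix D into n_components dimensions
    via double-centering and eigendecomposition (Torgerson's method)."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centered points
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0, None))
    return eigvecs[:, top] * scale             # one row of coordinates per item

# Eight "hues" on a circle: only their pairwise distances are given to MDS,
# yet the 2-D solution reproduces the circular structure.
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
pts = np.column_stack([np.cos(theta), np.sin(theta)])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

X = classical_mds(D, n_components=2)
D_hat = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D_hat))  # True: a perfect 2-D embedding, i.e. zero stress
```

The recovered points are a rotation or reflection of the originals (MDS solutions are only defined up to such transformations), but all the pairwise distances, and hence the circular structure, are preserved.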

Behavioral methods of characterizing dimensional interactions

Stimuli may vary along multiple perceptual dimensions. Some of these properties may be apprehended as individual, separate aspects, whereas other perceptual dimensions seem to be folded together into a composite and not separately appreciated. Tests have been devised as behavioral correlates to this impression, beginning with Garner and Felfoldy (1970), who defined two main types of dimensional interactions: separability and integrality. Under Garner and Felfoldy's definition, changes along one separable dimension have no effect on the perception of changes along other separable dimensions; integrality was defined as the lack of this separability.

Garner and Felfoldy described a method, the Garner speeded-classification task (also known as the Garner interference task), for characterizing a pair of dimensions as separable or integral. The Garner task measures the facility with which one dimension (e.g., color) is processed while another dimension (e.g., shape) of the same stimulus is ignored. Subjects are shown stimuli that vary on two dimensions, and are asked to classify the stimuli into two categories. At the beginning of each block of trials, the subject is shown visually which dimension is to be used for sorting, and in what way to divide that dimension into two categories. A key press on each trial indicates the stimulus's value on that dimension. In a "filtering" condition, the value of the unattended axis (which is not to be used for sorting) is varied randomly, while in a "correlated" condition, the value of the unattended axis is perfectly correlated with the attended axis. If the subject's reaction time is significantly better (i.e., reduced) in the correlated condition, the dimensions are considered to be integral, as this is evidence that the processing of each dimension depends to some degree on the other. If the two conditions are not shown to be different, this indicates that the two can be processed independently; they are considered separable.
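The design logic of the two block types can be sketched in a few lines. This is an illustration only: the two-level dimensions, trial count, and function names are invented for the sketch, not taken from Garner and Felfoldy's actual stimuli. The blocks differ solely in how the unattended dimension's value is chosen on each trial:

```python
import random

def garner_block(condition, n_trials=16, levels=(0, 1), seed=1):
    """Return (attended, unattended) level pairs for one block of the
    Garner speeded-classification task."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        attended = rng.choice(levels)
        if condition == "correlated":
            unattended = attended            # perfectly tracks the sorted dimension
        elif condition == "filtering":
            unattended = rng.choice(levels)  # varies independently at random
        else:
            raise ValueError(f"unknown condition: {condition}")
        trials.append((attended, unattended))
    return trials

# Integral dimensions predict mean RT(filtering) > mean RT(correlated);
# separable dimensions predict no reliable RT difference between blocks.
correlated = garner_block("correlated")
print(all(a == u for a, u in correlated))  # True: the two dimensions covary perfectly
```

The inference then rests entirely on comparing mean reaction times between the two block types, as described above.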

Computational methods of characterizing dimensional interactions

Another way of describing the interaction of dimensions is by examining the metric of the similarity space. The metric of a space describes the distance between points in a given set. The most common in practice are the Euclidean and city-block metrics. The Euclidean metric is what one normally thinks of as distance: the length upon a ruler laid directly between two points. Distance between two points in the Euclidean metric is the Pythagorean distance: the square root of the sum of the squared differences along each dimension. By contrast, distance under the city-block (or "taxicab") metric is simply the sum of those differences. These are two special cases of a more general Minkowski space (see Eq. 3-1 in Chapter 3), where an exponent of 1 corresponds to the city-block metric and 2 to the Euclidean.

The Euclidean metric occupies a privileged position among all values of the Minkowski exponent: under this metric, the computed distance is independent of the axes used to measure it. As an example, consider a ruler laid down on a sheet of graph paper. Under the Euclidean metric, the length of the ruler is constant: the orientation of the paper is irrelevant. Under all other Minkowski power metrics, the measured length depends on the angle between the ruler and the axes of the paper. The ruler's city-block length will be longest when it is at a 45° angle with respect to the grid.

If sufficient precision in the MDS-derived position of points in stimulus space is available, one can use the best-fit metric of that space to classify stimulus dimensions.
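The ruler example can be verified numerically. This short sketch (mine, not the dissertation's) measures a unit-length ruler from the origin under Minkowski exponents p = 1 and p = 2 at two orientations:

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance between points a and b; p=1 gives the
    city-block metric, p=2 the Euclidean metric."""
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p))

def ruler_length(angle_deg, p):
    """Length of a unit ruler from the origin, measured against the
    grid axes, when the ruler is rotated by angle_deg."""
    a = np.radians(angle_deg)
    return minkowski((0.0, 0.0), (np.cos(a), np.sin(a)), p)

# Euclidean (p=2) length is invariant to orientation:
print(round(ruler_length(0, 2), 6), round(ruler_length(45, 2), 6))   # 1.0 1.0
# City-block (p=1) length depends on orientation, peaking at 45 degrees:
print(round(ruler_length(0, 1), 6), round(ruler_length(45, 1), 6))   # 1.0 1.414214
```

At 45°, the city-block length is cos 45° + sin 45° = √2, the maximum inflation relative to the Euclidean length, exactly as the graph-paper example states.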

Recall that the MDS algorithm returns two things: a set of points in the specified n-dimensional space, and a stress value. Many MDS algorithms also allow one to specify, in addition to the required n, the desired metric of the space: e.g., city-block or Euclidean. If distances conform best (have lower stress) to a city-block space, the dimensions are considered separable; if Euclidean, integral (Shepard, 1964; 1980). The city-block metric is one of independence of dimensions: the total change is simply the sum of the dimensional changes, with no interaction between them; the Euclidean metric, being agnostic as to dimension, best represents stimuli which cannot be easily decomposed into separate dimensions. In practice, this method of characterization has generally been found impractical. It can also be ambiguous whether a difference in stress is best attributed to an improperly chosen distance metric or to variance from a "squashed" dimension: for every Euclidean distance matrix, there is also a city-block matrix with the same values but a higher dimensionality (Borg & Groenen, 2005).

A number of other methods for characterizing dimensional interactions have been developed, acknowledging that "separable" and "integral" are binary simplifications of a continuum of possibilities. For a detailed review of later advances in the description and measurement of dimensional interaction, see Kerr and Aguirre (forthcoming).

Given these multiple ways to behaviorally and computationally characterize perceptual spaces and perceptual similarity, I turn in the next section to explore the possible neural underpinnings of these spaces.

Neural instantiation of perceptual similarity spaces

Why might some kinds of dimensional stimuli be apprehended more readily than others? A proposed answer is that the processing of some kinds of visual properties is simply more directly implemented by the neural hardware. It seems unlikely that evolutionary pressures would have caused the visual system to develop a specialized mechanism for encoding, say, the Latin alphabet; but colors, orientations, and other basic features of the world most certainly are implemented at a very low level. An important goal of visual neuroscience is to determine just which dimensions are in fact important to the brain: what features of stimuli does the brain encode and process, and where and in what manner is this done? Just as techniques such as multidimensional scaling are useful as applied to behavioral data in finding the principal ways in which relationships are organized, I propose that we can apply analogous methods to neural data in order to discover facts about cortical organization.

One might further ask why certain sets of basic features may be apprehended in an integral fashion, while others may be processed separately. An explanation here could lie in the physical distribution of the neural codes for these features. Imagine that properties A and B are implemented by a single neural population, tuned conjointly to the two properties. It seems reasonable that this unitary population would, from an information-processing standpoint, be less efficient at delivering classification information about these properties independently to later stages in the visual stream than would two independent populations which each separately implement the detection of these properties. Cant et al. (2008) recently discussed the importance of relating measures such as the Garner speeded-classification task to the independence of observed fMRI activations on a gross scale; I propose a method which allows these sorts of inferences to be made at smaller, within-voxel scales.

In Chapter 3, a method will be discussed which infers the conjointness of neural populations representing stimulus dimensions. The method yields a value, the loading on a Euclidean term, which corresponds to the conjointness of the populations. One might consider that the loading on this term could vary parametrically with the degree of integrality, as measured behaviorally. In a recent series of experiments, Kerr et al. (forthcoming) showed exactly this sort of relationship. Textures that varied in spatial scale and identity were shown to subjects in an fMRI experiment such as that described in Chapter 3. In addition, behavioral data were collected from each fMRI participant, characterizing the degree of separability of the two dimensions for that individual participant. Kerr et al. found a strong linear relationship (R = 0.8) between individual subjects' measures of separability and the degree of independence of neural encoding (as measured by the Euclidean contraction covariate; see Chapter 3).

The present work

The goal of the work presented in this dissertation is to examine whether behaviorally characterized perceptual similarity structures are directly instantiated as neural codes with precise, recoverable metric properties. Specifically, I will be concerned with the representations of simple object features in the lateral occipital complex, an object-responsive cortical region.

In Chapter 2, I ask whether cortical responses to low-level shape features reflect perceptual similarity. I presented participants with simple object contours whose perceptual properties had been previously behaviorally established, and measured fMRI adaptation in area LOC. In previous work (Kourtzi & Kanwisher, 2001), the ventral portion of LOC has been shown to adapt to object category, whereas a smaller adaptation effect has been seen in the lateral portion of LOC. I extend this work by demonstrating that not only object category, but very fine distinctions in object identity, can be detected. In ventral LOC, I found that recovery from adaptation in the neural response was linearly proportional to psychological measures of stimulus similarity. In lateral LOC, I found that the distributed, across-voxel pattern of activity was not only sufficient to classify object identity with high accuracy, but also that the similarity structure across these patterns was well correlated with the original perceptual space. Thus, I demonstrate that the neural codes for object representations can in fact relate quite directly to perceptual similarity.

In Chapter 3, I ask whether even more subtle metric properties of similarity structure can be recovered from fMRI adaptation data, even in the face of various troublesome non-linearities. Specifically, I develop a method for distinguishing potentially conjoint from independent neural tunings for different perceptual features. As discussed above, when stimuli vary on two or more dimensions, one can ask whether these dimensions are orthogonally represented by two separate populations of neurons, each tuned preferentially to one of the stimulus dimensions, or whether the representation of the two dimensions is carried by a single population of neurons with a single set of tuning curves. I presented participants with two groups of stimuli (behaviorally normed abstract objects) whose perceptual dimensions were likely, based on prior literature (Albright & Gross, 1990; Kayaert et al., 2005; Arguin & Saumier, 2000; Stankiewicz, 2008; Op de Beeck et al., 2003), to be represented by conjoint and independent neural populations, respectively. I also behaviorally characterized these groups as being perceptually integral and separable, respectively. Recovery from fMRI adaptation to the two dimensions was found to be subadditive only for the group of stimuli expected to be represented by a single neural population. I additionally demonstrate that the method developed is robust to a wide range of possible nonlinearities in the representational space and other confounds. Finally, I develop some optimizations, as well as rigorous post-hoc tests of the method. These optimizations can greatly increase the sensitivity of many types of fMRI studies, and should be of interest to the broader neuroimaging community.


Chapter 2: Different spatial scales of shape similarity representation in lateral and ventral LOC
We investigated the relationship between stimulus similarity for a set of parameterized shapes and the spatial scale of neural representation within sub-regions of the lateral occipital complex (LOC) using a carry-over fMRI design. In ventral but not lateral LOC, a linear recovery from adaptation proportional to shape dissimilarity was seen. In contrast, a strong correspondence of the distributed neural pattern and stimulus similarity was observed in lateral LOC but not ventral LOC. Further, ventral LOC voxels were found to be broadly tuned and represent all aspects of stimulus similarity, while lateral LOC voxels were narrowly tuned, and preferentially represented the shape of small features rather than their orientation within the shape. The results, indicating a coarse spatial coding of shape features in lateral LOC and a more focused coding of the entire shape space within ventral LOC, may be related to hierarchical models of object processing.

2.1 Introduction
A fundamental problem in vision is the need to reduce the myriad dimensions of input provided by the eye into an organized, structured representation (Edelman, 1998). Regularities in what we perceive allow the representation of a high-dimensional world by a lower-dimensional space; these representations are instantiated in the neural codes

maintained within cortical visual areas. Different stages of processing in the visual stream presumably represent objects in different ways. In this chapter we measure the representation and spatial organization of these codes for simple two-dimensional shapes, and the relationship between stimulus similarity and neural representation. The spatial scale of neural codes for objects has been a subject of debate over the last decade. Spatially small regions of cortex have been found which preferentially respond to particular categories of visual stimuli; for example, faces (Kanwisher, McDermott, & Chun, 1997; McCarthy, Puce, Gore, & Allison, 1997), places (Epstein, Harris, Stanley, & Kanwisher, 1999), general objects (Haxby, Gobbini, Furey, Ishai, Schouten, & Pietrini, 2001; Haxby, 2006), and body parts (Kanwisher, McDermott, & Chun, 1997; Downing, Jiang, Shuman, & Kanwisher, 2001; McCarthy, Puce, Gore, & Allison, 1997). These results imply a fine-scale, within-voxel representation of these stimuli, where small regions of cortex contain populations capable of representing the entire space of images in a category. Such a representation corresponds to Edelman's (1998) "chorus of prototypes": sets of neurons that represent specific regions in a shape space, and, taken together, represent an entire space. A counterpoint to this apparent specialization has been the demonstration that information regarding object category is also contained in the distributed pattern of voxel-wise responses across and between these specialized regions (Haxby, Gobbini, Furey, Ishai, Schouten, & Pietrini, 2001; O'Toole, Jiang, Abdi, & Haxby, 2005). These results, in contrast, demonstrate the existence of coarse-scale representations, where neurons in one region of cortex preferentially respond to one region in a representational

space, while neurons in another correspond to a different region. This type of representation might correspond to Edelman and Intrator's (1997) "chorus of fragments" model, where individual properties of objects are represented by separate neural populations. Our focus here is upon the representation of variations in stimulus identity within a simplified object category. Within the domain of behavioral studies of object perception, this has been approached by relating the perceptual similarity of stimuli to the properties of their underlying mental spaces (Attneave, 1950; Garner & Felfoldy, 1970; Garner, 1974; Shepard, 1964; Shepard & Arabie, 1979; Sattath & Tversky, 1977). The relative perceptual similarity of a group of stimuli can be mapped to a representation that has metric properties, in that similar stimuli are closer together in an abstract, representational space. Aspects of the underlying representational space, such as its dimensionality and distortions, inform as to the nature of the representation. In practice, the structure of a parameterized space of shapes can be recovered from human behavioral responses (e.g., reaction times or similarity judgments) to pairs of those shapes, even when the subjects do not see the entire space in its veridical configuration. This isomorphism between perceptual and behavioral similarity may extend to the neural representation of variations in object appearance as well; that is, the similarity of neural activity patterns evoked by stimuli may map onto the perceptual similarity of the stimuli (Edelman, 1998). Op de Beeck and colleagues (2001) examined this possibility by recording responses of macaque infero-temporal (IT) neurons to 2D shapes. They found that the pattern of neural responses across IT neurons reflected the perceptual similarity

of the stimuli and was ordinally faithful to the veridical parametric configuration of the shapes, never seen by the subjects (Op de Beeck, Wagemans, & Vogels, 2001). Does a similar system of neural representation exist within human visual cortex? The human lateral occipital complex (LOC) shows similar functional properties to those previously ascribed to IT structures in the macaque. This region responds more strongly when a viewer is presented with images of parseable objects, as opposed to images that have no 2- or 3-dimensional interpretation, and appears largely indifferent to the method of object perception; e.g., objects may be defined by luminance, texture, motion, or stereo difference (Grill-Spector, Kushnir, Hendler, Edelman, Itzchak, & Malach, 1998). The LOC appears to be composed of two distinct, bilateral cortical areas: a more lateral region near the lateral occipital sulcus and a more ventral area near the posterior fusiform gyrus and occipital-temporal sulcus (Malach et al., 1995; Grill-Spector, Kushnir, Edelman, Avidan, Itzchak, & Malach, 1999). The ventral LOC is also referred to as the posterior fusiform sulcus (pFS). These lateral and ventral sub-divisions of LOC may have different functional properties, as suggested by the greater degree of neural adaptation to object identity that has been observed in the ventral region (Grill-Spector, Kushnir, Edelman, Avidan, Itzchak, & Malach, 1999; Kourtzi & Kanwisher, 2001). One significant goal in this study was the investigation of the representations hosted by these distinct areas of LOC. In the current study we applied the framework of stimulus similarity to investigate the neural representation of shape variation in human subjects, and the spatial scale on which that representation occurs. Two recent studies have demonstrated a relationship

between perceptual similarity and the distributed pattern of neural activity in LOC (Op de Beeck, 2008; Haushofer, 2008), both using synthetic novel shapes to dissociate these patterns from categorical, semantic representations. In the current study, we examine neural representation of perceptual similarity at both a distributed and a focal cortical scale within two subregions of LOC. We used a carry-over fMRI design (Aguirre, 2007) in which the stimuli to be examined are presented in a counterbalanced, unbroken stream while the subject performs an orthogonal attention task. A continuous modulation of neural response proportional to stimulus similarity, measured via neural adaptation, indicates the presence of a within-voxel population code. Simultaneously, the distributed pattern of neural response evoked by each stimulus across voxels may be measured. This provides a measure of the similarity of neural representations for the stimulus set at two different spatial scales, as indexed by neural adaptation and distributed pattern analysis. During functional MRI scanning, subjects viewed 16 different shapes defined by radial frequency components (RFCs; a series of sine waves of various frequencies describing perturbations from a circle; Zahn & Roskies, 1972; Figure 2-1). RFCs were at one time proposed as an organizing principle of shape recognition (Schwartz, Desimone, Albright, & Gross, 1983). Although this idea was later experimentally rejected (Albright & Gross, 1990), RFCs are nevertheless a useful method of creating and parameterizing shapes. Because they are simplified objects, these RFC curves provide an evenly parameterized similarity space, without semantically associated categorical boundaries.


These 2-dimensional, closed contours were varied parametrically by modifying the amplitude (amount of perturbation) and phase (positioning of perturbations) of one particular frequency component. Parametric variations of shape related to changes in amplitude and frequency of a low frequency component have been found to correspond to a two-dimensional representational space (as determined by a multidimensional scaling of similarity ratings), although changes in the phases alone of two low frequency components were found by Cortese and Dyre (1996) to collapse into a single dimension. The previous work of Cortese and Dyre defined the perceptual properties of these stimuli and demonstrated that human observers organize their perception of these stimuli around the two axes. Additionally, while these two axes were found to be of equal salience, one axis was found to perceptually correspond to essential shape features, while the other axis appeared to change the orientation of features within the shape. Does the similarity of the stimuli correspond to the similarity of the patterns of neural activity that they evoke? Neural adaptation (Grill-Spector & Malach, 2001; Henson & Rugg, 2003; Henson, 2003) measures the habituation a neural population experiences when a stimulus is repeated. We asked in this study if the degree of recovery from neural habituation at different cortical sites was proportional to the transition in similarity between two stimuli. Any voxels with this response property would indicate the presence of a neuronal population able to represent the space of shapes, in a manner analogous to that observed in macaque area IT by Op de Beeck and colleagues (2001). Such a representation would exist at a relatively fine spatial scale, with the population of neurons within a voxel sufficient to represent the shape space. We might expect ventral area LOC to contain this form of representation.

Additionally, recent studies have shown that the pattern of neural response to visual stimuli, distributed across voxels, contains information regarding the category of stimulus (Haxby, Gobbini, Furey, Ishai, Schouten, & Pietrini, 2001; O'Toole, Jiang, Abdi, & Haxby, 2005). In this study, we investigated whether the distributed pattern of response can inform as to the identity of stimulus variation within an object category; results to this effect using man-made objects have recently been reported by Eger and colleagues (2008). As pointed out by Cox and Savoy (2003), the identification of such distributed patterns, which depend on between-voxel differences, indicates that a given perceptual feature must be represented at a relatively coarse scale. We further wished to determine whether the relative similarity of these distributed neural patterns in turn reflects the perceptual similarity of the stimuli. Finally, we asked whether the different perceptual features of the two stimulus axes are reflected in differences in neural coding at either a focal or distributed scale.

2.2 Materials and Methods

Subjects and scanning parameters

Five right-handed women aged 20-22 participated in the study. All subjects provided informed consent and the study conformed to the guidelines of the University of Pennsylvania Institutional Review Board. Structural and functional data were collected on a 3.0-T Siemens Trio scanner using an 8-channel head coil. High-resolution T1-weighted structural images were collected in 160 axial slices and near-isotropic voxels

(0.9766 mm × 0.9766 mm × 1.0000 mm; TR = 1620 ms, TE = 3 ms, TI = 950 ms). Functional, blood-oxygenation-level-dependent (BOLD), echoplanar data were acquired in 3 mm isotropic voxels (TR = 3000 ms, TE = 30 ms). BOLD data were acquired in 42 axial slices, in an interleaved fashion with 64 × 64 in-plane resolution. The functional data were collected in 5 runs of 159 TRs each. The first 6 s of each run consisted of dummy gradient and radio frequency pulses to allow for steady-state magnetization, during which no stimuli were presented and no fMRI data were collected.

Stimuli and behavioral task

Stimuli were sixteen simple closed contours (Figure 2-1) constructed from RFCs similar to those used by Cortese and Dyre (1996). Specifically, the RFC-amplitude of frequency 6 was 0.25, 0.50, 0.75, or 1.00 radian, and the RFC-phase of frequency 6 was 0, 40, 80, or 120 degrees. The RFC-amplitude and RFC-phase values of frequencies 2 and 4 were held constant at 0.50 radians and 0 degrees. Each contour was drawn on a mean gray background in either red or purple (randomly selected upon each presentation). The presentation of only the shape outline allowed us to distinguish between the similarity of the stimuli in pixel-wise or retinotopic measures and the similarity of the contour implied by the outline. The stimuli were back-projected onto a screen viewed by the subject through a mirror mounted on the head coil, and subtended 5° × 5° of visual angle. Each stimulus was presented for 1400 ms, with a 100 ms ISI consisting of the mean gray background (Figure 2-2). The subject was instructed to indicate on each trial, by button press, whether the contour was drawn in red or purple.
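The radial frequency parameterization described above can be sketched in a few lines. The following is an illustrative reconstruction, not the stimulus-generation code used in the study; the function name and sampling density are my own, and mapping the "amplitude" parameter directly onto the radial perturbation of a unit circle is an assumption.

```python
import numpy as np

def rfc_contour(amps, phases, n_points=360):
    """Sample a radial frequency contour: a unit circle perturbed by sine
    waves. amps maps frequency -> RFC-amplitude; phases maps frequency ->
    RFC-phase in degrees (hypothetical names, for illustration only)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    r = np.ones_like(theta)
    for freq, amp in amps.items():
        r += amp * np.sin(freq * theta + np.deg2rad(phases.get(freq, 0.0)))
    # return Cartesian coordinates of the closed contour
    return r * np.cos(theta), r * np.sin(theta)

# One cell of the 4 x 4 grid: frequency-6 amplitude 0.25 and phase 40 degrees,
# with frequencies 2 and 4 fixed at amplitude 0.50 and phase 0, as in the text.
x, y = rfc_contour({2: 0.50, 4: 0.50, 6: 0.25}, {2: 0, 4: 0, 6: 40})
```

Sweeping the frequency-6 amplitude and phase over their four levels each would reproduce the 16-shape grid of Figure 2-1.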

The task was assigned solely for the purpose of requiring the subject to attend to every stimulus in the experiment, and was constructed so as not to involve an explicit judgment of any aspect of the stimuli that was of experimental interest. All subjects performed above 96% accuracy, and the mean accuracy was 98%, indicating that subjects were alert and monitoring the stimuli as they were presented. There was no effect on RT of stimulus identity or of the similarity of a stimulus to the preceding stimulus (p=0.32).

Stimulus sequence

Each of the 16 different shapes was presented to each subject 85 times in a fully counterbalanced order. The order of stimulus presentation was determined by an n=17, type 1 index 1 sequence, a first-order counterbalanced ordering that arranges the stimuli in permuted blocks (Nonyane & Theobald, 2008). The full sequence was divided into five parts for scanning as described in Aguirre (2007). The labels 1-16 were assigned to the 16 stimuli and the 17th label indexed the presentation of a blank trial (gray screen with fixation cross), which had a duration of 3 seconds (Appendix A, Aguirre, 2007). This sequence provides for first-order counterbalancing of the stimuli, such that every image appeared in the sequence both before and after every other image, as well as before and after 3 seconds of a blank screen. A particular type 1 index 1 sequence was selected which maximized efficiency (Friston, Zarahn, Josephs, Henson, & Dale, 1999) for detection of adaptation effects proportional to stimulus similarity. This sequence was identified by brute-force search of several hundred thousand sequences (Appendix A, Aguirre, 2007).
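The defining property of first-order counterbalancing can be checked mechanically: every ordered pair of labels, including exact repeats, must occur equally often across the sequence. The sketch below is such a validity check, not the sequence-construction algorithm of Nonyane and Theobald (2008); the function name and integer-label convention are my own.

```python
from collections import Counter
from itertools import product

def is_first_order_counterbalanced(seq, n_labels):
    """True if every ordered pair of consecutive labels (including exact
    repeats) occurs the same number of times in the sequence."""
    counts = Counter(zip(seq, seq[1:]))          # all consecutive pairs
    expected = (len(seq) - 1) // (n_labels ** 2)  # pairs per label pair
    return expected > 0 and all(
        counts[p] == expected for p in product(range(n_labels), repeat=2))

# Smallest example for two labels: each of the 4 ordered pairs occurs once.
assert is_first_order_counterbalanced([0, 0, 1, 1, 0], 2)
```

For a type 1 index 1 sequence with n=17 labels (16 shapes plus the blank), each of the 17² ordered pairs occurs exactly once per pass through the sequence.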
Image pre-processing

Off-line data analysis was performed using the VoxBo and SPM2 software packages. Data were sinc interpolated in time to correct for the slice acquisition sequence, motion corrected with a six-parameter, least squares, rigid body realignment routine using the first functional image as a reference, and normalized in SPM2 to a standard template in Montreal Neurological Institute (MNI) space. Normalization maintained 3 mm isotropic voxels and used 4th degree B-spline interpolation. In the analysis of adaptation effects, the fMRI data were smoothed in space with a 3 × 3 × 3 voxel isotropic Gaussian kernel. In the SVMlight / distributed analysis, the data were left unsmoothed. For each dataset (the spatially smoothed and unsmoothed), the average power spectrum across voxels and across scans was obtained, and the power spectrum fit with a 1/frequency function (Zarahn, Aguirre, & D'Esposito, 1997). This model of intrinsic noise was used during regression analyses with the Modified General Linear Model (Worsley & Friston, 1995) to inform the estimation of intrinsic temporal autocorrelation. The results of group analyses were presented (using BrainVoyager) atop the MNI anatomical image that served as a template for spatial normalization. Regions of interest corresponding to early retinotopic visual areas (V1, V2, V3, hV4) and categorically organized areas (LOC dorsal and ventral, identified

by response to object > scrambled object) were defined from data obtained during separate scans using standard methods (Harris & Aguirre, 2008; Radoeva, Prasad, Brainard, & Aguirre, 2008). The ROI analyses reported here combined data from the left and right hemispheres with the exception of the ventral LOC ROI for which a difference between the left and right hemisphere responses was found.

Statistical analysis of adaptation effects

In order to analyze the adaptation effects, we created a set of three covariates modeling the interstimulus distance along each axis in the shape space at each point in time (Aguirre, 2007). Two covariates modeled the distance along the RFC-amplitude and RFC-phase axes, respectively. We assumed a linear, 4 by 4 spacing of stimuli. In a multidimensional scaling (MDS) analysis of behavioral data, Cortese and Dyre (1996) found their stimuli to be placed in a reasonable simulacrum of a linear grid; we replicated this study in 11 subjects who performed a similarity rating task for the stimuli. We found that all subjects reliably produced a grid-like arrangement of the stimuli. As the average distance matrix generated by this behavioral data correlates R=0.93 with a simple linear grid, we felt justified in using the simplified model; this also permitted decomposition of the adaptation response into the RFC-amplitude and RFC-phase components. Additional covariates, not of interest in this study, modeled the main effect of stimulus presentation as compared to the blank trials, the effect of exact repetition of stimulus identity, and the effect of a stimulus following a blank trial (see Aguirre, 2007 for details). Nuisance covariates, corresponding to the effects of global signal, motion, and the orthogonal

attention task, were included in both this analysis and the analysis of distributed effects. In an additional analysis, the two covariates corresponding to RFC-amplitude and RFC-phase were replaced with a set of six covariates that modeled the six possible sizes of city-block transitions in the shape space. Using these covariates, we examined the linear relationship between shape similarity and neural adaptation within the functionally defined sub-regions of LOC. Exploratory group results were also obtained for a whole-brain, random-effects analysis, and thresholded at a map-wise significance of alpha=0.05 as determined by a permutation test (Nichols & Holmes, 2002) (t > 3.5 with a cluster > 50 voxels). For the analysis within ROIs, the voxels in each ROI with the largest main effect (i.e., the contrast of all stimuli vs. blank) were selected and averaged. This was done to maintain parity across the ROIs and with the analysis of distributed patterns.
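The distance covariates described above have a simple structure: each trial is coded by its city-block step from the previous stimulus along each axis of the assumed 4 × 4 grid. A minimal sketch under the simplifying assumptions of the text (linear grid; stimulus indices 0-15 with row coding the amplitude level and column the phase level); the names are illustrative, and the covariates used in the actual analysis would additionally be convolved with a hemodynamic response function.

```python
import numpy as np

def adaptation_covariates(trials, grid_n=4):
    """Per-trial distance from the previous stimulus along each axis of a
    grid_n x grid_n shape space; the first trial has no predecessor and is
    coded as zero on both axes."""
    idx = np.asarray(trials)
    amp, pha = idx // grid_n, idx % grid_n      # axis levels for each trial
    d_amp = np.r_[0, np.abs(np.diff(amp))]      # steps along RFC-amplitude
    d_pha = np.r_[0, np.abs(np.diff(pha))]      # steps along RFC-phase
    return d_amp, d_pha
```

Summing the two covariates gives the total city-block transition size; binning that sum into its six possible nonzero values yields the six-covariate variant described above.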

Statistical analysis of distributed effects

In a separate analysis, the distributed pattern of neural activity associated with each stimulus was obtained. Each stimulus appeared 85 times during the experiment. These 85 presentations were randomly assigned to one of five groups, with the constraint that an equal number of presentations were included from each scan (to avoid scan effects being a learnable factor in the subsequent analysis). A set of covariates modeled the identity of the stimulus being viewed for each group. These 80 covariates (16 stimuli × 5 groups) modeled each stimulus presentation as a neural impulse, convolved with a standard hemodynamic response function (Aguirre, Zarahn, & D'Esposito, 1998).
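The construction of such regressors (an impulse per presentation, convolved with an HRF) can be sketched as follows. The double-gamma HRF here is a generic stand-in, not the empirically derived response of Aguirre, Zarahn, & D'Esposito (1998), and all function names are illustrative.

```python
import numpy as np

def double_gamma_hrf(tr=3.0, duration=30.0):
    """A generic double-gamma HRF sampled at the TR (a common stand-in for
    an empirically measured hemodynamic response)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t)                 # positive response, ~5 s peak
    under = t ** 15 * np.exp(-t)               # late undershoot, ~15 s
    h = peak / peak.max() - 0.35 * under / under.max()
    return h / h.sum()

def stimulus_regressor(onsets_tr, n_trs, hrf):
    """Impulse train for one stimulus group's presentations (onsets in TR
    units), convolved with the HRF and truncated to the scan length."""
    impulses = np.zeros(n_trs)
    impulses[np.asarray(onsets_tr)] = 1.0
    return np.convolve(impulses, hrf)[:n_trs]
```

Building one such regressor per stimulus-by-group cell yields the 80-column design matrix described above.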

A Support Vector Machine (SVM) was then used to classify the average brain activation map associated with each stimulus. For each voxel, we obtained the five average responses to a given shape over the groupings of its presentations. For all possible pairings of the stimuli, the linear SVM classifier (Joachims, 1999) was trained and tested in a leave-one-out manner using 4 of the 5 groups for each stimulus. (Our code for queuing SVM analyses and organizing the results is available.) For each voxel, the classifier exposes a value related to the amount of variation of that voxel between conditions: e.g., how useful that voxel was at discriminating between conditions, across all of the pairwise comparisons. This is called the w-value; the map of these across the brain is called the w-map. The w-map for each subject was z-transformed across voxels. A group w-map was created by smoothing (with a 3 voxel FWHM kernel) the w-map from each subject and then averaging across subjects.

A further analysis examined the similarity of distributed patterns of neural activity associated with the 16 different stimuli within different ROIs. Within each of the studied ROIs, we selected the 50 most discriminatory voxels (i.e., those with the highest w-value as identified by SVM). The average vector of stimulus beta values across these voxels was then obtained, and subtracted from each voxel vector. A neural similarity matrix was constructed by calculating the Pearson correlation between the vector of beta values across voxels for each stimulus and every other stimulus. As the matrix is symmetric about the diagonal, only the lower triangle was retained, and the diagonal elements were excluded (as these have an obligatory value of unity and are thus uninformative). We then asked how well the neural similarity matrix in each region was correlated with the

stimulus similarity matrix as defined by Cortese and Dyre (1996) and with its decomposed elements (RFC-amplitude, RFC-phase).
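The similarity-matrix construction just described (mean pattern subtracted from each stimulus's voxel vector, pairwise Pearson correlations, lower triangle retained, then compared with a perceptual similarity matrix) can be sketched compactly. This is an illustrative reduction with hypothetical names; beta estimation and voxel selection are omitted.

```python
import numpy as np

def neural_similarity(betas):
    """betas: (n_stimuli, n_voxels) responses for an ROI's selected voxels.
    The mean voxel vector is subtracted, all pairwise Pearson correlations
    of the across-voxel patterns are computed, and the lower triangle
    (diagonal excluded) is returned."""
    centered = betas - betas.mean(axis=0)       # remove the mean pattern
    sim = np.corrcoef(centered)                 # stimulus-by-stimulus matrix
    return sim[np.tril_indices_from(sim, k=-1)]

def pattern_fit(betas, stimulus_sim):
    """Correlation between the neural similarity structure and a perceptual
    (stimulus) similarity matrix of matching size."""
    perceptual = stimulus_sim[np.tril_indices_from(stimulus_sim, k=-1)]
    return np.corrcoef(neural_similarity(betas), perceptual)[0, 1]
```

For 16 stimuli this yields the 120 informative cells per region; pattern_fit corresponds to the region-wise neural-perceptual correlations reported in the Results.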

Measurement of voxel tuning

For each voxel in ventral and lateral LOC for each subject, we identified the average amplitude of BOLD fMRI response to each of the 16 stimuli, expressed as a 4 × 4 matrix (the response profile). The number of voxels with maximal responses to particular stimuli defined a histogram of peak stimulus responses for each region of interest. To determine mean voxel tuning, all the response profiles in a given ROI for a subject were averaged together after aligning each 4 × 4 matrix within a 7 × 7 matrix, such that the center cell held the maximum value. The response to each of the 15 non-peak shapes was scaled as a proportion of the range prior to averaging. The region tuning functions were then averaged across subjects. The center value was omitted from plots as it had an obligatory value of unity. Finally, we examined the degree to which neural adaptation varied as a function of the tuning of a voxel. The stimulus which elicited the maximum response within the response profile of each voxel was identified. Separate covariates modeled the adaptation associated with transitions between stimuli that were adjacent in the stimulus space but either included (i.e., were proximate to) or excluded (i.e., were distant from) the stimulus identified as eliciting the largest average response. The average difference in the degree of adaptation elicited by proximate and distant stimulus transitions was obtained for a given region and subject, and the average across subjects then obtained.
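The peak-alignment step can be made concrete: each 4 × 4 response profile is range-scaled and placed inside a 7 × 7 frame so that its maximum lands in the center cell; averaging the frames cell-wise (ignoring empty cells) then gives the region tuning function. A sketch with illustrative names:

```python
import numpy as np

def aligned_profile(profile):
    """Place a 4x4 response profile inside a 7x7 matrix so that the peak
    response sits in the center cell (3, 3); empty cells are NaN."""
    prof = np.asarray(profile, dtype=float)
    # scale responses as a proportion of the range, so the peak becomes 1.0
    prof = (prof - prof.min()) / (prof.max() - prof.min())
    r, c = np.unravel_index(np.argmax(prof), prof.shape)
    out = np.full((7, 7), np.nan)
    out[3 - r:3 - r + 4, 3 - c:3 - c + 4] = prof   # peak maps to (3, 3)
    return out
```

Applying np.nanmean across a stack of aligned profiles would implement the within-ROI average described in the text.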

2.3 Results

Continuous neural adaptation in ventral LOC is proportional to shape similarity

We first measured the strength of the relationship between within-voxel neural adaptation and stimulus similarity. Consistent with previous work and our predictions, the largest and most significant effect was found within ventral area LOC on the right [t(4df)=10.5, p=0.0005], while the lateral component of area LOC showed essentially no consistent adaptation effect proportional to perceptual similarity [t(4df)=1.0, p=0.4]. The difference between these sub-regions of area LOC was significant [t(4df)=4.9, p=0.008] (Figure 2-3A). These effects were present in all five subjects and individually significant in four of the five (linear effect in ventral LOC for each subject: p=0.05, 0.01, 0.03, 0.09, and 0.04), as well as at the population level. The adaptation effect for exact repetition was not significant in either region of interest. This result is consistent with recent findings that infrequent, exact repetition is associated with an attenuated neural adaptation response (Summerfield, 2008). An exploratory whole-brain group analysis of these data (Figure 2-3B) supported the result found within the LOC regions of interest. Most prominent was a significant proportional adaptation effect seen in the right ventral posterior fusiform sulcus corresponding to ventral area LOC. No adaptation effects were seen within the area of the lateral LOC.


To confirm that the modulatory effect of stimulus context upon neural response was linearly related to the change in stimulus similarity, we obtained the average BOLD response to each stimulus as a function of the size of transition in the shape space from the prior stimulus. The steady increase in neural response seen in the right, ventral LOC (Figure 2-3C) is well fit by a linear function. In contrast, the lateral LOC did not evidence a systematic recovery from adaptation over the range of shape changes. An alternative explanation for the proportional recovery from adaptation in ventral LOC is that the extreme stimuli (those from the corners of the stimulus space) may evoke a larger neural response generally (e.g., Kayaert, Biederman, Op de Beeck and Vogels, 2005). As the larger distance stimulus transitions tend to include these extreme stimuli to a greater extent, perhaps the apparent recovery from adaptation is actually a larger response to these extreme stimuli independent of an adaptation effect. To evaluate this possibility, we again measured the degree of BOLD response to each stimulus as a function of the size of transition from the previous stimulus. Additionally, however, transitions which included one of the extreme, corner stimuli were modeled separately from those that did not include these stimuli. We confirmed that a linear recovery from adaptation was seen for transitions that either excluded [t(4df)=3.3, p=0.03] or included [t(4df)=3.74, p=0.02] the stimuli from the extremes of the stimulus space, and that the degree of recovery did not differ between these two sets [t(4df)=0.98, p=0.4]. Therefore, the proportional recovery from adaptation seen in ventral LOC indicates the presence of a population code for stimulus shape, and cannot be attributed to a generally greater neural response to extreme stimuli.


Distributed pattern responses distinguish between shapes

We next used a support vector machine (SVM) classifier to analyze our data at a coarse spatial level. We first analyzed the data to identify the across-voxel pattern of activity evoked by each stimulus for each subject. As the order of stimulus presentation was counterbalanced, this measure of the average response across trials is independent of first-order context, and thus not influenced by short-term adaptation effects. The distributed activation pattern for each stimulus was then used in leave-one-out training of an SVM classifier to determine classification accuracy and the location of voxels that contributed to successful classification. The ability to classify the stimuli based upon the voxel-wise activity pattern constitutes evidence of a coarse scale of representation of shape, where different stimuli evoke patterns of activation that differ in their amplitude between voxels. Within each subject, the SVM analysis was successful at distinguishing between pairs of the sixteen stimuli using the whole-brain response maps. For each subject, a decision matrix was generated containing the 120 values (hyperplane separation distances) associated with each pairwise decision, where this value is larger when the classifier is more certain of its ability to separate the pair of neural patterns. On average, in a given subject, the support vector machine classifier was able to distinguish between the neural patterns elicited by a given pair of stimuli with 71% accuracy (sd 11%). When the decision matrices are summed (i.e., the classifier is permitted to use the hyperplane distances from all subjects in its decisions), the pairwise accuracy rises to 89%.
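The grouping scheme behind these accuracies can be sketched as follows: for each stimulus pair, train on four of the five presentation groups and test on the fifth, rotating the held-out group. The sketch substitutes a simple correlation-template classifier for the linear SVM (SVMlight) actually used, purely to illustrate the leave-one-out bookkeeping; all names are my own.

```python
import numpy as np
from itertools import combinations

def pairwise_loo_accuracy(patterns):
    """patterns: (n_stimuli, n_groups, n_voxels) group-average response maps.
    For each stimulus pair, hold out one group at a time, average the
    remaining groups into per-stimulus templates, and assign each held-out
    map to whichever template it correlates with more strongly."""
    n_stim, n_groups, _ = patterns.shape
    correct, total = 0, 0
    for a, b in combinations(range(n_stim), 2):
        for held in range(n_groups):
            train = [g for g in range(n_groups) if g != held]
            templates = {a: patterns[a, train].mean(axis=0),
                         b: patterns[b, train].mean(axis=0)}
            for stim in (a, b):
                other = b if stim == a else a
                test = patterns[stim, held]
                r_own = np.corrcoef(test, templates[stim])[0, 1]
                r_other = np.corrcoef(test, templates[other])[0, 1]
                correct += int(r_own > r_other)
                total += 1
    return correct / total
```

With the real classifier, the per-fold hyperplane separation distances for each pair would populate the decision matrix described above.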

To identify which cortical areas most contributed to classification accuracy, we obtained the average (across pair-wise comparisons) of the discriminatory contribution (w-value) for each voxel for each subject. These w-maps were normalized to z-values across voxels, spatially smoothed, then averaged across subjects. The resulting map (Figure 2-4B) indicates the cortical location of voxels that tended, across subjects and across stimulus pairs, to be most informative regarding stimulus identity in the SVM analysis. Bilateral discriminant patches were found in the lateral component of LOC, although not in the ventral areas. Given their identification in this analysis, these lateral regions arguably contain a coarse neural code in which individual voxels have different levels of activation associated with the different stimuli, thus allowing the classifier to associate a specific stimulus with a specific neural activation pattern. Thus in these areas the success of the SVM classifier reveals the spatial nature of the neural population: a coarse, across-voxel pattern carries information about shape.

The similarity of distributed pattern responses reflects the similarity of shapes

The accuracy of the SVM analysis and the identified patch within lateral LOC indicate that the distributed voxel pattern of activity in that area carries information about shape. However, the pattern difference between shapes need not reflect the similarity of the stimuli or indeed have any particular structure. The SVM requires only that patterns be different in order to distinguish them; no assumptions about similarity


structure are made or used. We wished to test the further hypothesis that the similarity of the distributed neural pattern evoked by any pair of stimuli would reflect their similarity. Behavioral testing (Cortese & Dyre, 1996) can be used to define the perceptual similarity matrix for a set of stimuli. The left panel of Figure 2-4C shows the similarity matrix for our stimuli (Figure 2-1). Each cell of this 16 × 16 matrix expresses (by color scale) the similarity of a pair of stimuli. The visible structure to the matrix follows directly from the arrangement of the stimuli within the perceptual space (Figure 2-1). We created a distributed neural similarity matrix for lateral and ventral LOC for each subject. The neural similarity matrix is constructed by assigning each cell of the matrix to the Pearson correlation of the across-voxel pattern of response (for the 50 most discriminant voxels within a region) associated with a particular stimulus pair. The entire matrix captures the pair-wise similarity structure of the entire set of stimuli, as instantiated in the neural responses they evoke. The correspondence between the measured neural similarity matrix for each region and the stimulus similarity matrix for the stimuli was then obtained. Within lateral LOC, the strongly discriminant responses seen in the SVM analysis were found to also reflect stimulus similarity consistently across subjects [t(4df)=10.0, p=0.001]. In contrast, the distributed pattern of response in ventral LOC had a weaker correlation with the perceptual similarity of the stimuli [t(4df)=1.2, p=0.3] (Figure 2-4A). The difference between these subregions of area LOC was significant [t(4df)=11.4, p=0.0003].


This regional difference is also visible in the average neural similarity matrix obtained across subjects from within the ventral and lateral LOC. Figure 2-4C demonstrates that the average neural similarity matrix from the lateral LOC has definite structure, and a strong correlation (R=0.65) with the stimulus similarity matrix. Notably, there are aspects of the structure of the neural similarity matrix that do not appear to be reflected in the stimulus matrix; the source of this difference is explored in the next section. The average neural similarity matrix for ventral LOC had a weaker correspondence to stimulus similarity (R=0.19). Using all voxels in each of these ROIs, rather than only the 50 most discriminant, produced similar results (lateral LOC R=0.64; ventral LOC R=0.21). Could the distributed pattern in lateral LOC be explained entirely by simple retinotopic organization? Because we used unfilled shape outlines, the pixel-wise similarity between shape pairs was quite low, and the correlation between the pixel-wise similarity of the shapes and the distributed neural pattern in lateral LOC was 0.18, far lower than the 0.65 correlation seen with the perceptual similarity of the shapes (Figure 2-4C). Another possibility is that the shape outlines are perceptually filled in, as has been observed to occur in neuronal responses (Lamme, Rodriguez-Rodriguez, & Spekreijse, 1999). The correlation between the pixel-wise similarity of the filled contours and distance in the shape space is fairly high (R=0.85), making this explanation plausible. Other aspects of the results, however, render a solely retinotopic account incomplete. First, there is a weak correspondence between distributed neural and perceptual similarity in adjacent ventral cortical areas with much stronger retinotopic

organization (V2v/V3v R=0.23; hV4 R=0.24). Second, as will be discussed in the next section, differences in the strength of representation of the two stimulus axes (RFC-amplitude and RFC-phase) cannot be explained on a retinotopic basis.
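The contrast between outline and filled pixel-wise similarity can be illustrated with a toy computation. The disk stimuli below are hypothetical stand-ins for the RFC contours, and flattened-image correlation is one simple pixel-wise measure; the point is only that thin outlines barely overlap while their filled versions overlap substantially.

```python
import numpy as np

def pixelwise_similarity(images):
    """Pairwise correlation of flattened binary images (n_images, H, W)."""
    flat = images.reshape(len(images), -1).astype(float)
    return np.corrcoef(flat)

# Toy stimuli: disks whose radius varies along a single "shape" axis.
# Thin outlines at different radii share almost no pixels, while the
# filled versions overlap heavily, paralleling the contrast in the text.
yy, xx = np.mgrid[-32:32, -32:32]
r2 = xx**2 + yy**2
filled = np.array([r2 <= (10 + 3 * k) ** 2 for k in range(4)])
outline = np.array([np.abs(np.sqrt(r2) - (10 + 3 * k)) < 1.0 for k in range(4)])
sim_filled = pixelwise_similarity(filled)
sim_outline = pixelwise_similarity(outline)
```

For neighboring radii the filled disks correlate strongly while the outlines correlate near zero, mirroring the low outline-based and high filled-contour-based correlations reported above.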

The RFC-amplitude and RFC-phase axes are differentially represented at coarse and fine neural scales

While the distributed neural similarity matrix measured from lateral LOC was strongly correlated with the stimulus similarity matrix, there appeared to be aspects of the structure of the neural response not evident in the stimulus matrix (Figure 2-4C). We considered the possibility that this difference is explained by differences in the neural representation of the two axes that define the stimulus space. Cortese and Dyre (1996) showed that the vectors of their behavioral multidimensional scaling solution fitted for RFC-amplitude and RFC-phase were approximately orthogonal and equal. This indicates that the two perceptual axes define the perceived shape space, and that each axis is equally perceptually salient. It is not the case, however, that the axes are perceived equivalently. Cortese and Dyre found that parametric changes along the RFC-amplitude axis were described by subjects as changes in the smoothness and complexity of the shape, while changes along the RFC-phase axis were described as a change in the orientation of the parts of the shape (Figure 2-5A). As these parametric changes were perceived differently, perhaps they are represented differentially within object-sensitive cortex. To test this idea we examined the relationship between stimulus similarity along the RFC-


amplitude and RFC-phase dimensions and the focal and distributed neural similarity matrices decomposed along these axes within the subregions of LOC. Ventral LOC had demonstrated a continuous modulation of neural response that was proportional to stimulus similarity. We examined whether the degree of this neural adaptation differed for changes in RFC-amplitude and RFC-phase. Both changes in the apparent complexity of the shapes and changes in the orientation of the shape features produced proportional recovery from adaptation in ventral LOC (Figure 2-5B). This indicates that both aspects of the stimulus space are represented by the within-voxel population code within ventral LOC. A rather different result was observed for the distributed pattern of response within lateral LOC. There, the distributed pattern across subjects reflected the shapes primarily in terms of RFC-amplitude, but not RFC-phase (Figure 2-5C). This difference is apparent when the average distributed neural similarity matrix obtained for the lateral LOC (initially presented in Figure 2-4C) is compared to the stimulus similarity matrix decomposed along the RFC-amplitude and RFC-phase axes (Figure 2-5D). The structure of distributed neural responses within lateral LOC strongly reflects the apparent shape of the stimulus indexed by RFC-amplitude (R=0.72), but has a weak representation of the orientation of shape features defined by RFC-phase (R=0.14). This finding indicates that the representation of shape for these RFC contours across voxels is independent of shape feature orientation defined by phase. For example, clusters of neurons might represent the tightness of the knobs of the shapes (defined by RFC-amplitude) independent of the direction that those knobs point within the overall shape (defined by RFC-phase). RFC-

amplitude and RFC-phase may be taken as similar to Op de Beeck's (2008) feature and envelope parameters, respectively; we thus contribute a similar finding, in that features are represented in the distributed pattern in lateral LOC much more reliably than the overall shape envelope.
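Decomposing the comparison along the two stimulus axes amounts to correlating the neural similarity matrix with a per-axis distance matrix for each dimension. A sketch, in which the grid coordinates and the amplitude-driven toy neural matrix are assumptions:

```python
import numpy as np

def axis_decomposition(neural_sim, coords):
    """Correlate a neural similarity matrix with per-axis stimulus distances.
    coords: (n_stimuli, 2) positions in the (RFC-amplitude, RFC-phase)
    space. Returns (r_amplitude, r_phase) over unique stimulus pairs,
    treating similarity as negative distance."""
    iu = np.triu_indices(len(coords), k=1)
    rs = []
    for axis in (0, 1):
        c = coords[:, axis]
        d = np.abs(c[:, None] - c[None, :])
        rs.append(np.corrcoef(neural_sim[iu], -d[iu])[0, 1])
    return tuple(rs)

# Toy neural matrix driven by the amplitude axis only: the amplitude
# correlation is perfect while the phase correlation is weak.
coords = np.array([(a, p) for a in range(4) for p in range(4)], float)
amp_only = -np.abs(coords[:, 0, None] - coords[None, :, 0])
r_amp, r_phase = axis_decomposition(amp_only, coords)
```

A pattern like the toy one above, strong along one axis and weak along the other, is the qualitative signature reported for lateral LOC (R=0.72 for RFC-amplitude versus R=0.14 for RFC-phase).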

Neural adaptation within lateral LOC is modulated by narrow tuning for shape

Based upon the differential sensitivity to shape identity for the adaptation and distributed pattern methods, we argue that while both the lateral and ventral components of area LOC contain neural population codes for shape, the spatial scale of these representations differs. Specifically, the absence of a distributed pattern effect within ventral LOC is evidence for a homogeneous representation of the shape space, such that the average response of any one voxel does not differentiate between the shapes, while the presence of a distributed code and the absence of an adaptation effect in lateral LOC suggest that there is a heterogeneous distribution of shape representation, such that any one voxel tends to respond only to a limited area of the shape space. An objection to this account is that adaptation and distributed pattern methods might be expected to discriminate between the stimuli regardless of the underlying spatial distribution of the neural populations. For example, while the neurons responsive to particular line orientations are distributed within V1 on a scale smaller than individual voxels, multi-voxel pattern methods are nonetheless capable of recovering stimulus information from fMRI data (Kamitani & Tong, 2005), presumably because voxels retain some tuning to particular line orientations. Further, one might expect that even in the

presence of a coarse code for shape identity, the small adaptation effects obtained in different voxels would, on average, produce a recovery from adaptation for a region that reflects stimulus similarity. Finally, the results so far cannot distinguish between the possibility that lateral LOC has clusters of neurons tuned to particular locations in the shape space and the possibility that clusters of neurons are tuned to particular parts of the shapes. To address these issues, we examined the tuning of individual voxels to the shape space and the effect that tuning had upon neural adaptation. First, our proposal that ventral and lateral LOC represent the studied spaces on different spatial scales predicts that voxels drawn from lateral LOC would have narrow tuning for particular shapes within the shape space, while voxels from ventral LOC would be broadly tuned. If ventral LOC voxels are so broadly tuned to shape identity that the amplitude of the BOLD response is roughly the same across shapes, then the relatively weak distributed representation within this region is explained. For each voxel for each subject we obtained the response profile, which measures the amplitude of the BOLD fMRI response to each stimulus (independent of adaptation effects). An example response profile for one voxel is shown in Figure 2-6A. For each voxel, one shape will have evoked the maximum observed response. Figure 2-6B presents the histogram of peak responses for ventral and lateral LOC across subjects. Stimuli from the extremes of the stimulus space tended to have a greater representation in maximal voxel responses [effect of stimulus upon proportion of voxels: F(15, 159) = 6.08, p < 0.0001]. However, the distribution of maximally responsive voxels was not

different between ventral and lateral LOC [stimulus by region interaction: F(15, 159) = 1.07, p = 0.39]. The aspect of voxel response most relevant to a distributed pattern analysis, however, is the tuning of the response of the voxel across stimuli. We obtained the average tuning of voxels in each region of interest across subjects (Figure 2-6C). These plots show the decline in BOLD response for an average voxel for the presentation of shapes that are progressively more distant from the shape that evokes the greatest response for the voxel. Within ventral LOC, no meaningful tuning for the shape space can be identified: the amplitude of the response is no different for different shapes. This indicates that ventral LOC voxels are broadly tuned for shape identity. In contrast, lateral LOC voxels show relatively narrow tuning: there is a progressive decline in the response of a voxel for shapes more distant from the shape for which the voxel is best tuned (which was frequently a stimulus from the edges of the stimulus space). Moreover, lateral LOC voxels appear more narrowly tuned for the RFC-amplitude, as compared to the RFC-phase, dimension of the shape space, consistent with our previous observation that the RFC-amplitude dimension is more strongly represented in the distributed pattern of response within this region. The differential tuning of voxels within lateral and ventral LOC to shape is a sufficient explanation for the differences in distributed pattern representation in the two regions. The narrow tuning observed in lateral LOC may also explain the absence of a linear adaptation response in this region to transitions in shape space. If a given voxel is narrowly tuned to a particular region of the shape space, then it may only show recovery from adaptation for stimulus transitions within its tuned area. The example voxel shown

previously (Figure 2-6A) is tuned to respond maximally to a particular stimulus, indicated in yellow. Some transitions between stimuli will occur within the center of the tuning for this voxel (Figure 2-6D, indicated in red), while some stimulus transitions of equal magnitude will occur distant from the tuned shape (indicated in black). For each voxel in each ROI, we measured the degree of neural adaptation associated with stimulus transitions of equal perceptual size that were located either proximate to or distant from the tuned center of the response profile for the voxel (Figure 2-6E). Within ventral LOC, the tuning of a voxel had little effect upon the degree of adaptation seen for stimulus transitions. This is in keeping with the broad tuning of the response of each voxel and is again consistent with the notion that voxels within ventral LOC contain populations of neurons capable of representing the entire shape space. In contrast, voxels in lateral LOC showed greater adaptation to small stimulus transitions when those transitions occurred proximate to the center of the voxel's tuning. The absence of a linear recovery from adaptation for the entire lateral LOC region can therefore be understood as a consequence of the restricted tuning of the neurons contained in any one voxel. Each lateral LOC voxel is only able to represent by adaptation small stimulus transitions that occur within its tuned area. A re-examination of the recovery from adaptation related to the magnitude of stimulus change for lateral LOC (Figure 2-3C, right panel) is informative in this regard. A roughly linear increase in recovery from adaptation is seen for small stimulus transitions (step size 4 or less), but not for larger stimulus transitions.
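The logic of this proximity analysis can be sketched with a model voxel whose tuning is a Gaussian over the shape space. This is a toy simulation: the tuning widths, and the use of the absolute response difference between consecutive stimuli as a stand-in for recovery from adaptation, are simplifying assumptions.

```python
import numpy as np

# 16 stimuli on a 4 x 4 grid; index = amplitude * 4 + phase.
coords = np.array([(a, p) for a in range(4) for p in range(4)], float)
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

def response(center_idx, stim_idx, width):
    """Model voxel response: Gaussian falloff with distance in shape space."""
    return np.exp(-dists[center_idx, stim_idx] ** 2 / (2 * width ** 2))

def recovery(center_idx, prev_idx, curr_idx, width):
    """Toy recovery from adaptation for a transition prev -> curr: the
    absolute change in the voxel's response between the two stimuli."""
    return abs(response(center_idx, curr_idx, width)
               - response(center_idx, prev_idx, width))

narrow, broad = 1.0, 10.0

# Equal-sized transitions (one grid step along the amplitude axis), one
# at the voxel's tuned center (stimulus 0) and one far from it.
near = recovery(0, 0, 4, narrow)    # (0,0) -> (1,0), at the peak
far = recovery(0, 8, 12, narrow)    # (2,0) -> (3,0), off the peak

# A broadly tuned voxel recovers only weakly for either transition.
near_b = recovery(0, 0, 4, broad)
far_b = recovery(0, 8, 12, broad)
```

Under narrow tuning the near-peak transition yields much more recovery than the equally sized distant one, while broad tuning yields small recovery throughout, paralleling the lateral versus ventral LOC contrast described above.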


2.4 Discussion
It has been suggested (Edelman, 1998) that the neural representation of objects may be best characterized as representation of similarity, encoded by a chorus of prototypes of neural responses tuned to different regions of a shape space. In a 1998 study that anticipated the subsequent application of multi-voxel analysis methods in fMRI by several years, Edelman and his colleagues demonstrated that the pattern of voxel responses could be used to reconstruct the similarity representation of stimuli within and across object categories (Edelman, Grill-Spector, Kushnir, & Malach, 1998). In our current study we return to these ideas with the aim of examining the correspondence of perceptual and neural similarity in the representation of a set of simple shapes. This relationship has been explored at the across-voxel scale in two recent studies (Op de Beeck, 2008; Haushofer, 2008). By using a continuous carry-over design, our study was capable of examining neural similarity both on a coarse, across-voxel scale by distributed pattern analysis, as well as on a fine, within-voxel scale using continuous neural adaptation. We can thus compare the information provided at distributed and focal levels. We found that shape similarity is represented on both scales, although the cortical sites that carry information at these two scales differ. Our focus here was upon the components of the lateral occipital complex, a visual area with specific responses to formed visual objects. Within the ventral portion of LOC we identified neural adaptation proportional to shape similarity, suggesting the existence of a population code for shape distributed on a fine, within-voxel spatial scale. This result parallels that of Op de Beeck

and colleagues (2001) in their measurement of neural response similarity in macaque IT cortex to two-dimensional shape variation. The weaker distributed (across-voxel) similarity pattern for shape within ventral LOC indicates that each voxel responds roughly equally to each shape, suggesting that the within-voxel code in this region generally spans the entire shape space. It is of course possible that scanning with higher spatial resolution, or the use of other shape variations, perhaps with greater differences along the axes, would reveal heterogeneity in the focal population code across voxels and thus improve the performance of the distributed pattern measure. Indeed, this kind of heterogeneity has been demonstrated within ventral temporal cortex for objects of different categories, for which the pattern of across-voxel responses is discriminative even outside of categorically defined cortical areas (Haxby, Gobbini, Furey, Ishai, Schouten, & Pietrini, 2001). A study which compared the recoverability of distributed pattern information for large and small ranges of shape variation would be one way to unify these findings. In contrast, the lateral component of area LOC demonstrated a coarse (across-voxel) pattern of response that reflected shape similarity, reflecting the relatively narrow tuning of individual voxels to the shape space. This parallels Op de Beeck's (2008) finding of high correlation between similarity ratings and distributed neural patterns in area LO as compared with area PF. Several possible forms of coarse coding within lateral LOC could account for this result. Each voxel may contain neurons tuned to a particular sub-region of the shape space, resulting in voxels that demonstrate a receptive field for a region of the shape space coded by firing rate. A related coding scheme would have patches of lateral LOC cortex tuned to particular shape features, with some voxels (e.g.)

preferring tight curves and others concave line segments. It is also possible that simple retinotopic organization within lateral LOC is the basis of the distributed similarity found here, either alone or in combination with sensitivity to shape features. We regard it as unlikely, however, that retinotopic organization alone can explain this result, for the reasons discussed earlier. Unlike ventral LOC, the lateral portion of LOC did not show adaptation responses that were linearly related to shape similarity. We found that the narrow tuning of lateral LOC voxels could explain this finding, indicating that each particular voxel has a population of neurons that are tuned to one specific region of the shape space. Consequently, most of the transitions between stimuli would not induce neural adaptation within the voxel, as they would be transitions between stimuli not within the voxel's receptive field. The presence of a coarse spatial code for shape within lateral LOC and a fine-scale code within ventral LOC suggests a processing hierarchy in which cortical patches within lateral LOC tend to represent features, while population codes in ventral LOC represent entire integrated shapes. This coarse lateral LOC representation would correspond to the chorus of fragments model (Edelman & Intrator, 2000) in which representations of features are combined with a representation of fragment orientation or retinotopy to capture shape information. In support of this model, we found that the distributed code for shape identity within lateral LOC reflected changes along the RFC-amplitude axis, which perceptually corresponded to changes in features, but not changes along the RFC-phase dimension, which corresponded to the orientation of features. Fragment or feature orientation may be implicitly represented in the low-resolution retinotopy that is present in lateral LOC (Dumoulin & Wandell, 2008), in other visual

areas, or in an aspect of lateral LOC response that is below the resolving power of our method. In any case, the form of the RFC-phase representation does not reflect shape similarity on a coarse scale. Within ventral LOC, shape changes along both the RFC-amplitude and RFC-phase dimensions were strongly represented in the within-voxel population code, as indexed by continuous neural adaptation. The representation of shape in this ventral region is thus arguably more integrated, no longer representing individual features but instead the entire shape. This is not to say that a distributed pattern could not, in principle, be recovered from ventral LOC, given, e.g., better resolving power, smaller voxels, or less spatial smoothing by the hemodynamic response function. Indeed, a modest correlation (R=0.21) was found for the distributed pattern in ventral LOC in our study. However, our results demonstrate at least a difference between the scales of the neural pattern in the two regions of interest. Our study differs from prior studies of the neural correlates of shape adaptation (e.g., Kourtzi, 2001) in the use of a continuous adaptation design. Instead of paired presentations of stimuli which were either identical or different, we presented a continuous stream of stimuli. While a linear recovery from adaptation proportional to stimulus dissimilarity was observed in ventral LOC, there was not significant adaptation for perfect stimulus repetition. This is not an unanticipated feature of continuous carry-over designs (Aguirre, 2007). Behavioral studies have long demonstrated that different mental operations accompany the detection of same and different stimulus pairings (Sternberg, 1998). Moreover, perfect stimulus repetition in this design is an infrequent and salient event, which has been observed to modulate neural adaptation (Summerfield,

2008). While our inferences here do not depend upon identical stimulus repetition effects, we can imagine modifications of our design that would allow this component of the response to be more interpretable. For example, the addition of a stimulus variation that does not alter the similarity structure under study, but does disrupt the salience of perfect repetition (e.g., rotation or misalignment of sequential stimuli), may be beneficial in this regard. Our results may have relevance to the intriguing finding of paired, categorical regions of extra-striate visual cortex. Other categorically-responsive visual areas, such as those responsive to faces, places, and body parts, appear to have both a ventral and a more lateral or dorsally located component (Schwarzlose, Swisher, Dang, & Kanwisher, 2008). In the case of faces, a sub-region of the LOC, the occipital face area (OFA), has been identified in addition to the more ventral fusiform face area (FFA). It has been proposed previously that the OFA and the FFA differentially represent facial features and holistic gestalt, respectively (Haxby, Hoffman, & Gobbini, 2000). Recently, we have shown that facial features are represented in both the OFA and FFA, although only the FFA shows an adaptation response sensitive to holistic representation of familiar faces (Harris & Aguirre, 2008). It remains to be seen if representation of features and wholes is a common property across these paired categorical visual areas. A correspondence between shape similarity and focal and distributed neural similarity was also observed in cortical areas other than LOC. There was a significant linear adaptation effect in the vicinity of the transverse occipital sulcus bilaterally, corresponding in location to visual area V3a. This region also showed a distributed pattern for RFC-amplitude that weakly reflected shape similarity. Previous studies

(Larsson & Heeger, 2006) have observed object-preferential responses in area V3a, although responses to visual motion also drive this region (Wandell, Brewer, & Dougherty, 2005). A recent study of distributed pattern similarity in ventral and lateral LOC (Haushofer, Livingstone, & Kanwisher, 2008) offers an interesting contrast to our results. Using a set of four shapes, the authors found that while the distributed response in lateral LOC reflected the physical similarity of their stimuli, the distributed response in ventral LOC (labeled pFS in their study) reflected perceptual similarity. Interestingly, the ventral LOC correspondence was found to be quite sensitive to the idiosyncratic similarity judgments of different subjects. It is possible that the weak correspondence between stimulus similarity and distributed neural response in ventral LOC in our study is the result of between-subject differences in the perceptual similarity of the stimuli, which were not modeled. Another possibility is that, as Haushofer and colleagues used only four stimuli and a Pearson correlation to assess correspondence, a distinctive perceptual judgment and neural response to a single outlier stimulus was sufficient to produce the modest positive correlations in ventral LOC that they observed (as is shown for perceptual similarity in Figure 1B of their paper). With 16 stimuli, and thus 120 unique stimulus pairings, in our study, a consistent distributed coding across the entire shape space would be needed to observe a correspondence between neural and perceptual similarity. In summary, our results demonstrate generally that the similarity of patterns of neural response within higher-order visual cortical areas can reflect stimulus similarity. By examining both within-voxel neural adaptation and across-voxel distributed patterns, we were able to identify substantial differences in the spatial scale and form of these

representations across cortical areas. These differences, in turn, may be related to a hierarchy of the visual processing of shape that moves from a spatially coarse representation of features to an integrated representation of shape.


Chapter 3: Distinguishing conjoint and independent neural tuning for stimulus features using fMRI adaptation
A central focus of cognitive neuroscience is the identification of the neural codes that represent stimulus dimensions. One common theme is the study of whether dimensions, like color and shape, are encoded independently by separate pools of neurons or are represented by neurons conjointly tuned for both properties. We describe an application of fMRI adaptation to distinguish between independent and conjoint neural representations of dimensions by examining the neural signal evoked by changes in one versus two stimulus dimensions and considering the metric of two-dimension additivity. We describe how a continuous carry-over paradigm may be used to efficiently estimate this metric. The assumptions of the method are examined, as are optimizations. Finally, we demonstrate that the method produces the expected result for fMRI data collected from ventral occipito-temporal cortex while subjects viewed sets of shapes predicted to be represented by conjoint or independent neural tuning.

3.1 Introduction
A major goal of cognitive neuroscience is to determine how neural populations represent stimulus variation. Measurement of the tuning of neurons for variations in stimuli is one approach. For any given neuron, a tuning curve exists which describes the response modulation of that neuron as a function of different levels of a stimulus dimension. Neurons in primary visual cortex, for example, demonstrate tuning for

stimulus orientation, with a smooth decrement of response as the angle of the stimulus differs from the optimum. A population of these neurons, each responding maximally to a different orientation, can, as a whole, accurately encode the value of the stimulus dimension. The brain accurately represents numerous stimulus dimensions. Do different neurons represent individual stimulus dimensions, or could one neuron be tuned to represent multiple dimensions? For any two given dimensions of a stimulus (for example, the orientation and spatial frequency of a grating) two extremes of representation can be imagined. In a conjoint representation, a given neuron would have both a preferred spatial frequency and a preferred orientation. The two dimensions in this case are jointly encoded by a single population of neurons, each neuron responding optimally to a particular value of each dimension, and its response dropping with change in either one. Alternatively, the two dimensions of the stimulus could be encoded by two independent populations, with each neuron tuned for one of the dimensions but with no tuning preference for the other. In the particular case of orientation and spatial frequency, it has been shown by single-unit recording that V1 neurons generally encode these and several other visual dimensions conjointly (DeValois et al., 1982; Mazer et al., 2002). This chapter describes an application of functional MRI (fMRI) to distinguish conjoint from independent representation of two stimulus dimensions within a spatially restricted population of neurons. Functional MRI measures cortical responses with a spatial resolution on the order of millimeters. Adaptation (Grill-Spector & Malach, 2001) has been used to measure the behavior of neural populations at sub-voxel scales by measuring the graded reduction in

population response that accompanies repetition (or near repetition) of a stimulus property. The population of neurons within a voxel can be argued to represent (be tuned to) a stimulus dimension if the presentation of pairs of stimuli with ever greater differences along that dimension results in a progressive recovery from adaptation and thus an ever greater fMRI response. Proportional recovery from adaptation of this kind has been described for the angular displacement of gratings (Fang et al., 2005), and the similarity of faces (Jiang et al., 2006), shapes (Chapter 2), and colors (Aguirre, 2007). While potentially powerful, the inferences provided by adaptation studies are nuanced, particularly with regard to precise cortical localization (for a comprehensive discussion, see Bartels, 2008). We have recently extended the proportional adaptation approach to the presentation of continuous stimulus sequences (Aguirre, 2007). Using that method, we have shown that fMRI can be used to efficiently measure the neural representation of multiple stimulus dimensions simultaneously (Aguirre, 2007; Chapter 2). This allows us to relate the similarity of a set of stimuli to the similarity of the responses that they evoke within a neural population. In this chapter, we show that measurement of the recovery from adaptation for changes within a stimulus space can be used to distinguish between conjoint and independent neural representations. This is accomplished by measuring the recovery from adaptation for stimulus changes in both perceptual dimensions and stimulus changes in each dimension alone. If the recovery for a combined change is simply the additive combination of the recovery for each dimension in isolation, we take this as evidence for independent neural populations. When the neural recovery for a combined change is sub-

additive, this may reflect populations consisting of neurons which represent the two stimulus dimensions conjointly. The central insight that motivates our approach has been described previously. Engel (2005) tested for neural populations jointly tuned to two stimulus axes by measuring subadditivity of adaptation responses. He employed a paired-adaptation paradigm in which an adapting grating was followed by a test grating that differed in color, orientation, or both. Two levels of each factor (color and orientation) were tested. The combined stimulus change was shown to have an interacting component for some populations, in agreement with previous findings suggesting radially-symmetric receptive fields within the stimulus space (e.g., Livingstone & Hubel, 1984; further references in Engel, 2005). We extend Engel's approach in several ways. We first demonstrate that this sub-additivity inference follows from the shape of neural receptive fields within a stimulus space. Next, we describe how the covariates of a general linear model applied to a continuous carry-over fMRI experiment may be used to test for sub-additivity. We consider how the sensitivity and specificity of the method are affected by non-linearities and rotations of the perceptual space, neural representation, or hemodynamic transform. The properties of the method given populations consisting of neurons that are neither perfectly independent nor conjoint in their representation are examined, and we provide a generalization in terms of the exponent of a Minkowski characterization of the neural space. Optimizations of the approach are then described, including the sampling of stimuli from the perceptual space and the selection of efficient stimulus sequences for BOLD fMRI. Finally, we provide the results of a pair of experiments conducted on two parameterized shape stimulus sets, the dimensions of

which are predicted to be represented in one case by conjoint neural population tuning and in the other by independent tuning. We show that fMRI responses in ventral occipitotemporal cortex have the expected properties.

3.2 Theory

Stimulus Spaces and Conjoint and Independent Neural Populations

A set of stimuli may be constructed from parametric variations along two dimensions. For example, a set of outlines may differ in their shape and color (Figure 3-1A). The response of a neuron to the stimuli is characterized by its receptive field within the stimulus space. We may contrast two idealized models of neuronal receptive fields arranged to represent variations in the color and shape of a set of visual stimuli. A neuron may have a conjoint receptive field within the stimulus space, such that maximal firing is only elicited by a stimulus of a particular color and a particular shape (Figure 3-1B). Alternatively, a neuron may have an independent receptive field, such that firing is maximal for a particular stimulus in one dimension (e.g., shape), but is not altered by the other stimulus dimension (in this case, color) (Figure 3-1C). Of course, intermediate tuning functions are possible; these cases are considered later. Our first theoretical goal is to distinguish between the extreme conjoint and independent models of receptive field organization within a particular voxel. These two possibilities could be distinguished directly by measuring the tuning of individual neurons. But the signal obtained with BOLD fMRI averages the population neural response from a voxel, making this measurement unavailable. If independently

tuned populations of neurons were separated from one another by distances of many millimeters across the cortical surface, then BOLD fMRI would be able to resolve the presence of separate cortical areas tuned to the different stimulus dimensions. Alternatively, if there were sufficient heterogeneity in the spatial distribution of independently tuned neurons, then one might be able to examine the pattern of responses across voxels (Norman et al., 2006; Kamitani & Tong, 2005) to distinguish stimuli. The inferences that may be drawn from such a distributed analysis are discussed in Appendix A. The neural representation of some stimulus dimensions, however, may be intermixed at a spatial scale below the resolving power of BOLD fMRI. In this situation, the presentation of any one stimulus in isolation might evoke the same average response across the population, rendering the conjoint and independent tuning possibilities (indeed, the individual stimuli) indistinguishable by fMRI. To distinguish conjoint and independent tuning in this case, we must measure the properties of the neural population using adaptation methods. The sensitivities of intermixed neurons may be revealed by neural adaptation and fMRI. If the experiment presents a pair of stimuli that are (e.g.) the same shape but differ in color, both a conjointly tuned and independently tuned population of neurons would show some recovery from adaptation for this stimulus transition. A pair of stimuli that are the same color but differ in shape would produce the same result. Such an experiment would reveal the presence of a population of neurons that are tuned to represent color and shape, but it would not distinguish between the conjoint and independent possibilities. To do so, one measures the recovery from adaptation associated with a combined stimulus change. Consider a pair of stimuli which differ in both color and shape. Within a

population of independently tuned neurons, the transition in shape will be within the unidimensional receptive field (Figure 3-1C) of some neurons, producing a recovery from adaptation; for other neurons the transition in color will have a comparable effect. The total recovery from adaptation for the population will be simply the recovery seen in the population of shape-tuned neurons plus the recovery of the color-tuned neurons. The population of conjointly tuned neurons demonstrates a different behavior. In this model system, neurons have radially symmetric receptive fields within the stimulus space (Figure 3-1B). As a consequence, for an individual neuron, the effect of a stimulus change along both dimensions is not an additive effect of each change in isolation. Instead, it is the Euclidean distance of the change in the stimulus space. This can be intuited by considering that, within a radially symmetric field, rotational invariance must hold. Consequently, a change of one unit along one dimension is equivalent to a diagonal change of one unit along both dimensions. Within the symmetric field, that combined change can be decomposed into changes of 0.7 units along each axis, demonstrating that the neuron considers a combined stimulus change to be less than the sum of individual stimulus changes (specifically, 1.0 unit of combined stimulus change versus 0.7 + 0.7 units of individual stimulus changes). Appendix B provides a formal proof that this property of individual neurons predicts sub-additive recovery from adaptation for a model population of neurons. In summary, we may distinguish between conjoint and independent tuning of neurons in a population by comparing the recovery from adaptation for combined transitions to that seen for isolated transitions along each stimulus dimension. We now consider the design of an fMRI experiment to do so.
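The sub-additivity at the heart of this argument can be checked numerically. The following minimal sketch is the editor's illustration, not the dissertation's code (the function names are hypothetical); it contrasts the total recovery from adaptation that the two population models predict for a transition that changes both dimensions by one unit:

```python
import math

def independent_recovery(d_shape, d_color):
    # Independent populations: shape- and color-tuned recoveries simply add
    # (the city-block distance in the stimulus space).
    return abs(d_shape) + abs(d_color)

def conjoint_recovery(d_shape, d_color):
    # Conjoint population with radially symmetric receptive fields:
    # recovery tracks the Euclidean distance of the transition.
    return math.hypot(d_shape, d_color)

# A transition that changes both dimensions by one unit:
print(independent_recovery(1, 1))  # 2.0
print(conjoint_recovery(1, 1))     # 1.414..., i.e., sub-additive
```

The combined change recovers less than the sum of the two isolated changes only under the conjoint (Euclidean) model, which is exactly the signature the Euclidean contraction covariate is built to detect.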

Construction of a BOLD fMRI experiment In theory, one could conduct the test described above by measuring the BOLD fMRI response to three stimulus pairs: a pair that differs only in color, a pair that differs only in shape, and a pair that differs in both color and shape. As will be developed below, such a limited test is not robust to non-linearities in the measurements. A more robust test, with the ability to check for deviations from the assumed model, is provided by measuring multiple transitions within the stimulus space over a range of distances. At least three samples along each stimulus dimension are needed; here we consider a stimulus space with four samples along each dimension. To fully characterize the tuning properties of the population of neurons under study, we measure all possible transitions between the stimuli in the perceptual space. This may be accomplished efficiently using a continuous carry-over approach (Aguirre, 2007), in which stimuli are presented continuously and sequentially using a serially counterbalanced stimulus order. The participant views a stream of stimuli, perhaps while performing an attention task that is irrelevant to the stimulus similarity (e.g., detecting an infrequent target not from the stimulus space). We measure the fMRI response to each stimulus and model it as a function of its relationship to the prior stimulus: how much change is there in shape, color, or both (Figure 3-2)? This design allows us to examine the recovery from adaptation between all possible stimulus pairs (and characterize the neural response to each stimulus free of first-order context; see Appendix A). The analysis of data collected from such an experiment is based on covariates that model recovery from adaptation for the stimulus changes; we will consider two

covariates first. One covariate models the degree of change in color for each stimulus compared to the prior stimulus, while a second covariate models the amount of change in shape. These covariates are then convolved with a standard hemodynamic response function and used to model the BOLD fMRI data. How would these covariates model data from a voxel that contained a population of neurons with independent tuning for the two stimulus dimensions? Presuming equal and linear transforms of stimulus changes to neural recovery from adaptation and to BOLD fMRI signal (assumptions which are examined below), equal loading upon the two covariates would be sufficient to model the continuous neural recovery from adaptation present in the data: the color covariate would model the recovery from adaptation produced by the population of color-tuned neurons within a voxel, while the shape covariate would model the behavior of the independently tuned shape-responsive neurons. Because the linear addition of these two covariates is a sufficient model for the data, we term these the city-block covariates, as the fMRI response to the transition between any two stimuli is well described by the rectilinear (purely additive) distance between the stimuli. How would the model behave given a voxel that contained a population of neurons with conjoint tuning to the stimulus dimensions? The covariates would be unable to simultaneously model the recovery from adaptation associated with single stimulus changes and that from combined stimulus changes. This is because the signal from combined stimulus changes would be less than that predicted by the sum of the isolated stimulus changes. The variance attributable to this sub-additivity can be modeled with a Euclidean contraction covariate, which takes the value of the difference between the City-block and Euclidean distance between the stimuli (a.k.a. a farm-gate contraction;

Wyszecki & Stiles, 2000). This covariate captures the degree to which the neural response deviates from pure additivity for the two stimulus dimensions. If this covariate models a significant amount of the variance in the observed responses, then we may be able to reject the independent (purely additive) neural representation model. For ease of interpretation (although not of statistical necessity) the Euclidean contraction covariate is orthogonalized with respect to the City-block covariates so that it will have zero loading in the presence of an independently tuned neural population. The additional components of the model (e.g., main effects versus null-trials, stimulus repetitions, stimuli that follow null-trials) are considered at length elsewhere (Aguirre, 2007). An example set of data, covariates, and results is available for download ( ).
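The covariate construction and the resulting loadings can be sketched as follows. This is the editor's illustration rather than the analysis code distributed with the dissertation, and one sign convention should be flagged: with the contraction covariate defined as city-block minus Euclidean distance, a purely conjoint (Euclidean) response loads on it with weight -1, whereas the dissertation reports positive loadings for conjoint tuning, so assume the covariate enters the actual model with the opposite sign. The orthogonalization against the city-block covariates described above is omitted here for brevity.

```python
import numpy as np

def build_covariates(coords):
    """coords: (trials, 2) stimulus positions in presentation order.
    Returns the two city-block covariates (per-dimension change from the
    prior stimulus) and the Euclidean contraction covariate."""
    d = np.abs(np.diff(np.asarray(coords, float), axis=0))
    dim1, dim2 = d[:, 0], d[:, 1]
    contraction = (dim1 + dim2) - np.hypot(dim1, dim2)
    return dim1, dim2, contraction

rng = np.random.default_rng(0)
dim1, dim2, contraction = build_covariates(rng.uniform(0, 1, size=(200, 2)))
X = np.column_stack([np.ones(len(dim1)), dim1, dim2, contraction])

# Independent (city-block) population: zero loading on the contraction term
beta_ind, *_ = np.linalg.lstsq(X, dim1 + dim2, rcond=None)
# Conjoint (Euclidean) population: the contraction term absorbs the sub-additivity
beta_con, *_ = np.linalg.lstsq(X, np.hypot(dim1, dim2), rcond=None)
print(beta_ind[3], beta_con[3])  # ~0 and ~-1
```

In a real analysis these covariates would additionally be convolved with a hemodynamic response function before fitting, as the text describes.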

Distortions of the measured response To test for the presence of a population of conjointly tuned neurons we manipulate a set of perceptual stimuli and measure an evoked BOLD fMRI response. Several transformations of the independent and dependent variables intervene between the data and our desired inference. Here we examine the specificity and sensitivity of the method in the face of these distorting transformations. In an idealized model, the stimulus space presented to the subject evokes equal, regular, and linear differences in neural representation, so that proportional steps in the space produce proportional changes in the similarity of neural response, and in turn a linearly proportional recovery from adaptation for the population. These changes in neural activity are then transformed into BOLD


fMRI signal change by linear convolution with the hemodynamic response function. Of course, non-linearities and asymmetries may exist at each of these steps. Particularly troublesome are compressive non-linearities that act symmetrically upon both dimensions of the stimulus space representation. Such a distortion could cause independently tuned neural populations to appear conjointly tuned, as the response to larger stimulus transitions will be smaller than predicted from the response to smaller transitions, mimicking the behavior of a Euclidean distance metric. A plausible cause of such a symmetric, compressive non-linearity is saturation of the transformation of neural activity to hemodynamic response. As we will argue, however, the relatively small signal modulation produced by neural adaptation justifies a small-scale linear approximation in this case. We examined the effect of hypothetical distortions in the setting of simulated data from an experiment. The MATLAB code used for the simulation is available for download ( ).

The model begins with a set of regularly spaced stimuli¹ (Figure 3-3A), and thus a similarity matrix defined by the distance between the stimuli. We considered next the similarity matrix of neural responses for a population of neurons that represent the stimuli. The neural representation could perfectly reflect the original stimulus space, or may contain distortions (Figure 3-3B). For example, the changes in the stimuli along one axis may be more salient than the changes along the other axis, and this change may be a

¹ The di-octagonal space, discussed below, was used. For ease of explanation, however, Figures 3-3 and 3-4 illustrate the simulation using a grid-spacing of stimuli.


linear or non-linear transform of the original stimulus space. Further, a non-linear transform of both axes may be present, such that (for example) the neural representations of the stimuli from one corner of the space are much more similar than the neural representations of stimuli from the other corner. Given a sequence of stimuli in a continuous carry-over design, we can then model the time-course of neural response that would be expected given recovery from adaptation proportional to the dissimilarity of neural responses to each stimulus (see Appendix B and Verhoef, 2008). The dissimilarity between stimuli (and therefore the recovery from adaptation) can be modeled assuming either conjoint neural tuning and a Euclidean distance metric, or independent neural tuning and a city-block distance metric. This results in two different models of neural activity over time, corresponding to the independent and conjoint possibilities. This neural signal is then transformed into a hypothetical BOLD fMRI signal by convolution with a standard hemodynamic response function (Aguirre, 1998). Prior to convolution, however, the possibility of another non-linearity is introduced. While generally conforming to a linear system (Boynton et al., 1996), non-linearities in the transform of neural activity to BOLD fMRI signal may exist. Of particular importance are compressive non-linearities, in which a doubling of the neural response results in less than a doubling of the BOLD signal (Vazquez & Noll, 1998). We considered several non-linearities (Figure 3-3C) imposed over the full range of neural response (i.e., from no neural activity to a hypothetical maximal neural activity in an area). Perceptual adaptation, however, has not been observed to modulate neural activity over its entire range. When measured in visually responsive cortical areas, the modulation

of BOLD fMRI signal by adaptation has been at most 20% of the maximal response of the region (Aguirre, 2007; Fang et al., 2005; Kourtzi et al., 2003). Therefore, we selected the most non-linear 20% range of each distorting function, and examined the effect of the distortion applied to the neural signal prior to convolution. A set of 100 different orders of stimulus presentation was considered. Each was a counterbalanced sequence selected for having high overall Efficiency for detection of the City-block and Euclidean contraction effects (described below). A simulated BOLD fMRI data sequence was then created for each possible crossing of the distortion of neural representation with distortion of the hemodynamic transform, for both the conjoint and independent distance metrics. The final, simulated BOLD fMRI sequence was then analyzed with the covariates described in the previous section; i.e., we obtained the loading of the simulated data on the model covariates. The average loading on the model covariates was obtained across the 100 different sequences of stimulus presentation. Ideally, the loading upon the Euclidean contraction covariate should remain zero for an independently tuned neural population, and have a positive loading for a conjointly tuned population, regardless of the distortions introduced. Table 3-1 presents the average loading upon the Euclidean contraction covariate for assumed conjoint and independent neural populations in the face of distortions of the neural similarity space, and non-linear transforms of neural activity into BOLD fMRI signal. Appropriately, the covariate has a positive loading (1.4) for the conjoint population and zero loading for the independent populations when no distortion is applied. Further, the model is robust to distortions. A positive loading for the conjointly tuned neural population is found in all cases. This indicates that the model remains sensitive for the test of conjoint neural tuning across a

variety of distortions and non-linearities. When independently tuned neural populations are assumed, the loading upon the Euclidean contraction covariate generally remains at zero or is negative, preserving the specificity of the approach (i.e., one would not mistakenly reject the null-hypothesis of independent populations). In a few cases, the non-linearities produce a positive loading upon the Euclidean contraction covariate even when independent neural populations were assumed. These cases are potentially problematic as they constitute improper bias. Fortunately, the degree of bias was generally small. In the face of non-linearities in the BOLD response, the improper loading upon the Euclidean contraction covariate was about one tenth of that measured for the city-block effect. Therefore, even in the face of quite severe, compressive non-linearities in the BOLD hemodynamic response, the relatively small scale of neural adaptation changes maintains the specificity of the method. It is worth noting that non-linearities in the BOLD response may be further discounted as the cause of a finding of conjoint representation if, for a separate stimulus space, independent representation is demonstrated. Logarithmic transformations of both axes of the neural representation were, however, a problematic case, as they tend to produce larger loadings (0.4) upon the Euclidean contraction covariate. Several steps may be taken to guard against this possible non-linearity. First, and as detailed below, behavioral testing conducted with the stimulus space should be used to confirm that roughly equal perceptual salience accompanies equal changes within the stimulus space. Next, post-hoc testing may be conducted upon the discretized responses to stimulus pairs to examine the relationship between stimulus change and recovery from adaptation. This relationship may be examined for

compressive non-linearities. An example of such an analysis is provided in the demonstration experiment below and described in Appendix C. Finally, the Euclidean contraction covariate effect can be statistically judged not against a loading of zero, but against 40% of the city-block covariate effect. If the Euclidean contraction effect is significantly larger than this proportion, then our simulations suggest that a symmetric, compressive non-linearity at the neural level cannot account for the result. The loadings upon the covariates presented here are averages over a set of possible sequences of stimulus ordering. While the model maintains its expected sensitivity and specificity in the face of distortions and non-linearities on average, the model may be less robust in the instance of a particular pairing of a distortion with a particular sequence. Therefore, it is advisable to use a variety of sequences within and across scanning sessions so that the robust aggregate performance of the model is retained. To summarize, the proposed method retains sensitivity for the test for conjointly tuned neural populations in the presence of a variety of distortions and non-linearities in the transformation of neural population codes to BOLD fMRI signal. Specificity, i.e., the absence of a positive test outcome in the setting of independently tuned neural populations, is retained for the majority of considered distortions. Where improper bias was found as a consequence of a hemodynamic non-linearity, the bias was small. Improper bias could be induced by symmetric, compressive non-linearities in the neural representation. Experimental design optimizations and post-hoc techniques to guard against this situation were offered.


Rotation of the assumed stimulus dimension axes A different violation of the model assumptions occurs when the underlying neural representation is independent for the stimulus dimensions, but its neural instantiation is not aligned with the assumed dimensional axes of the study. For example, consider an experiment designed to examine the neural representation of rectangles. The stimulus space used in the experiment consists of rectangles that vary in height and width, and the experimenter models these two parameters. It may be the case, however, that a population of neurons actually has independent tuning for the sum and difference of height and width (roughly corresponding to area and aspect ratio); a 45° rotation of the axes as modeled by the experimenter. Figure 3-4A illustrates a population of neurons with receptive fields rotated 22.5° with respect to the assumed axes of the study. As shown in the last two rows of Table 3-1, even assuming equal and linear transformations, a positive loading upon the Euclidean contraction covariate would be obtained in the case of rotation, leading to the erroneous conclusion of a conjoint representation of height and width. When presented with significant loading upon the Euclidean contraction covariate, and thus tentative evidence for conjoint representation, an additional test is required to reject the possibility of rotated, independent axes. This special case may be detected by examining the behavior of the covariates under rotations of the assumed model (Figure 3-4B). The analysis is repeated assuming that the stimuli presented actually were organized by (e.g.) area and aspect ratio (as well as partial rotations). In practice, this is accomplished by calculating distances for the City-block covariates assuming different axes than those actually used. For example, if a 45° rotation of the

model is being tested, then a stimulus change in both height and width of a rectangle is represented in the covariates as a single change in the one covariate (now behaving as a size covariate) and no change in the other covariate (now behaving as an aspect ratio covariate). Figure 3-4C presents the loading upon the City-block and Euclidean contraction covariates that will be obtained in the presence either of conjointly tuned neurons or independent populations of neurons with receptive fields oriented 22.5° away from that initially assumed in the experiment. If the neural representation is truly conjoint, then the loading upon the Euclidean contraction covariate will be unchanged with rotation of the model (ignoring the effects of gamut; discussed below). In contrast, a minimum for the Euclidean contraction covariate is found when the assumed axes for the stimuli match the actual axes in neural representation (Figure 3-4C). Because of discrete (16 point) sampling of a continuous stimulus space, the Euclidean contraction covariate receives an artifactual, negative loading at rotations adjacent to the veridical 22.5° value. If we perform the simulation with an increase in the sampling density (a 7 × 7 stimulus array and 49 samples), the true, monotonic function is revealed. This imperfection should have little effect on the inference. In summary, when significant loading upon the Euclidean contraction covariate is obtained in an experiment, an additional test is necessary to reject the possibility of independent, but misaligned, neural populations. Post-hoc testing of the performance of the model under assumed rotations of the stimulus axes can distinguish between the independent but rotated, and the conjointly tuned cases.
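The rotation check itself is straightforward to sketch. In the toy simulation below (the editor's stand-in, not the dissertation's MATLAB code), an independent population is tuned to axes rotated 45° from those assumed, as in the height/width versus size/aspect-ratio example; recomputing the covariates under the true rotation drives the contraction loading to zero, while the misaligned model produces a substantial loading:

```python
import numpy as np

def rotate(d, theta):
    # Rotate 2-D transition vectors by theta radians
    c, s = np.cos(theta), np.sin(theta)
    return d @ np.array([[c, -s], [s, c]])

def contraction_loading(deltas, response):
    # Fit city-block covariates plus the Euclidean contraction covariate
    a = np.abs(deltas)
    contraction = a.sum(axis=1) - np.hypot(a[:, 0], a[:, 1])
    X = np.column_stack([np.ones(len(a)), a[:, 0], a[:, 1], contraction])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    return beta[3]

rng = np.random.default_rng(1)
deltas = rng.normal(size=(500, 2))        # transitions on the assumed axes
true = np.deg2rad(45.0)                   # actual axes of the independent tuning
response = np.abs(rotate(deltas, true)).sum(axis=1)  # independent along true axes

load_misaligned = contraction_loading(deltas, response)
load_aligned = contraction_loading(rotate(deltas, true), response)
print(abs(load_misaligned) > abs(load_aligned))  # True: minimum at the true axes
```

Sweeping the assumed rotation and plotting the contraction loading, as in Figure 3-4C, would trace out the minimum at the veridical axes.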


Extension to a generalized neural space metric So far we have considered two competing, extreme models of neural representation: two independent neural populations that represent stimulus dimensions separately, and a conjoint population that represents the two dimensions together. Earlier, we considered how these concepts are related to receptive fields that are either linear or radially symmetric within a stimulus space. Intermediate receptive fields are possible, however, with oval shapes of varying elongation. In such cases the population would not be wholly independent, but instead represent one dimension to a greater extent than the other. These intermediate cases are considered readily within the framework of the Minkowski exponent that defines the representational space. We have seen how independent neural populations may be expected to produce recovery from adaptation proportional to the rectilinear, City-block distance of the stimulus transition, while a conjoint population will produce a sub-additive recovery proportional to the Euclidean distance. City-block and Euclidean metrics correspond to metrics of r=1 and r=2 within a generalized Minkowski measure of distance:

d(a, b) = [ Σ_{k=1}^{n} |a_k - b_k|^r ]^{1/r}    [Eq 3-1]

where the total distance d in an n-dimensional space between stimuli a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) is computed from the distance along each dimension k of the n dimensions. Oval receptive fields correspond to 1 < r < 2. The earlier simulation that produced the expected beta values for the independent and conjoint neural representation cases is repeated in Figure 3-5A for these intermediate (and other) Minkowski values. As can be seen, there is a smooth and decelerating increase in loading upon the Euclidean

contraction covariate as the Minkowski exponent of the underlying neural representation increases. Thus, significant positive loading upon the Euclidean contraction covariate may be taken as a rejection of independent representation of the stimulus dimensions, although not necessarily as an endorsement of a fully equal and conjoint organization. One might further consider r < 1 or r > 2. A Minkowski value of less than one corresponds to a neural similarity measure in which the response to a change in both dimensions is greater than the sum of the changes in each. This non-geometric representation (as it violates the triangle inequality assumption) can be related to featural models of similarity (Tversky, 1977; Goldstone & Son, 2005). Testing for the predicted negative loading upon the Euclidean contraction covariate is a straightforward extension of the current technique. Conversely, a Minkowski exponent of greater than 2 reflects greater weight being placed upon the dimension with the greater perceptual change. In the limit, r = ∞ corresponds to a similarity measure which reflects only the dimension with the larger change (the dominance metric). In practice, the actual loading upon the elements of the model will be in terms of fMRI signal change, and will vary from region to region and for different stimulus spaces. Therefore, the absolute magnitude of the effect cannot be precisely interpreted, making the ratio of the Euclidean contraction to the city-block effect a more useful measure (Figure 3-5B). Generally, positive loading supports non-independent coding and negative loading supports featural coding. Moreover, the relative loading upon the covariate in a single cortical area is interpretable. The Euclidean contraction measured for two different stimulus spaces, and thus the relative degree of combined stimulus representation, could be of neuroscientific interest. For example, one may wish to

compare different stimulus changes that define a multi-dimensional perceptual space to determine which dimensions are privileged, in the sense that they have explicit and independent neural coding.
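Equation 3-1 is easy to evaluate directly. The short sketch below (the editor's illustration) confirms the limiting cases discussed here: r = 1 reproduces the city-block distance, r = 2 the Euclidean distance, and large r approaches the dominance metric, which reflects only the larger per-dimension change:

```python
def minkowski(a, b, r):
    # Generalized Minkowski distance of Eq 3-1
    return sum(abs(ak - bk) ** r for ak, bk in zip(a, b)) ** (1.0 / r)

a, b = (0.0, 0.0), (1.0, 1.0)
print(minkowski(a, b, 1))   # 2.0, city-block
print(minkowski(a, b, 2))   # 1.414..., Euclidean
print(minkowski(a, b, 50))  # ~1.014, approaching max(|a_k - b_k|) = 1
```

Exponents below 1 make the combined change exceed the sum of the individual changes, the featural (non-geometric) regime described in the text.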

Optimizations The test for a conjointly tuned neural population amounts to the measurement of variance attributable to the Euclidean contraction covariate. We consider here optimizations of the approach to maximize power for this test. Normalization of perceptual space: In our earlier consideration of asymmetries and distortions of the perceptual, neural, and hemodynamic transforms, we found that these distortions can alter loading upon the Euclidean contraction covariate. Therefore, a primary optimization is to select stimuli that are likely to map to linear and equal changes in the neural representation. Measurements of neural receptive fields from single unit recording may be used to establish the spacing and gamut of the stimuli on each dimension to best linearize and normalize the neural representations. In most cases, of course, such a direct measure of neural response will be unavailable, as this is typically the goal of the experiment itself. Behavioral testing (e.g., Kruskal & Wish, 1978) may be used to ensure that perceived changes along each dimension are equivalent and the scaling of the similarity is uniform throughout the space, with the hope that this normalized perceptual space will correspond to a normalized neural representation. Stimulus space sampling: Two continuously varying stimulus dimensions will together define a space with an infinite number of unique points. In experimental settings, only a few representative points are chosen to provide an estimated characterization of

the entire space. Some stimulus sets can more efficiently recover the complete space than others. Previous work with well-defined perceptual stimulus spaces has mostly used stimuli sampled from a square grid, as in Figure 3-6A. The grid places stimuli at equal spacings along the two dimensions, with all stimuli within a row or column varying along only one dimension. We propose an alternate sample space (inspired by Shepard, 1964) consisting of two nested octagons (a di-octagon) that contains the same number of stimuli as the grid but has more desirable properties (Figure 3-6B). First, the gamut (or range in stimuli) is comparable whether one or both dimensions are under consideration. That is, the maximal distance between two points along one of the chosen dimensions (e.g., between (0, 29) and (100, 29)) will always be the same as the maximal distance between two points defined by both dimensions (e.g., (0, 29) and (71, 100)). This renders combined stimulus changes as salient as stimulus changes along a single dimension. In a square grid, by contrast, all distances along a single dimension are shorter than the distances between points that vary on both dimensions. A related benefit is that the distribution of distances across pairs is more uniform in the di-octagon, as opposed to the grid which is skewed towards pairings at short distances. Post-hoc model evaluation is also strengthened. As described above, a rotation of the assumed stimulus space is used to check for independent, but misaligned neural dimensions. The di-octagonal space allows this model rotation to be conducted without unduly affecting the gamut, simplifying the interpretation of the loading upon the Euclidean contraction covariate under these hypothetical rotations.
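One way to construct such a di-octagonal sample space is as two concentric rings of eight stimuli; the specific radii and the half-step rotation of the inner octagon below are the editor's guesses at a reasonable configuration, not the dissertation's exact stimulus coordinates:

```python
import numpy as np

def dioctagon(center=(50.0, 50.0), r_outer=50.0, r_inner=25.0):
    """16 stimulus coordinates: an outer octagon plus an inner octagon
    rotated half a step (22.5 degrees) against it."""
    pts = []
    for radius, offset in [(r_outer, 0.0), (r_inner, np.pi / 8)]:
        for k in range(8):
            theta = offset + k * np.pi / 4
            pts.append([center[0] + radius * np.cos(theta),
                        center[1] + radius * np.sin(theta)])
    return np.array(pts)

stims = dioctagon()
# Gamut is direction-independent: the maximal pairwise distance equals the
# outer diameter whether the extreme pair differs on one axis or on both.
dists = np.linalg.norm(stims[:, None] - stims[None, :], axis=-1)
print(stims.shape, dists.max())  # (16, 2) 100.0
```

The direction-independent gamut follows because opposite vertices of the outer octagon are one diameter apart whether they straddle a single dimension or both, which is the property the text contrasts with the square grid.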


Finally, the di-octagonal space increases the range of the Euclidean contraction covariate, thus improving power. The variance of the Euclidean contraction covariate is maximized in the comparison of stimulus transitions along a single dimension to equal stimulus transitions along both dimensional axes. The di-octagonal space is configured to maximize the number of stimulus pairs that represent a pure change (0 degrees difference from a single dimensional axis) and pairs that represent equal change on both axes (45 degrees from a single dimensional axis), as compared to a square grid, which has many stimulus pairs at other angles, and thus less potentially informative variation. Sequence selection: The sequence that dictates the order of stimulus transitions may also be optimized. In prior work (Aguirre, 2007) we have shown that the order of counterbalanced stimulus presentation used in a BOLD fMRI experiment can impact sensitivity. A counterbalanced presentation order can be provided by an m-sequence (Buracas & Boynton, 2002) or a type 1 index 1 sequence (Nonyane & Theobald, 2008). Permutations of the assignment of stimuli to labels in these sequences can be examined for their relative Efficiency (Friston et al., 1999; Aguirre, 2007). For the di-octagonal stimulus space of 16 stimuli, a 17-element sequence is needed if null-trials are to be included as well. No n=17 m-sequence exists. We therefore searched permutations of n=17, type 1, index 1 counterbalanced sequences and measured the Efficiency of the sequence for detection of loading upon the Euclidean contraction covariate (assuming a conjoint neural representation) and loading upon the City-block covariate (Figure 3-7). The relative Efficiency of a sequence for the Euclidean contraction and City-block covariates is uncorrelated. Therefore, a sequence may be selected that optimizes Efficiency for detection of one or the other or a balance between the two. Notably,

searching across label permutations nearly doubles the expected Efficiency of a sequence selected at random. Code to search for Efficient sequences, as well as pre-selected sequences, is available for download ( ).
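The permutation search can be sketched as follows. Everything here is a toy stand-in for the dissertation's actual procedure: a random label sequence replaces a true type 1 index 1 counterbalanced order, a simple gamma function replaces the canonical hemodynamic response, and Efficiency is computed as 1/trace((X'X)^-1) for a design containing only the Euclidean contraction covariate:

```python
import numpy as np

rng = np.random.default_rng(0)
coords = np.array([[x, y] for x in range(4) for y in range(4)], float)  # 16 stimuli

def hrf(n=16):
    t = np.arange(n, dtype=float)
    h = t ** 5 * np.exp(-t)              # gamma-shaped impulse response
    return h / h.sum()

def efficiency(order):
    # Build the contraction covariate for this stimulus order, convolve with
    # the HRF, and score the design by 1 / trace((X'X)^-1)
    d = np.abs(np.diff(coords[order], axis=0))
    contraction = d.sum(axis=1) - np.hypot(d[:, 0], d[:, 1])
    x = np.convolve(contraction, hrf())[: len(contraction)]
    X = np.column_stack([np.ones(len(x)), x])
    return 1.0 / np.trace(np.linalg.inv(X.T @ X))

base = rng.integers(0, 16, size=200)     # stand-in for a counterbalanced label sequence
perms = [rng.permutation(16) for _ in range(50)]   # label-to-stimulus assignments
effs = [efficiency(p[base]) for p in perms]
print(max(effs) / np.mean(effs))  # the best labeling beats the average one
```

As in the text, the permutation changes only which stimulus each sequence label denotes, so counterbalancing of the underlying order would be preserved by the search.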

3.3 Example Experiment

We turn now to an application of the metric estimation method to an fMRI experiment. We wished to apply the model to examine the neural representation of two stimulus spaces: one predicted to be represented by a conjointly tuned population of neurons, and one predicted to be represented by independently tuned populations. Our selection of stimuli was motivated by the psychological study of integral and separable perceptual spaces. Some visual properties of objects are apprehended separately (e.g., color and shape), whereas other dimensions are perceived as a composite (e.g., saturation and brightness); these have been termed separable and integral dimensions (Shepard, 1964). We hypothesized that integral perceptual dimensions are represented by populations of neurons that represent the dimensions conjointly, while separable dimensions are represented by independent neural populations; similar ideas have been proposed recently (Arguin & Saumier, 2000; Cant, 2008; Kayaert et al., 2005; Stankiewicz, 2002). We constructed two sets of simple 2D closed contours that varied along two parameterized dimensions. The first set (Figure 3-8A) consisted of ineffable popcorn shapes, defined by radial frequency components. Abstract radial frequency components are not thought to be a central organizing component of visual cortex (Albright & Gross, 1990), so we have no reason to predict that the two dimensions would be independently represented. Further, we behaviorally characterized these shapes as integral using the

Garner sorting task (Garner & Felfoldy, 1970), replicating Op de Beeck (2003), from which this shape space is derived. The second set of moon shapes (Figure 3-8B), which vary in curvature and thickness, were behaviorally characterized as separable , again replicating Op de Beeck, 2003), and thus predicted to be represented by a neural population with independent tuning for curvature and thickness. In addition, there is evidence strongly suggesting that shape curvature and aspect ratio are independently coded (Kayaert et al., 2005; Arguin & Saumier, 2000; Stankiewicz, 2008; Op de Beeck et al., 2003).

3.3.2 Materials and Methods

Stimulus generation and norming
Stimuli in each of the two experiments were sixteen simple filled shapes constructed from RFCs, similar to those used by Op de Beeck et al. (2001, popcorn shapes; 2003, moon shapes). The stimulus generation code is available for download ( We adjusted the parameters supplied by Op de Beeck et al. such that perceptual distances for popcorn and moons, as measured by RT in difference judgments and by explicit similarity ratings, within and across 12 subjects (8 females, ages 20-33), very closely matched the distances called for by the di-octagon arrangement. We used the Garner task (Garner & Felfoldy, 1970) to behaviorally characterize the popcorn and moon stimulus axes. Nine subjects (7 females, ages 22-33) performed 640 sorting operations on exemplars of the popcorn and moon shapes (1280 total). At the beginning of each of 16 blocks of 40 trials, the subject was shown visually which dimension was to be used for sorting; a key press on each trial indicated the value on that dimension for the stimulus. In the filtering condition, the unattended axis was varied randomly, while in the correlated condition, the value of the other axis was perfectly correlated with the attended axis (Figure 3-9A). Subjects were faster (p < 0.001) sorting popcorn shapes in the correlated condition than in the filtering condition; there was no difference between the conditions for sorting the moon shapes (Figure 3-9B). The difference in the condition effect between the two shape spaces was significant (p < 0.001). These results support the description of the popcorn axes as integrally perceived and the moon axes as separably perceived (Garner & Felfoldy, 1970; replicating the previous results of Op de Beeck, 2003).

Subjects and scanning parameters
Four right-handed women and two right-handed men aged 20-35 participated in the study. All subjects provided informed consent and the study conformed to the guidelines of the University of Pennsylvania Institutional Review Board. Structural and functional data were collected on a 3.0-T Siemens Trio scanner using an 8-channel head coil. High-resolution T1-weighted structural images were collected in 160 axial slices with near-isotropic voxels (0.9766 mm × 0.9766 mm × 1.0000 mm; TR = 1620 ms, TE = 3 ms, TI = 950 ms). Functional, blood-oxygenation-level-dependent (BOLD), echoplanar data were acquired in 3 mm isotropic voxels (TR = 3000 ms, TE = 30 ms). BOLD data were acquired in 42 axial slices, in an interleaved fashion, with 64 × 64 in-plane resolution. The functional data were collected in 5 runs of 159 TRs each. The first 6 s of each run consisted of dummy gradient and radio frequency pulses to allow for steady-state magnetization, during which no stimuli were presented and no fMRI data were collected. The next 15 s displayed the stimuli presented in the last 15 s of the previous run (or of the final run, in the case of the first run); these periods of scan overlap allow the carry-over BOLD response to build to a steady state, and were removed in processing (Aguirre, 2007).

Stimulus presentation and scanner task
During scanning, each shape was drawn on a mean gray background, and behind it a line was drawn such that it divided the shape, leaving 65% to one side or the other (Figure 3-10). This line was randomly tilted between 10 and 40 degrees from the vertical. A space was kept between the line and the shape such that they would not intersect. The shape was defined by a light green color that was isoluminant with the background. The stimuli were back-projected onto a screen viewed by the subject through a mirror mounted on the head coil, and subtended 5° × 5° of visual angle. Each stimulus was presented for 1400 ms, with a 100 ms ISI consisting of the mean gray background. The subject was instructed to indicate on each trial, by button press, whether the line was drawn more to the left or right of the shape. The task was assigned solely for the purpose of requiring the subject to attend to every stimulus in the experiment, and was constructed so as to not involve an explicit comparison between sequential stimuli. All subjects performed above 92% accuracy, and the mean accuracy was 96% (popcorn) and 98% (moons), indicating that subjects were alert and monitoring the stimuli as they were presented.

Stimulus sequence In each of the two experiments, each of the 16 different shapes was presented to each subject 85 times in a fully first-order counterbalanced sequence. The order of stimulus presentation was determined by an n=17, type 1 index 1 sequence. The full sequence was divided into five parts for scanning as described previously (Aguirre, 2007). The labels 1-16 were assigned to the 16 stimuli and the 17th label indexed the presentation of a blank trial (gray screen with fixation cross), which had a duration of 3 seconds (Aguirre, 2007). This sequence provides for first-order counterbalancing of the stimuli, such that every image appeared in the sequence both before and after every other image, as well as before and after 3 seconds of a blank screen. A particular type 1 index 1 sequence was selected which maximized efficiency (Friston et al., 1999) for the balanced detection of adaptation effects proportional to stimulus similarity and the predicted Euclidean contraction effect. This sequence was identified by brute-force search of several hundred thousand sequences (Aguirre, 2007) and can be obtained from our website (
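The defining property of such an order is first-order counterbalancing: every label is followed exactly once by every label, itself included. The sketch below is not the construction of Nonyane & Theobald (2008); it simply shows one convenient way to build an order with this property, as an Eulerian circuit on the complete directed graph with self-loops, together with a checker for the counterbalancing property.

```python
from collections import Counter

def counterbalanced_sequence(n):
    """Hierholzer's algorithm on the complete digraph with self-loops over n
    labels: the resulting circular order contains every ordered label pair
    (including immediate repeats) exactly once."""
    remaining = {a: list(range(n)) for a in range(n)}  # unused successors
    stack, circuit = [0], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            stack.append(remaining[v].pop())
        else:
            circuit.append(stack.pop())
    seq = circuit[::-1]   # Eulerian circuit, starts and ends at label 0
    return seq[:-1]       # drop the duplicated endpoint; length is n * n

def is_first_order_counterbalanced(seq, n):
    """Check that each ordered pair of labels occurs exactly once, treating
    the sequence as circular."""
    pairs = Counter(zip(seq, seq[1:] + seq[:1]))
    return len(seq) == n * n and all(
        pairs[(a, b)] == 1 for a in range(n) for b in range(n))
```

For n = 17 (16 stimuli plus the blank-trial label), the resulting 289-trial circular order satisfies the counterbalancing described above; the experiment additionally selected among candidate sequences by efficiency, which this sketch does not attempt.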

Image pre-processing
Off-line data analysis was performed using VoxBo ( and SPM2 ( software. Data were sinc-interpolated in time to correct for the slice acquisition sequence, motion corrected with a six-parameter, least-squares, rigid-body realignment routine using the first functional image as a reference, and normalized in SPM2 to a standard template in Montreal Neurological Institute (MNI) space. Normalization maintained 3 mm isotropic voxels and used 4th-degree B-spline interpolation. In the analysis of adaptation effects, the fMRI data were smoothed in space with a 3 × 3 × 3 voxel isotropic Gaussian kernel. The average power spectrum across voxels and across scans was obtained, and the power spectrum fit with a 1/frequency function (Zarahn, Aguirre, & D'Esposito, 1997). This model of intrinsic noise was used during regression analyses with the Modified General Linear Model (Worsley & Friston, 1995) to inform the estimation of intrinsic temporal autocorrelation. Voxels that composed regions of interest were identified for each subject as the intersection of a categorically defined area (LOC, identified by response to object > scrambled object at a threshold of t > 3), defined from data obtained during separate scans using standard methods (Harris & Aguirre, 2008), and areas where adaptation to both dimensions of both sets of stimuli was found (using a threshold of t > 2 for each dimension). Note that this criterion for voxel selection is orthogonal to the Euclidean contraction covariate of interest. The ROI analyses reported here combined data from the left and right hemispheres. The location of the selected voxels of interest (Figure 3-11A) was presented (using BrainVoyager; atop the MNI anatomical image that served as a template for spatial normalization.
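The 1/frequency noise fit mentioned above can be illustrated with ordinary least squares on an averaged power spectrum. The functional form used here (a/f + b) is a plausible reading of the Zarahn et al. (1997) model, not necessarily the exact parameterization used in the analysis.

```python
import numpy as np

def fit_one_over_f(freqs, power):
    """Least-squares fit of power ~ a / f + b to an averaged power spectrum."""
    X = np.column_stack([1.0 / freqs, np.ones_like(freqs)])
    coef, *_ = np.linalg.lstsq(X, power, rcond=None)
    return coef  # (a, b)

# Synthetic check: a spectrum generated from the model is recovered.
freqs = np.linspace(0.005, 0.5, 100)   # Hz, roughly an fMRI run's range
power = 4.0 / freqs + 2.0
a, b = fit_one_over_f(freqs, power)
```

In practice the fitted noise model would then inform the temporal autocorrelation estimate in the Modified General Linear Model, a step omitted here.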

3.3.3 Results and Discussion

For the hypothesis regarding conjoint representation to be posed, it is first necessary to identify a neural population that exhibits a proportional recovery from adaptation for both of the stimulus dimensions. For each subject, we identified within ventral occipito-temporal cortex voxels that showed recovery from adaptation to both stimulus axes for both stimulus spaces. Figure 3-11A shows the position of these voxels across subjects. Most voxels were concentrated around the right posterior fusiform sulcus, corresponding to ventral LOC (Chapter 2). Because of their method of selection, the identified voxels are guaranteed to have some recovery from adaptation. Figure 3-11B shows the magnitude of recovery along each stimulus dimension, demonstrating the roughly equivalent effect of each axis, and between the two stimulus spaces. We then examined the loading upon the Euclidean contraction covariate (Figure 3-11C). The effect of the Euclidean contraction may be tested within the voxels selected as demonstrating recovery from adaptation, as these two effects are orthogonal. The popcorn space, defined by perceptually integral, ineffable dimensions, had significant loading upon the Euclidean contraction covariate across subjects [t(5 df) = 4.7, p = 0.0055], allowing us to reject the hypothesis that the popcorn dimensions are represented by independent neural populations. The magnitude of the effect was too large to be explained by undetected distortions of the neural space, based upon our simulations: the loading upon the Euclidean contraction covariate was greater than 0.4 of the primary adaptation (city-block) effect for each subject [t(5 df) = 2.6, p = 0.048]. In contrast, for the moon space, defined by the separable dimensions of curvature and thickness, loading upon the Euclidean contraction did not differ from zero [t(5 df) = 0.1, p = 0.92], allowing for the possibility that independent neural populations code for the stimulus space. The direct comparison of the popcorn and moon results, predicted to yield a larger Euclidean contraction value for the popcorn space, trended towards significance [t(4 df) = 2.5, p = 0.067, one-tailed].
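The two regressors these contrasts rely on can be sketched in simplified form. The coordinates and the omission of covariate normalization and HRF convolution are simplifications relative to the actual design; the function name is illustrative.

```python
import numpy as np

def adaptation_covariates(seq, coords):
    """For each transition between successive stimuli, return (a) the
    City-block distance, the primary adaptation covariate, and (b) a
    'Euclidean contraction' covariate (City-block minus Euclidean), which
    is zero for transitions along a single dimension and positive for
    oblique transitions. A conjoint representation predicts loading on (b);
    independent populations predict none."""
    step = coords[seq[1:]] - coords[seq[:-1]]
    city = np.abs(step).sum(axis=1)
    contraction = city - np.sqrt((step ** 2).sum(axis=1))
    return city, contraction

# On a unit grid, a one-dimension step gives contraction 0; a diagonal step
# gives City-block 2 versus Euclidean sqrt(2), i.e. contraction 2 - sqrt(2).
```

The contraction covariate is by construction largest for the 45-degree transitions that the di-octagonal space was designed to emphasize.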

A positive loading on the Euclidean contraction covariate, as in the popcorn case, should be further evaluated with systematic rotations of the assumed stimulus dimensional axes. The purpose of this post-hoc test is to ensure that the result cannot be explained by independent neural populations with receptive fields that are misaligned with the axes that were assumed to define the stimulus space. Model rotation will leave the loading on the Euclidean contraction covariate unaltered in the case of a conjointly tuned neural population. In the case of independently tuned but misaligned populations, however, the loading should drop to zero at the appropriate model orientation. Figure 3-11D shows that the Euclidean contraction covariate maintained positive loading across all model orientations, confirming a non-independent neural population. The slight dip and rise in the observed function may be taken as evidence that the actual distance metric of the neural representation is between 1 (City-block) and 2 (Euclidean), with the orientation of the oval tuning functions aligned with a 45° rotation of the stimulus axes. A similar result would obtain if the studied voxels contained a mixture of both independent and conjointly tuned neural populations.

Finally, we conducted an additional post-hoc test to determine if compressive non-linearities in the neural representation of the stimulus space could be responsible for the finding of conjoint neural tuning for the popcorn space (see Appendix C). The recovery from adaptation for stimulus pairs that changed along a single dimension was obtained, and used to compare the position of a stimulus to the recovery of BOLD response (Figure 3-11E). For both dimensions of the popcorn space, the relationship was close to linear. As the method of voxel selection strongly biased us towards a pool of voxels with such a linear relationship in this example, the application of this post-hoc test will be more relevant when applied to data selected from (e.g.) an anatomical region of interest.
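The axis-rotation check can likewise be sketched: rotate the assumed stimulus coordinates, rebuild the covariates, and recompute the contraction loading at each angle. This toy simulation (illustrative names, ordinary least squares in place of the Modified General Linear Model) shows the machinery with a response generated by independent populations whose axes are misaligned by 30 degrees: the contraction loading vanishes exactly when the model axes match the true axes.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.random((16, 2))
seq = rng.permutation(np.repeat(np.arange(16), 4))

def covs(coords, seq, theta):
    """City-block and contraction covariates under model axes rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    rotated = coords @ np.array([[c, -s], [s, c]]).T
    step = rotated[seq[1:]] - rotated[seq[:-1]]
    city = np.abs(step).sum(axis=1)
    return city, city - np.sqrt((step ** 2).sum(axis=1))

# Simulated independent populations misaligned by 30 degrees with the axes
# the experimenter assumed: the response is City-block in the true frame.
true_angle = np.deg2rad(30)
response = covs(coords, seq, true_angle)[0]

def contraction_loading(theta):
    """Loading on the contraction covariate when the model assumes axes
    rotated by theta."""
    city, contraction = covs(coords, seq, theta)
    X = np.column_stack([contraction, city, np.ones_like(city)])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    return beta[0]
```

A conjointly tuned (Euclidean) simulated response, by contrast, would retain a positive contraction loading at every theta, as in Figure 3-11D.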

3.4 General Discussion

We have described an application of fMRI that can test for non-independent neural tuning for stimulus dimensions. The method examines the recovery from adaptation associated with changes in stimuli along one or two stimulus dimensions. By examining the additivity in response to combined stimulus changes, the metric of the neural representation can be tested. This in turn informs as to the underlying neural implementation of that representation. We have shown that the approach is generally robust to non-linearities and distortions of the measurements, and have proposed post-hoc tests that can guard against improper bias. An application of the method to a stimulus space expected to have a conjoint neural representation yielded the predicted result. A ready criticism of the approach is that it uses a measure of neural adaptation recovery to index the similarity of neural responses to stimuli. Single unit recordings have identified circumstances in which the stimulus selectivity of the adaptation recovery effect does not completely reflect the tuning of the un-adapted neuron (e.g., Sawamura, 2006). Importantly, however, the method we describe depends on a slightly different assumption: that the magnitude of neural adaptation recovery is proportional to the magnitude of the stimulus change. Recent work (Verhoef, 2008) has found just this response in neurons in macaque infero-temporal cortex. Moreover, our simulations show that the test is robust to even substantial deviations from a linear relationship between adaptation recovery and stimulus similarity.

A deeper issue concerns the localization inference provided by fMRI adaptation methods (Bartels, 2008). As BOLD fMRI is thought to be more sensitive to synaptic input activity and local processing than to cell-body spike rate, it is possible that local adaptation effects will reflect the tuning properties of inputs to the region, as opposed to the tuning of the neurons within the region itself (Tolias, 2005). This possibility does not render the results of adaptation studies uninteresting, but does nuance their interpretation. It also provides further justification for the carry-over approach (Aguirre, 2007), in which the direct (non-adaptation) effects of stimuli may also be observed using pattern analysis methods (albeit with inferential challenges as well; see Bartels, 2008 and Appendix A).

Our method of metric estimation of neural tuning joins long-standing efforts in psychology and neuroscience to determine the separability or modularity of mental operations. In the cognitive neurosciences, fMRI has frequently been used to demonstrate that two tasks evoke activity in separate brain regions, leading to the inference that the tasks are subserved by different modules (Sternberg, 2001, p. 186). This work itself derives from much earlier attempts to use the additivity of behavioral measures, such as reaction time, to deduce the structure of mental processes. Beginning with Donders' notion of pure insertion in the 19th century, this was most notably discussed and expanded by Sternberg (1969; 2001). A related domain of study has been the determination of the metric of stimulus spaces (Shepard, 1980), with implications for the separate analyzability of perceptual dimensions. A theoretical challenge considered in this extant literature, relevant to our current study as well, is the reification of studied stimulus dimensions found to be independent.
Although a particular study may find independent tuning for a pair of stimulus dimensions, it does not automatically follow that neurons are therefore tuned for those axes. It remains possible that the dimensions selected for study are manifestations of some further, as yet unstudied, organizational scheme.

Previous efforts to identify independently represented stimulus dimensions have proceeded across several sensory domains. Support for independent coding for dimensions such as chromaticity and spatial frequency was found in behavior using a Shepard additivity approach (Monnier, 2006). The general behavioral separability of spatial and temporal frequency (Reisbeck & Gegenfurtner, 1999) is reflected in electrophysiological measures (Priebe et al., 2003). Similarly, shape selectivity has been broadly found to be independent of cues such as position and size in macaque IT (Sary et al., 1993; Janssen et al., 2000), and of surface texture (Kötéles et al., 2008). In the human auditory system, different dimensions of timbre were found to be processed separately, based on additivity of their mismatch negativities (Caclin et al., 2006).

Our method amounts to using a linear model to test the metric of a space, an approach which has been considered problematic (Hubert, 1992). Iterative goodness-of-fit measures have generally been used instead, although these can also be confounded by isometries between metric spaces as measured by finite numbers of stimuli (Arabie, 1991). For example, a set of stimuli might be best fit in two dimensions with a Euclidean metric, but in three with a City-block metric. Consequently, the linear method we have described is not automatically generalizable to the study of stimulus spaces beyond that examined here. Specifically, we have argued by simulation for the validity of our model for two dimensions with 16 regularly spaced samples. While there is a vast array of neuroscientific questions that might be asked within this domain, our method would need to be verified anew for application to a different form of stimulus space.

An avenue for future investigation is the use of iterative models to fit the fMRI data generated by our method. In contrast to the three-covariate approach we have examined, it is possible to model the degree of neural adaptation recovery comprehensively. This could be accomplished using a basis set of binary covariates that symmetrically model each stimulus transition (see Appendix C). The beta weights upon these covariates could then be used to create a diagonally symmetric, 16 × 16 matrix of neural similarity for the stimuli, indexed by recovery from adaptation. The resulting similarity matrix could then be submitted to an iterative multi-dimensional (MDS) or probabilistic (PROSCAL; scaling analysis for the measurement of space metrics. Our expectation is that the greater flexibility and reduced assumptions of such an approach will necessarily result in a loss of statistical power for the specific inferences we have focused upon in this chapter.

We have considered in this chapter several types of non-linearities and distortions that can exist in neural representation or recovery from adaptation. While we find that the method is generally robust to these deviations, there naturally exists the possibility of further violations of the assumptions of the model that we have not evaluated. A general approach that may be taken to reduce the risk of mistaken inference is to study two different stimulus spaces for a given cortical region of interest. If a different result is obtained for each (that is, additivity for one but not the other), this can serve as evidence that the obtained result is not the result of non-linearities in recovery from adaptation or hemodynamic response at that site.

We envision the use of the metric estimation test to study the representation of stimulus properties across sensory cortical areas. By revealing the presence of independently tuned neural populations, the fundamental axes of perceptual representation might be identified. Interestingly, a given stimulus space may be represented conjointly in one region of cortex, but independently in another. For instance, while several stimulus properties are known to be conjointly represented by neurons in V1 (Mazer, 2002), separate tuning for these properties appears at higher cortical levels. Our method can show, within a single experiment, the progression from conjoint to independent representation across cortex for a set of dimensions.

Finally, our linear metric estimation method may be applied to other imaging modalities besides fMRI. The key feature of the measure to be tested is that it derives from a population of neurons for which signal adaptation recovery is monotonically related to stimulus similarity. For example, Furl and colleagues (Furl, 2007) have studied recovery from adaptation to facial expression with magnetoencephalography (MEG). They demonstrated that the M170 component from the superior temporal sulcus (STS) has a monotonic recovery from adaptation to the degree of change in facial expression. Using the method we have described, one might now test the further hypothesis that independent neural populations in the STS are tuned to the facial expressions of fear and anger. Using a similar logic to that employed in our work, Caclin et al. (2006) used the additivity of a component of auditory event-related potentials to infer partially independent neural populations for processing dimensions of timbre.

In summary, we have presented a linear test and its optimizations for measuring the metric properties of neural tuning using fMRI. This approach builds upon our earlier development of the continuous carry-over design (Aguirre, 2007), which introduced a method for efficiently characterizing neural similarity spaces. With our current work, we have now refined the approach to allow measurement of the metric relationship between the dimensions of a similarity space.
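The comprehensive alternative raised in the Discussion (estimating a full 16 × 16 adaptation-indexed similarity matrix and submitting it to a scaling analysis) can be illustrated with classical Torgerson MDS, a non-iterative stand-in for the iterative MDS or PROSCAL fits mentioned above.

```python
import numpy as np

def classical_mds(D, k=2):
    """Torgerson's classical MDS: embed an n x n symmetric dissimilarity
    matrix in k dimensions via double-centring and eigendecomposition."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J          # double-centred squared distances
    w, V = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:k]
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

# A 16-stimulus example: distances from a known 2-D layout are recovered
# (up to rotation and reflection) by the embedding.
rng = np.random.default_rng(2)
pts = rng.random((16, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D, k=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

A real application would embed the beta-weight similarity matrix under both City-block and Euclidean assumptions and compare fits, which is exactly where the isometry ambiguity discussed above (Hubert, 1992; Arabie, 1991) would need to be confronted.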


In Chapter 2, I showed that objects can be represented by patterns of neural activity which have a similarity structure isomorphic to a perceptual similarity structure of the set of stimuli. In hindsight, this result would be relatively unsurprising to most people who think seriously about cognitive neuroscience; intuitively, we expect that at some level brain activity must be correlated with perceptual experience. One of the earliest conceptions of mental representation, articulated most famously by Aristotle, assumed an explicit similarity structure. That is, perhaps we really do have, in the brain, a veridical proximal copy which comes into existence in response to every distal stimulus: when we see an apple, a pattern in the brain arises which is literally akin to redness, roundness, crispness, tastiness, etc. In certain parts of the brain (e.g., V1) this is true, of course: we see a shape in the world, and that shape is projected with some degree of fidelity onto retinotopic cortex.

Although this simplistic view is consistent with the findings discussed in this dissertation, it eschews all forms of necessary abstraction. While there must be some kind of correlation between brain states and world states, there is no logical requirement that these brain states have the same similarity structure as the states in the world. Perhaps instead each stimulus we encounter in the world is translated into a unique pattern of activity which doesn't bear any relationship to what's being perceived. As pointed out in Chapter 2, this kind of representation could still allow techniques like pattern classification to differentiate between neural states associated with different percepts; one advantage of our findings and our technique is that they can explicitly identify or rule out this kind of representation.

If similarities in the world are not represented by similarities in the brain, the brain has failed at its most important task: extracting and acting on regularities. Fortunately, it seems that patterns of activity in the brain do, in fact, have a similarity structure that bears a relationship to the similarity structure of what is being perceived in the world. Of course, there must also be some direct mappings between the real world and its proximal representation. Specifically, a pure Edelman-style similarity model (which represents objects in terms of their similarities to "reference" objects, rather than in a strict decompositional, Biederman-style manner) suffers from a bootstrapping or Quinian problem: it is reliant on some mysterious process which informs it of which dimensions or properties are to be used. Any pair of objects has infinitely many common and distinctive features (Goodman, 1972). How do we know which are to be weighted? There must be some basic built-in assumptions about what is and is not useful; at least some semblance of structured or decompositional representations. These assumptions may be created by the brain (as discussed below) through evolution, development, or even short-term training.

In Chapter 3, I showed a new method for analyzing patterns of neural activity, showing that measurement of the additive or subadditive properties of release from neural adaptation can inform as to whether perceptual dimensions of stimuli are represented by conjoint or independent populations of neurons. It is suggested, then, that obtaining an "independent" result for a pair of dimensions tells us something about what features the brain more directly decomposes perception onto. In the remainder of this chapter, I will discuss some caveats and inferential limitations of the method and then explore some possible extensions of the method and future research directions employing the technique.

Extensions and limitations of the method
Since the isomorphism between perceptual and neural similarity structure is likely to be a general organizing principle of the brain, the methods presented in this dissertation could have quite broad applications. Investigation is certainly not limited to the visual modality. For example, it would be quite straightforward to test for conjoint or independent representations of tones varying in pitch and timbre, or of a space of spectral frequency components of speech sounds. Nor are the methods uniquely suited to fMRI data. As long as certain assumptions of linearity hold, the interpretation of additivity or subadditivity applies to any method of measurement of the overlap of neural populations. As mentioned earlier, ERP studies such as those by Caclin et al. (2007) are perfect candidates for this type of analysis.

It is true, however, that appropriate input stimuli must be quite stringently parameterized in order to yield interpretable data. The core of the method depends on stimuli that form a regular perceptual space whose metric properties can be reliably measured and understood. The dimensions of the space need not be continuous, but they must be at least ordinal; e.g., one might use shapes of differing numbers of sides: there is no shape in between a pentagon and a hexagon, but a space may still be formed along this ordinal dimension. For this reason, it is unlikely one would have much success with stimuli which are not as well-characterizable or controllable (note, however, that it is not necessary that the dimensions be easily identifiable; simply that they exist, e.g., that they can be reliably extracted by methods such as MDS). That is, the selected dimensions along which the stimuli are organized must account for as much of the variance in similarity ratings as possible. Any unmodeled variance impairs the inferential strength of the method. Thus, many possible stimulus spaces, especially those organized into categorical conceptual dimensions (e.g., words), may be difficult to norm as precisely as would be needed to obtain good results.

This necessity to model all of the variance in the similarity space is related to a more general mathematical property of our covariates: a set of differences which are well fit by an MDS solution in 2 dimensions with the Euclidean metric can also be well fit by one in 3 dimensions with a City-block metric. More generally, any set of distances which are Euclidean in an n-dimensional space may also be described as City-block in some (n+1)-dimensional space (Borg & Groenen, 2005). If one is not certain that the modeled dimensions describe all the variance, one cannot differentiate between conjoint and independent representations. This ambiguity is always present and must be kept in mind when interpreting the results, including the results presented in Chapter 3 (however, our stimuli are so well characterized on 2 dimensions that I believe this is unlikely to be an issue).

Can the method be used to describe the neural representations of more than two dimensions? While the extension of the method to higher dimensions is possible, and even tempting, I would urge caution. The more dimensions tested, the less certain one can be which dimension or dimensions are accounting for variance in the similarity space. So, the inferential uncertainty increases with the number of hypothesized dimensions.

A more significant limitation on the applicability of the method has to do with the way that hypotheses about neural representations are typically framed. Generally speaking, the vast majority of possible perceptual dimensions one could construct are unlikely to have a privileged status in the brain. One of the primary goals of cognitive neuroscience is to determine which features or dimensions the brain is organized to represent uniquely. Most often, the burden of proof for an experiment would rest on demonstrating that a particular axis or axes are represented by unique populations of neurons, and thus are independent. Our method, however, is framed to draw inferences in the reverse direction: significant loading on the Euclidean covariate suggests conjointness. Independent dimensions are indicated by additivity, as captured by a lack of significant loading on the Euclidean covariate. Such a finding would of course be a null result without a meaningful comparison, and so one can never exclusively test for independence. There are statistical methods that could increase confidence in the reliability of the (null) additivity effect, but these techniques are still fraught with uncertainty and not recommended. In some sense, then, our design is unavoidably backward, though under appropriate circumstances the relative loading upon the Euclidean covariate may be interpretable.

One clear solution (which we employed in Chapter 3) is to always conduct paired experiments in which the expected outcome is conjoint in one case and independent in the other, in the same region of cortex. In this way one can meaningfully test for independent neural populations by directly comparing the loading on the Euclidean covariate between the two experiments. This necessarily introduces the added complication of selecting an appropriate, neurally co-localized, integrally perceived stimulus set to match the stimuli of interest. However, as mentioned above, most theoretically possible dimensions are conjoint (i.e., non-privileged), so it should be possible to find such stimuli for any independent dimensions one wanted to test.

Future directions

I have been assuming the existence of representational dimensions in the brain, without discussing much where these dimensions come from. How do these dimensions arise? That is, how does the brain come to organize itself such that populations of neurons become tuned for them? This is a question with a long history, and there are several time scales over which such organization can occur. Selectivity for some kinds of dimensions is certainly innate to the organism. Sensitivity to these dimensions has developed over phylogenetic (evolutionary) time, in response to selection pressure to well represent useful variance and regularities in natural image statistics. Thus, over phylogenetic time (extending back to the first development, even pre-mammalian, of a complex visual system) we expect to find populations of neurons tuned to basic elements like the orientation or curvature of edges or spatial frequencies, and are not surprised if we even observe populations tuned to far more complex but highly evolutionarily adaptive stimuli like conspecific faces. Of course, the expression of innate dimensions may also be altered by developmental experience (see Constantine-Paton, 2008, for a review of the landmark studies of neural plasticity by Hubel and Wiesel). I would expect that many nontrivial dimensions are shaped by experience in the first weeks and months of childhood. Selectivity for many other dimensions is, presumably, acquired over the course of development, on timescales on the order of years, eventually allowing even some extremely narrow and specific tunings (e.g., Quiroga et al., 2005). Less controversially, it seems straightforward that untold billions of neurons are tuned for recognition of wide ranges of everyday objects across varying orientations, acquired through long repeated exposure.

More interesting, in the context of the method discussed in Chapter 3, are dimensions that may be formed over very short timescales, minutes to days. Humans are able to quickly learn to categorize and organize objects based on their perceptual features, even for objects and object features they may never have encountered before. We can use our method to investigate how such training affects the tunings of neural populations over short time scales. Any behavioral change in perception must, prima facie, result from changes in neural representation at some level. In the absence of training, it has been shown that there is a relationship between the degree of separability of two dimensions and the independence of the neural representation of those dimensions in individual subjects (Kerr et al., in preparation). It is also known that explicit training can alter neural object perception; Op de Beeck et al. (2006) showed an increase in response in monkey visual cortex to trained versus untrained objects. However, this tells us little about how those representations were altered. Our method presents the opportunity for within-subject investigations of the effect of training on the independence or conjointness of neural tunings for stimulus dimensions.

There are several ways in which the organization of a perceptual space could be altered. For example, one dimension could be exaggerated relative to the other. This could be achieved by sensitive discrimination training on a single dimension, in which case the other dimension would become much less salient, resulting in a greatly warped perceptual space (Gauthier et al., 2003; Goldstone, 1994). We might expect to find that changes on the trained dimension are better represented in the neural pattern, i.e., that they account for more of the correlation between the neural and perceptual spaces. In Chapter 2, we describe a region (lateral LOC) whose neural correlation pattern is highly correlated with RFC-amplitude but not with RFC-phase (Figure 2-5). Might we find a region with this kind of pattern of results, wherein the correlation could be changed with training?

Alternatively, integral axes could come to be perceived more separably. Extensively training participants to categorize unfamiliar objects along axes defined by the experimenter can lead them to perceive those dimensions more separably (Hockema et al., 2005). And, as mentioned previously, individuals who do perceive more separably have more independent neural codings (Kerr et al., in preparation). In a within-subjects training paradigm, we might expect to find that changes in perceptual separability are correlated with changes in the loading on the Euclidean contraction covariate in a follow-up fMRI study. This would suggest that individual neurons in the examined population are changing their tunings to become more sensitive to features of one or the other trained dimension. One caveat, of course, is that we may not always be able to control the locus of the training effect. It may result in cortical retunings outside of the visual areas of interest, perhaps because we have altered the response bias rather than the actual percepts; this may be related to the distinction between perceptual and decisional separability (as described in Maddox and Ashby, 1996).

Another warping of the perceptual space we might investigate is whether separable axes can come to be perceived more integrally. To my knowledge, this has never been achieved through a training paradigm. However, properties of the stimuli (such as visual degradation) or of the task (such as time pressure) can be introduced that alter the behaviorally-recoverable space such that it appears more integral. Will this change in behavior also be reflected in the distribution of neural tunings? Because sources of noise or uncertainty reduce the amount of information available to the populations of neurons representing the stimuli, we might then expect a less independent response: the more noise there is, the less signal there is, and the less accurately and precisely a neural population can encode position along a dimension. Thus, we could obtain a more apparently conjoint adaptation response, because neurons with broader tunings along all dimensions would also contribute to the response.

Altering a representation from separable to integral is, in some sense, a strange goal. The efficiency of representation in the brain is driven by the need to take things that are initially represented conjointly, e.g., high-dimensional input from the retina, and reduce that dimensionality by extracting relevant and recurring features that become what I've been calling dimensions. In the language of this dissertation, this corresponds to more independent neural codings as object representations progress through the ventral visual stream. Thus, instead of investigating effects of training or task, our method could be used more straightforwardly to identify the cortical visual areas in which the same stimuli are represented conjointly versus independently. One could even track the loading on the Euclidean covariate through the visual stream, given sufficiently strong hypotheses about the particular regions of interest to test. Are some dimensions, perhaps more abstract or complex ones, represented more separably in later visual areas than others? Although, as I discussed previously, extreme caution must be taken when selecting stimulus sets and framing hypotheses, our method can potentially address a multitude of similarly important neuroscientific questions.

Conclusion

In this dissertation, I have shown that the similarity of patterns of neural activity associated with simple abstract shapes can be related to the similarity of the shapes as consciously perceived. This suggests a cortical organization for object representations that has largely been assumed by many researchers in visual neuroscience, but has rarely been tested so rigorously and explicitly. Further, I have shown that subtle metric properties of these codes may be recovered from fMRI data, even in the face of various troublesome nonlinearities. I presented a method for distinguishing conjoint from independent neural representations using fMRI adaptation, which could have broad applications for studying feature-based versus holistic processing, particularly for investigating effects of learning, experience, and individual differences practically and non-invasively.


References

Aguirre, G. K. (2007). Continuous carry-over designs for fMRI. NeuroImage, 35, 1480-1494.
Aguirre, G. K., Zarahn, E., & D'Esposito, M. (1998). The variability of human BOLD hemodynamic responses. NeuroImage, 8, 360-369.
Arabie, P. (1991). Was Euclid an unnecessarily sophisticated psychologist? Psychometrika, 56, 567-587.
Arguin, M., & Saumier, D. (2000). Conjunction and linear non-separability effects in visual shape encoding. Vision Research, 40, 3099-3115.
Albright, T. D., & Gross, C. G. (1990). Do inferior temporal cortex neurons encode shape by acting as Fourier Descriptor filters? Proceedings of the International Conference on Fuzzy Logic & Neural Networks, 375-378.
Attneave, F. (1950). Dimensions of similarity. American Journal of Psychology, 63, 516-556.
Bartels, A., Logothetis, N. K., & Moutoussis, K. (2008). fMRI and its interpretations: an illustration on directional selectivity in area V5/MT. Trends in Neurosciences, 31(9), 444-453.
Biederman, I. (1987). Recognition by components: a theory of human image understanding. Psychological Review, 94, 115-147.

Borg, I., & Groenen, P. (2005). Modern multidimensional scaling. New York, NY: Springer.
Boynton, G. M., Engel, S. A., Glover, G. H., & Heeger, D. J. (1996). Linear systems analysis of functional magnetic resonance imaging in human V1. Journal of Neuroscience, 16(13), 4207-4221.
Buracas, G. T., & Boynton, G. M. (2002). Efficient design of event-related fMRI experiments using M-sequences. NeuroImage, 16, 801-813.
Caclin, A., Brattico, E., Tervaniemi, M., Näätänen, R., Morlet, D., Giard, M. H., & McAdams, S. (2006). Separate neural processing of timbre dimensions in auditory sensory memory. Journal of Cognitive Neuroscience, 18, 1959-1972.
Cant, J. S., Large, M. E., McCall, L., & Goodale, M. A. (2008). Independent processing of form, colour, and texture in object perception. Perception, 37, 57-78.
Constantine-Paton, M. (2008). Pioneers of cortical plasticity: six classic papers by Wiesel and Hubel. Journal of Neurophysiology, 99(6), 2741-2744.
Cortese, J. M., & Dyre, B. P. (1996). Perceptual similarity of shapes generated from Fourier descriptors. Journal of Experimental Psychology: Human Perception and Performance, 22, 133-143.
Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19, 261-270.

DeValois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22, 545-559.
Downing, P. E., Jiang, Y., Shuman, M., & Kanwisher, N. (2001). A cortical area selective for visual processing of the human body. Science, 293, 2470-2473.
Dumoulin, S. O., & Wandell, B. A. (2008). Population receptive field estimates in human visual cortex. NeuroImage, 39, 647-660.
Edelman, S. (1998). Representation is representation of similarities. Behavioral and Brain Sciences, 21, 449-498.
Edelman, S., & Intrator, N. (2000). (Coarse coding of shape fragments) + (retinotopy) approximately = representation of structure. Spatial Vision, 13, 255-264.
Edelman, S., & Intrator, N. (1997). Learning as formation of low-dimensional representational spaces. Proceedings of the Nineteenth Cognitive Science Society Meeting.
Edelman, S., Grill-Spector, K., Kushnir, T., & Malach, R. (1998). Toward direct visualization of the internal shape representation space by fMRI. Psychobiology, 26, 309-321.
Eger, E., Ashburner, J., Haynes, J. D., Dolan, R. J., & Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. Journal of Cognitive Neuroscience, 20, 356-370.


Engel, S. A. (2005). Adaptation of oriented and unoriented color-selective neurons in human visual areas. Neuron, 45(4), 613-623.
Epstein, R., Harris, A., Stanley, D., & Kanwisher, N. (1999). The parahippocampal place area: recognition, navigation, or encoding? Neuron, 23, 115-125.
Fang, F., Murray, S. O., Kersten, D., & He, S. (2005). Orientation-tuned fMRI adaptation in human visual cortex. Journal of Neurophysiology, 94, 4188-4195.
Friston, K. J., Zarahn, E., Josephs, O., Henson, R. N., & Dale, A. M. (1999). Stochastic designs in event-related fMRI. NeuroImage, 10, 607-619.
Furl, N., Rijsbergen, N. J., Treves, A., Friston, K. J., & Dolan, R. J. (2007). Experience-dependent coding of facial expression in superior temporal sulcus. Proceedings of the National Academy of Sciences, 104, 13485-13489.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Lawrence Erlbaum.
Garner, W. R., & Felfoldy, G. L. (1970). Integrality of stimulus dimensions in various types of information processing. Cognitive Psychology, 1, 225-241.
Gauthier, I., James, T. W., Curby, K. M., & Tarr, M. J. (2003). The influence of conceptual knowledge on visual discrimination. Cognitive Neuropsychology, 20, 507-523.


Goldstone, R. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123, 178-200.
Goldstone, R. L., & Son, J. Y. (2005). Similarity. In K. Holyoak & R. Morrison (Eds.), Cambridge Handbook of Thinking and Reasoning (pp. 13-36). Cambridge, UK: Cambridge University Press.
Goodman, N. (1972). Problems and projects. Indianapolis, IN: Bobbs-Merrill.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: a tool for studying the functional properties of human cortical neurons. Acta Psychologica, 107, 293-321.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24, 187-203.
Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., & Malach, R. (1998). A sequence of object-processing stages revealed by fMRI in the human occipital lobe. Human Brain Mapping, 6, 316-328.
Harris, A., & Aguirre, G. K. (2008). The representation of parts and wholes in face-selective cortex. Journal of Cognitive Neuroscience, 20(5), 863-878.
Haxby, J. V. (2006). Fine structure in representations of faces and objects. Nature Neuroscience, 9, 1084-1086.


Haxby, J. V., Gobbini, I. M., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425-2430.
Haxby, J. V., Hoffman, E. A., & Gobbini, I. M. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4, 223-233.
Henson, R. N. (2003). Neuroimaging studies of priming. Progress in Neurobiology, 70, 53-81.
Henson, R. N., & Rugg, M. D. (2003). Neural response suppression, haemodynamic repetition effects, and behavioural priming. Neuropsychologia, 41, 263-270.
Haushofer, J., Livingstone, M. S., & Kanwisher, N. (2008). Multivariate patterns in object-selective cortex dissociate perceptual and physical shape similarity. PLoS Biology, 6(7), e187.
Hockema, S. A., Blair, M. R., & Goldstone, R. L. (2005). Differentiation for novel dimensions. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society (pp. 953-958). Hillsdale, NJ: Lawrence Erlbaum.
Hubert, L., Arabie, P., & Hesson-McInnis, M. (1992). Multidimensional scaling in the city-block metric: a combinatorial approach. Journal of Classification, 9, 211-236.


Janssen, P., Vogels, R., & Orban, G. A. (2000). Three-dimensional shape coding in inferior temporal cortex. Neuron, 27, 385-397.
Jiang, X., Rosen, E., Zeffiro, T., VanMeter, J., Blanz, V., & Riesenhuber, M. (2006). Evaluation of a shape-based model of human face discrimination using fMRI and behavioral techniques. Neuron, 50, 159-172.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, & A. Smola (Eds.), Advances in kernel methods: support vector learning (pp. 169-184). Cambridge, MA: MIT Press.
Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8, 679-685.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302-4311.
Kayaert, G., Biederman, I., Op de Beeck, H., & Vogels, R. (2005). Tuning for shape dimensions in macaque inferior temporal cortex. European Journal of Neuroscience, 22(1), 212-224.
Kerr, W. K., & Aguirre, G. K. (In preparation). Behavioral measurement of multidimensional similarity metrics: theoretical and empirical comparison of five methods.
Kerr, W. K., Drucker, D. M., & Aguirre, G. K. (In preparation).

Kőteles, K., De Mazière, P. A., Van Hulle, M., Orban, G. A., & Vogels, R. (2008). Coding of images of materials by macaque inferior temporal cortical neurons. European Journal of Neuroscience, 27, 466-482.
Kourtzi, Z., Erb, M., Grodd, W., & Bülthoff, H. H. (2003). Representation of the perceived 3-D object shape in the human lateral occipital complex. Cerebral Cortex, 13, 911-920.
Kourtzi, Z., & Kanwisher, N. (2001). Representation of perceived object shape by the human lateral occipital complex. Science, 293, 1506-1509.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: monkey and human fMRI studies. Neuron, 37(2), 333-346.
Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., Tanaka, K., & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling (Quantitative Applications in the Social Sciences). SAGE Publications.
Lamme, V. A., Rodriguez-Rodriguez, V., & Spekreijse, H. (1999). Separate processing dynamics for texture elements, boundaries and surfaces in primary visual cortex of the macaque monkey. Cerebral Cortex, 9(4), 406-413.


Larsson, J., & Heeger, D. J. (2006). Two retinotopic visual areas in human lateral occipital cortex. Journal of Neuroscience, 26, 13128-13142.
Livingstone, M., & Hubel, D. (1984). Anatomy and physiology of a color system in the primate visual cortex. Journal of Neuroscience, 4(1), 309-356.
Maddox, W. T., & Ashby, F. G. (1996). Perceptual separability, decisional separability, and the identification-speeded classification relationship. Journal of Experimental Psychology: Human Perception and Performance, 27(4), 795-817.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., et al. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences, 92, 8135-8139.
Mazer, J. A., Vinje, W., McDermott, J., Schiller, P. H., & Gallant, J. L. (2002). Spatial frequency and orientation tuning dynamics in area V1. Proceedings of the National Academy of Sciences, 99, 1645-1650.
McCarthy, G., Puce, A., Gore, J. C., & Allison, T. (1997). Face-specific processing in the human fusiform gyrus. Journal of Cognitive Neuroscience, 9, 605-610.
Monnier, P. (2006). Detection of multidimensional targets in visual search. Vision Research, 46, 4083-4090.
Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15, 1-25.

Nonyane, B. S., & Theobald, C. M. (2008). Design sequences for sensory studies: achieving balance for carry-over and position effects. British Journal of Mathematical and Statistical Psychology, 60, 339-349.
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10, 424-430.
Op de Beeck, H., Baker, C., DiCarlo, J., & Kanwisher, N. (2006). Discrimination training alters object representations in human extrastriate cortex. Journal of Neuroscience, 26(50), 13025-13036.
Op de Beeck, H., Torfs, K., & Wagemans, J. (2008). Perceived shape similarity among unfamiliar objects and the organization of the human object vision pathway. Journal of Neuroscience, 28(40), 10111-10123.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4, 1244-1252.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2003). The effect of category learning on the representation of shape: dimensions can be biased but not differentiated. Journal of Experimental Psychology: General, 132, 491-511.


O'Toole, A. J., Jiang, F., Abdi, H., & Haxby, J. V. (2005). Partially distributed representations of objects and faces in ventral temporal cortex. Journal of Cognitive Neuroscience, 17, 580-590.
Priebe, N. J., Cassanello, C. R., & Lisberger, S. G. (2003). The neural representation of speed in macaque area MT/V5. Journal of Neuroscience, 23, 5650-5661.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435(7045), 1102-1107.
Radoeva, P. D., Prasad, S., Brainard, D. H., & Aguirre, G. K. (2008). Neural activity within area V1 reflects unconscious visual performance in a case of blindsight. Journal of Cognitive Neuroscience, 20, 1927-1939.
Reisbeck, T. E., & Gegenfurtner, K. R. (1999). Velocity tuned mechanisms in human motion processing. Vision Research, 39, 3267-3286.
Sary, G., Vogels, R., & Orban, G. A. (1993). Cue-invariant shape selectivity of macaque inferior temporal neurons. Science, 260, 995-997.
Sattath, S., & Tversky, A. (1977). Additive similarity trees. Psychometrika, 42, 319-345.
Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of neuronal adaptation does not match response selectivity: a single-cell study of the fMRI adaptation paradigm. Neuron, 49, 307-318.


Schwartz, E. L., Desimone, R., Albright, T. D., & Gross, C. G. (1983). Shape recognition and inferior temporal neurons. Proceedings of the National Academy of Sciences, 80, 5776-5778.
Schwarzlose, R. F., Swisher, J. D., Dang, S., & Kanwisher, N. (2008). The distribution of category and location information across object-selective regions in human visual cortex. Proceedings of the National Academy of Sciences, 105(11), 4447-4452.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Shepard, R. N. (1980). Multidimensional scaling, tree-fitting, and clustering. Science, 210, 390-398.
Shepard, R. N., & Arabie, P. (1979). Additive clustering: representation of similarities as combinations of discrete overlapping properties. Psychological Review, 86, 87-123.
Stankiewicz, B. J. (2002). Empirical evidence for independent dimensions in the visual representation of three-dimensional shape. Journal of Experimental Psychology: Human Perception and Performance, 28, 913-932.
Sternberg, S. (1969). The discovery of processing stages: extensions of Donders' method. Acta Psychologica, 30, 276-315.
Sternberg, S. (1998). Inferring mental operations from reaction-time data: how we compare objects. In D. Scarborough & S. Sternberg (Eds.), An Invitation to Cognitive Science, Volume 4: Methods, Models, and Conceptual Issues (pp. 365-454). Cambridge, MA: MIT Press.
Sternberg, S. (2001). Separate modifiability, mental modules, and the use of pure and composite measures to reveal them. Acta Psychologica, 106, 147-246.
Summerfield, C., Trittschuh, E. H., Monti, J. M., Mesulam, M.-M., & Egner, T. (2008). Neural repetition suppression reflects fulfilled perceptual expectations. Nature Neuroscience, 11, 1004-1006.
Tolias, A. S., Keliris, G. A., Smirnakis, S. M., & Logothetis, N. K. (2005). Neurons in macaque area V4 acquire directional tuning after adaptation to motion stimuli. Nature Neuroscience, 8, 591-593.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Vazquez, A. L., & Noll, D. C. (1998). Nonlinear aspects of the BOLD response in functional MRI. NeuroImage, 7(2), 108-118.
Verhoef, B. E., Kayaert, G., Franko, E., Vangeneugden, J., & Vogels, R. (2008). Stimulus similarity-contingent neural adaptation can be time and cortical area dependent. Journal of Neuroscience, 28(42), 10631-10640.
Wandell, B. A., Brewer, A. A., & Dougherty, R. F. (2005). Visual field map clusters in human cortex. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 360, 693-707.


Worsley, K. J., & Friston, K. J. (1995). Analysis of fMRI time-series revisited again. NeuroImage, 2, 173-181.
Wyszecki, G., & Stiles, W. S. (2000). Color Science: Concepts and Methods, Quantitative Data and Formulae (Wiley Series in Pure and Applied Optics). Wiley-Interscience.
Zahn, C. T., & Roskies, R. Z. (1972). Fourier descriptors for plane closed curves. IEEE Transactions on Computers, C-21, 269-281.
Zarahn, E., Aguirre, G. K., & D'Esposito, M. (1997). Empirical analyses of BOLD fMRI statistics. I. Spatially unsmoothed data collected under null-hypothesis conditions. NeuroImage, 5, 179-197.


Appendix A: Relationship to multi-voxel distributed pattern analysis

As a carry-over experiment presents stimuli in a continuous, counterbalanced fashion, it allows the recovery of the average neural response to each stimulus from each voxel, independent of first-order context (Aguirre, 2007). From these measures a distributed neural similarity matrix can be constructed, containing the correlations between the patterns of across-voxel responses to each possible pairing of stimuli. Using iterative MDS and related methods (see Discussion), these data may be examined for the metric properties of the distributed neural space, perhaps to support claims of conjoint or independent representation of the stimulus axes. Recently, Schwarzlose et al. (2008) used pattern correlation to argue that spatial position and object identity are independently represented in several object-selective areas. For this application of multi-voxel techniques, however, it is important to note that any such inferences drawn from the distributed pattern regard the mean responses within a voxel, and not necessarily the tuning of individual neurons within the voxel. This is because the property being sought (independent or conjoint tuning) may be manifest independently at the scale of individual neuronal tuning or at the scale of the spatial arrangement of tuned neurons within voxels. Figure A-1 illustrates how populations of neurons with either conjoint or independent tuning can be spatially pooled within voxels such that the across-voxel pattern of activity will be either additive or sub-additive in its metric. For example, the lower-left panel illustrates how neurons that are conjointly tuned for a pair of stimulus dimensions may nonetheless be systematically distributed across cortex. Consider neurons that are conjointly tuned for the orientation and spatial frequency of a grating,

yet are organized into macroscopic orientation columns without regard to spatial frequency tuning, and into spatial frequency columns without regard to orientation. Such a cortical organization would yield a multi-voxel pattern result that finds voxels that are independently tuned for the two stimulus dimensions, even though the tuning of the neurons themselves is conjoint. Figure A-1 further illustrates how neurons with different metric properties could be heterogeneously organized across voxels to produce all crossings of neural and voxel metrics for conjoint and independent tuning. Formally, therefore, the metric properties of the neural similarity space at the level of individual voxel tuning are orthogonal to the metric properties of distributed neural similarity across voxels. Fortunately, using the carry-over approach described here, both within- and across-voxel metric forms may be recovered simultaneously and efficiently in the same experiment.
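The construction of the distributed similarity matrix itself is mechanically simple. A minimal sketch (with random numbers standing in for the per-voxel response estimates a carry-over design would actually provide):

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim, n_vox = 16, 50

# Hypothetical mean response of each voxel to each stimulus, as would be
# recovered from a carry-over design (rows: stimuli, columns: voxels).
responses = rng.normal(size=(n_stim, n_vox))

# Distributed neural similarity matrix: correlation of the across-voxel
# response patterns for every pair of stimuli.
similarity = np.corrcoef(responses)

# One minus correlation gives a dissimilarity matrix suitable for
# submission to MDS or related scaling algorithms.
dissimilarity = 1.0 - similarity
print(similarity.shape)   # (16, 16)
```

Note that, per the argument above, the metric structure recovered from this matrix speaks to the arrangement of mean voxel tunings, not directly to the tuning of the neurons within each voxel.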


Appendix B: Proof of additivity and sub-additivity of fMRI adaptation in neural populations with independent or conjoint tuning

This appendix was contributed by Wesley T. Kerr.

We will demonstrate that the within-voxel neural adaptation to two perceptual dimensions reflects the Minkowski-r relationship between these dimensions for a model neuronal population. Specifically, a change in two perceptually independent dimensions is additive, while a change in two perceptually conjoint dimensions is sub-additive.

We assume that the adaptation to one stimulus, a(A|B), is proportional to the difference in neural response to the current stimulus, A, from the previous stimulus, B, according to the following formula, which is motivated by the finding that neural adaptation is proportional to stimulus similarity (Verhoef et al., 2008):

a(A|B) = w * sum_i |R_i(A) - R_i(B)|,

where R_i(X) is the response of neuron i to stimulus X and w signifies an arbitrary proportionality constant. Further, we assume that the receptive fields (RFs) of neurons that are tuned to perceptually conjoint dimensions are radially symmetric Gaussians with variance sigma^2. The RFs of neurons tuned to perceptually independent dimensions are assumed to be Gaussians with variance sigma^2 in one dimension and uniform in the other dimension. These Gaussians are tiled orthogonally across the stimulus space. The value of such a Gaussian at a distance x from its maximum will be notated G(x). Although we assume Gaussians for notational purposes, any even decay function may be chosen.

We will prove our claim in a small example system with four stimuli arranged in a 2 x 2 square, labeled A through D, and four neurons with RFs tiled so that the maxima of the Gaussians align with the locations of the stimuli. Assume that the stimuli are placed a distance, d, apart in stimulus space, with B and C each displaced from A along a single dimension and D displaced from A along both. Thus a(A|B) and a(A|C) are constructed to represent adaptation along one perceptual dimension, and a(A|D) is constructed to represent adaptation along both dimensions. This example is easily extended to a larger neuron population without changing the theoretical nature of the system.

In the case of independent dimensions, each neuron responds according to the stimulus position along its tuned dimension alone, so the neural response to each stimulus will be:

R(A) = (G(0), G(d), G(0), G(d)); R(B) = (G(d), G(0), G(0), G(d)); R(C) = (G(0), G(d), G(d), G(0)); R(D) = (G(d), G(0), G(d), G(0)).

From this, it is simple to calculate that a(A|B) = a(A|C) = 2w(G(0) - G(d)) and a(A|D) = 4w(G(0) - G(d)). Simple algebra confirms additivity (a(A|D) = a(A|B) + a(A|C)).

In the case of conjoint dimensions, each neuron responds according to the radial distance between the stimulus and its RF center, so the neural response to each stimulus will be, for example:

R(A) = (G(0), G(d), G(d), G(sqrt(2)*d)); R(D) = (G(sqrt(2)*d), G(d), G(d), G(0)).

The G(sqrt(2)*d) term appears due to the radially symmetric RFs. It is then simple to calculate that a(A|B) = a(A|C) = 2w(G(0) - G(sqrt(2)*d)). Cancelling the two G(d) terms, we see that a(A|D) = 2w(G(0) - G(sqrt(2)*d)), which we notice is equal to a(A|B). Therefore the system is clearly sub-additive (a(A|D) < a(A|B) + a(A|C)), and indeed more so than would be suggested by the Euclidean metric.

We have demonstrated above that the within-voxel neural adaptation to a change along two perceptually independent dimensions is additive, whereas the within-voxel neural adaptation to a change along two perceptually conjoint dimensions is sub-additive.
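The four-stimulus system above can also be checked numerically. This sketch uses arbitrary illustrative values (d = 1, sigma = 1, w = 1) and places the RF maxima at the stimulus locations, exactly as in the proof:

```python
import numpy as np

d, sigma = 1.0, 1.0
corners = {'A': (0.0, 0.0), 'B': (d, 0.0), 'C': (0.0, d), 'D': (d, d)}

def G(x):
    # Unnormalized Gaussian decay function (any even decay would do).
    return np.exp(-x ** 2 / (2 * sigma ** 2))

def response(stim, conjoint):
    sx, sy = corners[stim]
    if conjoint:
        # Radially symmetric 2-D RFs centered on each stimulus location.
        return [G(np.hypot(sx - nx, sy - ny)) for nx, ny in corners.values()]
    # Independent tuning: two neurons Gaussian in x (uniform in y),
    # two neurons Gaussian in y (uniform in x).
    return [G(sx - 0.0), G(sx - d), G(sy - 0.0), G(sy - d)]

def adaptation(a, b, conjoint):
    # a(A|B) = w * sum_i |R_i(A) - R_i(B)|, with w = 1.
    return float(np.abs(np.subtract(response(a, conjoint),
                                    response(b, conjoint))).sum())

# Independent tuning: adaptation along both dimensions is additive.
print(np.isclose(adaptation('A', 'D', False),
                 adaptation('A', 'B', False) + adaptation('A', 'C', False)))  # True

# Conjoint tuning: sub-additive (here a(A|D) even equals a(A|B)).
print(adaptation('A', 'D', True) <
      adaptation('A', 'B', True) + adaptation('A', 'C', True))                # True
```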


Appendix C: Measurement of discretized recovery from adaptation

The metric estimation approach described in the body of this chapter uses continuous covariates to model recovery from adaptation for changes along each stimulus dimension. A different analysis scheme may be employed in which a basis set of covariates models the BOLD fMRI signal associated with each transition between one stimulus and the next (120 in total if a symmetric directional effect is assumed). The loading upon these covariates may be used to construct a neural-adaptation similarity matrix, which may then be submitted to MDS or PROSCAL algorithms. See for an example set of data and basis covariates.

Here, we consider a post-hoc analysis that may be performed on this basis set of covariates to detect the presence of compressive non-linearities in the stimulus representations and BOLD response. This can be accomplished by directly comparing the stimulus distances and the distances implied by the BOLD fMRI signal in adaptation recovery. To do so, we consider the pure (one-dimensional) changes on each dimension, and measure them with respect to a reference stimulus. The BOLD signal change associated with ever-larger transitions is obtained, and used to construct a representation of the degree of difference in BOLD signal between stimuli, related to the degree of difference in the stimuli themselves.

There are 16 points in the di-octagon space (Figure C-1A); each position may be assigned a label (1-16). We wish to measure pure changes along a single dimension (k). Distances are measured relative to an origin; we adopt the position along the k dimension occupied by stimuli 7 and 8. We then obtain the BOLD response to the transitions 7->15 and 8->9; the mean of these measurements is segment A, which now acts as an anchor point for measurements proceeding rightward. Relative to the origin, A represents the dimension-k position of points 6, 15, 9, and 1. The mean BOLD response to the transitions 6->5, 15->13, 9->11, and 1->2 is then obtained and termed B. Segment C is provided by the transitions 13->4 and 11->3. The segment AB is provided by 7->13 and 8->11, the segment BC by 15->4 and 9->3, and the segment ABC by 7->4 and 8->3. The pure dimension-k transition 16->12 is not usable, as it cannot be related to the origin.

Using these values we may then plot four points: (i) the fixed origin at zero, representing the dimension-k position of points 7 and 8; (ii) the dimension-k position of points 9 and 15 at A; (iii) the dimension-k position of points 5, 13, 11, and 2 at A+B and at AB; and (iv) the dimension-k position of points 4 and 3 at A+BC, AB+C, and ABC. The values obtained for these points may then be compared to the positions of the stimuli along the k axis in the original stimulus space. The analogous process may be followed for interrogation of the orthogonal dimension j.

Figure C-1B shows the resulting relationship between stimulus position and constructed BOLD response for a model di-octagonal space without any distortion, and for one in which a logarithmic transform has been applied along the k dimension. Goodness-of-fit measures of linear and non-linear models to actual data may be used to test for the presence of a non-linearity in the integrated response (across neural representation and BOLD transform) across a dimension.
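The core of the comparison can be sketched numerically (the positions and transform below are hypothetical; in practice the segment values come from the transition covariates described above). Under a linear integrated response, the sum of the unit segments A + B + C matches the directly measured compound segment ABC; under a compressive transform, the summed segments overshoot the direct measurement:

```python
import numpy as np

# Hypothetical evenly spaced stimulus positions along dimension k.
positions = np.array([0.0, 1.0, 2.0, 3.0])

def segments(transform):
    """Return (A + B + C, ABC): the sum of the three unit segments versus
    the directly measured long segment, where the BOLD 'distance' for a
    transition is some transform of the stimulus distance."""
    A = transform(positions[1] - positions[0])
    B = transform(positions[2] - positions[1])
    C = transform(positions[3] - positions[2])
    ABC = transform(positions[3] - positions[0])
    return A + B + C, ABC

linear = lambda x: x
compressive = np.log1p          # a stand-in for a logarithmic distortion

summed, direct = segments(linear)
print(np.isclose(summed, direct))   # True: linear response, segments agree

summed_c, direct_c = segments(compressive)
print(summed_c > direct_c)          # True: compression makes the sums overshoot
```

A goodness-of-fit comparison between the linear and compressive accounts of the measured segments is then the test for a non-linearity, as described above.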


Table 3-1: Average loading on the Euclidean contraction covariate, scaled by the loading on the City-block model covariates, for a simulation of independent and conjointly tuned neural populations.
distortion of       distortion of            Euclidean contraction (scaled)
neural space        hemodynamic transform    conjoint     independent

original            none (linear)            1.4           0.0
linear              none (linear)            1.2           0.0
sigmoid             none (linear)            0.9          -0.3
quadratic           none (linear)            1.3           0.0
log                 none (linear)            1.3           0.1
sigmoid 2           none (linear)            0.5          -0.5
quadratic 2         none (linear)            1.3           0.0
log 2               none (linear)            1.4           0.3
original            sigmoid                  1.4           0.0
linear              sigmoid                  1.0           0.0
sigmoid             sigmoid                  1.0           0.0
quadratic           sigmoid                  1.3           0.03
log                 sigmoid                  1.2           0.1
sigmoid 2           sigmoid                  0.7          -0.2
quadratic 2         sigmoid                  1.3           0.03
log 2               sigmoid                  1.4           0.4
original            quadratic                1.4           0.0
linear              quadratic                1.2           0.0
sigmoid             quadratic                0.9          -0.3
quadratic           quadratic                1.3           0.0
log                 quadratic                1.4           0.1
sigmoid 2           quadratic                0.5          -0.5
quadratic 2         quadratic                1.3           0.0
log 2               quadratic                1.4           0.3
original            log                      1.5           0.1
linear              log                      1.0           0.0
sigmoid             log                      0.9           0.0
quadratic           log                      1.2           0.1
log                 log                      1.0           0.2
sigmoid 2           log                      0.5          -0.2
quadratic 2         log                      1.1           0.1
log 2               log                      1.3           0.4
rotate 22.5°        none (linear)            1.2           1.3
rotate 45°          none (linear)            1.2           1.8


Average loading upon the Euclidean contraction covariate, scaled by the loading upon the City-block model covariates, for a simulation of independent and conjointly tuned neural populations. The average was obtained over 100 different, counterbalanced sequences that were selected for having high combined Efficiency for both the City-block and Euclidean contraction effects. The distortion of neural space was applied to a single stimulus dimension unless noted as "2", in which case it was applied to both stimulus dimensions (Figure 3-3B). Indicated in bold are those cases where a positive loading upon the Euclidean contraction covariate was obtained despite assuming independent neural populations tuned to the two stimulus axes, raising the possibility of improper bias.


Figure 2-1: Stimuli

Sixteen unfilled, closed contours, constructed from radial frequency components (RFCs). The amplitude and phase of frequency 6 were varied; the amplitude and phase of frequencies 2 and 4 were held constant at 0.50 radians and 0 degrees. All other frequency components were zero.


Figure 2-2: Stimulus presentation

A sequence of contours and blank trials was shown for 1400 ms each, separated by 100 ms inter-trial intervals. The stimulus on each trial was randomized to appear in either a red or purple hue on a gray background. The subject was instructed to indicate on each trial, by button press, the color of the stimulus.


Figure 2-3: Focal pattern similarity

(A) The average across subjects of the amplitude of linear recovery from adaptation associated with stimulus transitions in ventral and lateral LOC. (B) Random-effects analysis of recovery from adaptation, linearly related to perceptual shape similarity. Warm colors indicate significant recovery from adaptation associated with stimulus change. (C) In a four by four configuration of stimuli, six (city-block) step sizes are possible. In ventral but not lateral LOC, increasing step sizes are associated with increasing BOLD response (zero on the Y axis corresponds to the average response

across all stimulus transitions). Greater error at larger step sizes is due to fewer possible exemplars of those changes. Error bars reflect variability across subjects. The R-value of the best linear fit is given for each region.


Figure 2-4: Distributed pattern similarity

(A) The average across subjects of the correlation of the distributed neural similarity matrix with the stimulus similarity matrix is shown. The distributed pattern within lateral LOC was significantly more correlated than in ventral LOC. (B) Average w-map indicating the amount of variation (informativeness) in each voxel's activity due to change in stimulus. The map from each subject was z-transformed across voxels, spatially smoothed, and then averaged across subjects. (C) Shown are stimulus and neural similarity matrices. The color of each cell reflects how similar a

given pair of stimuli are as measured by behavioral responses (left) or distributed neural codes (right). The diagonal of the matrix is undefined for identical pairs of stimuli. Shown below the two neural similarity matrices is the correlation of the entire neural matrix with the stimulus similarity matrix. Ventral LOC carries little information about similarity in the distributed pattern, and is therefore weakly correlated with the perceptual matrix. By contrast, in lateral LOC, the similarity of the distributed patterns associated with each stimulus pair is strongly and linearly related to their perceptual similarity, yielding strong correlation with the perceptual matrix. The diagonal symmetry of the matrices is enforced by the nature of their construction.
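The comparison of neural and stimulus similarity matrices described above reduces to a correlation over the off-diagonal cells, since the diagonal is undefined and the matrices are symmetric by construction. A minimal sketch in Python; the toy matrices here are invented for illustration, not the study's data:

```python
import numpy as np

def matrix_correlation(neural, stimulus):
    # Correlate only the upper-triangle, off-diagonal cells: the
    # diagonal (identical stimulus pairs) is undefined, and symmetry
    # makes the lower triangle redundant.
    iu = np.triu_indices_from(stimulus, k=1)
    return np.corrcoef(neural[iu], stimulus[iu])[0, 1]

# Toy symmetric similarity matrices for 16 stimuli (invented values).
rng = np.random.default_rng(0)
M = rng.random((16, 16))
stimulus_sim = (M + M.T) / 2
neural_sim = stimulus_sim + 0.05 * rng.standard_normal((16, 16))

r = matrix_correlation(neural_sim, stimulus_sim)
```

A region whose distributed pattern carries similarity information, as lateral LOC does here, yields a high value of r; an uninformative region yields a value near zero.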


Figure 2-5: Similarity by stimulus axis

(A) The studied shapes were defined along two axes. Variations in RFC-amplitude were perceived as changes in complexity or smoothness; variations in RFC-phase were perceived as changes in orientation (Cortese & Dyre, 1996). (B) RFC-amplitude and RFC-phase are equally responsible for the adaptation effect in ventral LOC. (C) In lateral LOC, RFC-amplitude is primarily responsible for the distributed pattern. (D) The stimulus similarity matrix can be decomposed into the portion of similarity measured along the RFC-amplitude axis and the RFC-phase axis. The sum of the two stimulus

similarity matrices shown on the left is equal to the complete stimulus similarity matrix shown in Figure 2-4C. The stimulus RFC-amplitude matrix is highly correlated with distributed neural similarity in lateral LOC, while there is only a weak relationship with RFC-phase.


Figure 2-6: Voxel tuning and proximate vs. distant adaptation

(A) The average BOLD fMRI response in a single voxel (shown inset) to each of the 16 stimuli. Data points are indicated by black dots and connected by a red surface. The shape which elicited the largest response is indicated in yellow. (B) The proportion of voxels within each region of interest which responded maximally to each of the 16 different shapes. (C) The average response of voxels to stimuli that differ from the stimulus that elicited the largest response, expressed as a proportion of the range of

response. Black data points and the red surface are the average across subjects, and the transparent blue surfaces are SEM across subjects. The x and y axes are expressed in units of steps along the stimulus axes away from the maximum stimulus. The central value, which has an obligatory value of unity, is omitted. (D) For the response profile shown in panel A, the stimulus indicated in yellow elicited the maximum response on average across all presentations. We might consider that the neural adaptation associated with small stimulus transitions that involve this stimulus (red) will differ from that measured for stimulus transitions that do not include this stimulus (black). The top figure illustrates these different transitions in the stimulus space, while the bottom figure shows the transitions in the context of a similarity matrix. (E) The neural adaptation associated with small stimulus transitions was measured in the lateral and ventral LOC. Shown is the across-subject average difference in adaptation between transitions that were proximate to, or distant from (red and black in panel D, respectively), the center of tuning for each voxel.


Figure 3-1: Conjoint and independent population codes

(A) An example stimulus space, defined by changes in color and shape of two-dimensional contours. (B) A population of idealized neurons with receptive fields conjointly tuned to stimulus color and shape. Each neuron responds optimally to a particular position in the stimulus space, and its response drops off as a function of distance from that point, regardless of which dimension is changed. (C) A population of idealized neurons independently tuned to respond to a particular color or a particular shape. Each neuron responds optimally to a particular position in one dimension of the

stimulus space, and its response drops off as a function of distance from that point along one dimension only, but is not responsive to changes in the other dimension.


Figure 3-2: Construction of covariates

In an example carry-over experiment, a continuous sequence of stimuli is presented. The resulting fMRI data are modeled with three covariates. The first two, City-block covariates, model the change from one stimulus to the next in color and in shape. The third covariate models the difference between the rectilinear, City-block distance between stimuli and the sub-additive, Euclidean distance.
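The construction of the three covariates can be sketched as follows; the short stimulus sequence is an invented placeholder, and in practice each vector would be convolved with a hemodynamic response function before entering the design matrix.

```python
import numpy as np

# Hypothetical (color, shape) coordinates for a short carry-over
# sequence, in units of steps within the stimulus space.
seq = np.array([[0, 0], [2, 1], [2, 3], [0, 3], [1, 1]], dtype=float)

# Per-dimension change for each transition in the sequence.
diffs = np.abs(np.diff(seq, axis=0))
color_cov, shape_cov = diffs[:, 0], diffs[:, 1]   # City-block covariates

# Rectilinear and sub-additive distances for each transition.
cityblock = diffs.sum(axis=1)
euclid = np.sqrt((diffs ** 2).sum(axis=1))

# The Euclidean contraction covariate: how far the Euclidean distance
# falls short of the City-block distance.  It is zero whenever only
# one dimension changes.
contraction = cityblock - euclid
```

Note that the contraction covariate is non-zero only for oblique transitions, which is why it can discriminate conjoint from independent population codes.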


Figure 3-3: Simulation of non-linearities and distortions

(A) A simulation of the test for conjoint and independent populations. The model begins with a set of stimuli with regular spacing. A population of neurons represents these stimuli through the pattern of their firing, and we model the similarity of the patterns of firing for stimulus pairs. The distance between neural patterns for the different stimuli can be linearly related to the physical distance between the stimuli, or distorted along either

dimension in a linear or non-linear manner. A particular sequence of stimuli defines a series of transitions, each associated with a magnitude of neural activity that is modulated by adaptation according to the overlap of the neural population responses to the two stimuli. The aggregate neural response may be linearly related to the overlap between the population responses, or modulated by a non-linear function. The vector of neural responses is transformed to a BOLD fMRI signal by convolution with a standard hemodynamic response function. The letters superimposed on the arrows reference the non-linearities presented in the other panels of this figure. (B) Depictions of the neural distances between stimuli resulting from the linear and non-linear distortions of the stimulus space examined in the model. (C) Non-linearities in the transformation of neural activity to BOLD fMRI response. Three different non-linearities are considered, each covering the theoretical maximum range of neural firing for a region (blue curves). Neural adaptation generally produces a small (20%) modulation of signal strength (superimposed red segment). We modeled the worst-case hemodynamic non-linearity, in which the neural adaptation modulation occurs within the maximally non-linear component of each function.
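The final stage of the simulation pipeline, convolution of the adaptation-modulated neural response vector with a hemodynamic response function, can be sketched as below. The single-gamma HRF and the event timings are illustrative assumptions, standing in for whatever standard HRF model is used.

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_hrf(t, shape=6.0, scale=1.0):
    # A simple single-gamma hemodynamic response function; an
    # illustrative stand-in, not the exact model used in the chapter.
    return (t ** (shape - 1) * np.exp(-t / scale)) / (gamma_fn(shape) * scale ** shape)

tr = 1.5                                  # sampling interval, seconds
hrf = gamma_hrf(np.arange(0, 30, tr))

# Adaptation-modulated neural responses for a toy stimulus sequence:
# larger values correspond to greater recovery from adaptation.
neural = np.zeros(40)
neural[[3, 10, 22]] = [1.0, 0.6, 0.9]

# Predicted BOLD signal: linear convolution, truncated to scan length.
bold = np.convolve(neural, hrf)[: neural.size]
```

Because the convolution is linear, any non-linearity detected in the simulated BOLD output must originate in the distortions applied earlier in the pipeline.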


Figure 3-4: Effect of model rotation

(A) A population of neurons may represent a stimulus space with two sets of linear receptive fields, but these may have an orientation that is rotated with respect to the dimensional axes that define the stimuli; in this case by 22.5°. (B) This circumstance may be detected by repeating the analysis and assuming a rotated version of the actual stimulus space. (C) For a population of neurons with conjoint tuning, the loading on the Euclidean contraction covariate is unaltered with model rotation. Given independent tuning, in contrast, model rotation alters the Euclidean contraction beta value, with a value of zero reached when the assumed rotation of the model matches the actual alignment of the receptive fields. Two independently-tuned curves are shown, corresponding to experimental designs that have either 16 or 49 samples of the stimulus

space. The negative dip of the independently tuned function with 16 samples under rotation is thus shown to be an artifact of discrete sampling of a continuous stimulus space.


Figure 3-5: Effect of Minkowski index

(A) Performance of the model for a range of Minkowski exponent values defining the neural population. A red plus marks where the Euclidean contraction covariate has a value of zero for a Minkowski exponent of one, corresponding to independent neural populations. (B) The curves from panel A, expressed as the ratio of the Euclidean contraction to the City-block effect.
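The dependence on the Minkowski exponent can be illustrated with a small sketch: for independently tuned populations (exponent 1, the City-block metric) the contraction is zero, and it grows as the exponent increases toward and beyond the Euclidean case (exponent 2). The step vector and exponent values below are chosen only for illustration.

```python
import numpy as np

def minkowski(d, p):
    # Minkowski distance for a vector of per-dimension changes.
    d = np.abs(np.asarray(d, dtype=float))
    return (d ** p).sum() ** (1.0 / p)

# A one-step change on each of two stimulus dimensions.
step = [1.0, 1.0]

# Contraction relative to the City-block (p = 1) distance, for a
# range of Minkowski exponents.
contractions = {p: minkowski(step, 1.0) - minkowski(step, p)
                for p in [1.0, 1.5, 2.0, 4.0]}
```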


Figure 3-6: The di-octagon space

(A) An example color stimulus space defined by hue and brightness with grid-spaced sampling. (B) A di-octagonal sampling of the same stimulus space.


Figure 3-7: Efficiency over sequence permutations

The Efficiency (Friston 1999) of n=17, type 1 index 1 counterbalanced sequences was calculated for the Euclidean contraction and City-block model covariates. 50000 label permutations (Aguirre 2007, Appendix A) were tested. Blue circles indicate the sequences with the highest Efficiency for one or both covariates.
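The Efficiency used to rank candidate sequences can be sketched as the inverse of the summed estimator variance for the contrasts of interest, a standard formulation. The design matrix below is a random placeholder, not an actual HRF-convolved counterbalanced sequence.

```python
import numpy as np

def efficiency(X, contrasts):
    # Design efficiency: inverse of the summed variance of the
    # contrast estimates, given design matrix X.
    return 1.0 / np.trace(contrasts @ np.linalg.pinv(X.T @ X) @ contrasts.T)

# Placeholder design: the columns would be the HRF-convolved
# City-block and Euclidean contraction covariates for one candidate
# label permutation of the counterbalanced sequence.
rng = np.random.default_rng(0)
X = rng.standard_normal((272, 2))
c = np.eye(2)                     # one contrast per covariate

E = efficiency(X, c)
```

Repeating this computation over the tested label permutations and retaining the sequences with the highest E for one or both covariates yields the blue-circled designs in the figure.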


Figure 3-8: Stimuli

(A) A set of two-dimensional, closed contours defined by variations in radial frequency components. The dimensions that define the stimulus space are perceptually integral. (B) A set of closed contours that vary along two identifiable dimensions (curvature and thickness). These dimensions are perceptually separable.


Figure 3-9: Garner speeded-classification

We used the Garner task to confirm that the popcorn and moon stimulus axes are perceived as integral and separable, respectively. (A) Subjects were required to sort serially presented stimuli into two categories, divided along one of the axes. In the 'filtering' condition, the other axis was varied randomly, while in the 'correlated' condition, the value of the other axis always followed the value of the axis to sort on (thus providing additional information). (B) Subjects were faster sorting popcorn shapes in the 'correlated' condition than in the 'filtering' condition; there was no difference between the conditions for sorting the moon shapes. These patterns of results are described as belonging to 'integral' and 'separable' axes (Garner & Felfoldy 1970).


Figure 3-10: Stimulus presentation

(A) Each shape was drawn on a gray background. A line, randomly tilted between 10 and 40 degrees from vertical, divided the shape such that 65% of the surface area fell to one side or the other. A space was kept between the line and the shape such that they would not intersect. The subject was instructed to indicate on each trial, by button press, whether the line was drawn more to the right (B) or left (C) of the shape.


Figure 3-11: Neuroimaging results

(A) The location of voxels showing adaptation for all four stimulus dimensions (the two popcorn dimensions and the two moon dimensions) across 6 subjects. The data are displayed atop an inflated, ventral cortical surface. (B) The signal change associated with linear recovery from adaptation for the four stimulus dimensions within the selected voxels, averaged across subjects. Because of the method of voxel selection, some signal change is guaranteed. The figure illustrates the comparable degree of response for all four

dimensions. (C) Significant loading upon the Euclidean contraction covariate was observed across 6 subjects within the selected voxels for the popcorn stimulus space, suggesting that the two stimulus dimensions are not represented by independent neural populations. There was no loading upon the same measure for the moon stimulus space, thus failing to reject independent representation. (D) The possibility of independently tuned, but rotated, stimulus dimensions was tested for the popcorn space by examining the Euclidean contraction effect under assumed rotations of the stimulus space. The function never reached zero, confirming non-independence of the neural representation. (E) The recovery from adaptation for discretized stimulus changes along a single stimulus axis was obtained across subjects for each dimension of the stimulus space. A notably linear relationship was observed, arguing against a compressive non-linearity as the cause of the finding of conjoint representation. The value at the first stimulus position was fixed at zero. See Appendix C for details.


Figure A-1: Comparison of within- and across-voxel tunings

Four hypothetical arrangements of neurons within voxels. For each of the cases, we consider four voxels, each voxel containing several neurons. Shown is the stimulus space for each voxel, with the receptive fields of the neurons contained within the voxel plotted as circles (conjointly tuned) or bars (independently tuned). As can be readily appreciated, voxels with conjoint or independent tuning can be constructed with neurons with either conjoint or independent tuning. Consequently, the metric property of neural similarity calculated across voxels is not necessarily related to the metric tuning properties of the neurons within voxels.

Figure C-1: Linearity of fMRI response

(A) The di-octagonal popcorn stimulus space is shown. Below is the standard numbering and positions of the stimuli along the two dimensions, and with a logarithmic transformation applied to one axis. (B) The relationship between aggregate BOLD response and stimuli may be examined in an attempt to detect non-linearities in the neural representation and/or hemodynamic transform. Shown are the stimulus transitions that contribute measurements of the BOLD fMRI response, which are then used to construct the graph shown. The solid, linear relationship is for simulated responses in which a linear mapping of stimulus space to neural adaptation and to BOLD response is produced. The dashed line indicates the simulated result for the same measurement when a logarithmic transform has been applied to the neural representation along dimension k.


29 June 2009