
COMMENTARY

FOCUS ON BIG DATA

Putting big data to good use in neuroscience


Terrence J Sejnowski, Patricia S Churchland & J Anthony Movshon

Big data has transformed fields such as physics and genomics. Neuroscience is set to collect its own big data sets, but to
exploit its full potential, there need to be ways to standardize, integrate and synthesize diverse types of data from different
levels of analysis and across species. This will require a cultural shift in sharing data across labs, as well as a central role
for theorists in neuroscience research.

Big data, the buzz phrase of our time, has arrived on the neuroscientific scene, as it has already in physics, astronomy and genomics. It offers enlightenment and new depths of understanding, but it can also be a bane if it obscures, obstructs and overwhelms. The arrival of big data also marks a cultural transition in neuroscience, from many isolated ‘vertical’ efforts applying single techniques to single problems in single species to more ‘horizontal’ efforts that integrate data collected using a wide range of techniques, problems and species. We face five main issues in making big data work for us.

First, data in neuroscience exist at an astonishing range of scales of both space and time. Neuroscientific data are obtained from a wide range of techniques, from patch clamping to optogenetics to fMRI (Fig. 1). Most of these techniques are used one at a time. One lab will record spikes from an array of neurons, but not be able to determine which types of neurons they are or how they are connected to other neurons. Another lab will reconstruct the wiring diagram of the same circuit, but without recording data to identify the properties of the reconstructed neurons. In some heroic cases, functional data have been laboriously combined with anatomical reconstructions1, but rarely if ever in a broad behavioral context. Different techniques also differ in concepts and vocabularies, in background assumptions and in experimental norms. Decision-making, for example, might be studied at the level of populations of single-cell recordings in monkeys, by fMRI in humans, by lesions in rats or by molecular and optical techniques in mice. These differences mean that standardization in neuroscience must be made relative to a technique and that cross-level and cross-technique data integration cannot easily be automated. Standardizing data collected with a single technology is not trivial, and establishing meaningful causal relationships among data sets obtained with very different technologies is even more difficult.

Second, different animal models are used to study different problems: flies, worms, fish, mice, rats, monkeys and humans all have their place. It is often unclear how to extrapolate from worm data to a mammalian nervous system, for example, or from in vitro preparations to in vivo preparations. Each model has its distinct virtues, and new efforts to integrate information across species and technologies

© 2014 Nature America, Inc. All rights reserved.
[Figure 1 image: spatial scale, Size (mm), from 0.0001 to 1,000 (synapse, dendrite, neuron, layer, nucleus, map, lobe, brain) plotted against temporal scale, Time (s), from milliseconds to months. Regions: EEG and MEG, PET imaging, fMRI imaging, brain lesions, 2-DG imaging, TMS, VSD imaging, microstimulation, optogenetics, light microscopy, field potentials, single units, patch clamp, calcium imaging, electron microscopy. Main panel, 2014; inset, 1988.]

Figure 1 The spatiotemporal domain of neuroscience and of the main methods available for the study of the nervous system in 2014. Each colored region represents the useful domain of spatial and temporal resolution for one method available for the study of the brain. Open regions represent measurement techniques; filled regions, perturbation techniques. Inset, a cartoon rendition of the methods available in 1988, notable for the large gaps where no useful method existed9. The regions allocated to each domain are somewhat arbitrary and represent our own estimates. EEG, electroencephalography; MEG, magnetoencephalography; PET, positron emission tomography; VSD, voltage-sensitive dye; TMS, transcranial magnetic stimulation; 2-DG, 2-deoxyglucose.

Terrence J. Sejnowski and Patricia S. Churchland are at the Howard Hughes Medical Institute, the Salk Institute for Biological Studies, La Jolla, California, USA. Terrence J. Sejnowski is also in the Division of Biological Sciences, University of California at San Diego, La Jolla, California, USA, and Patricia S. Churchland is in the Department of Philosophy, University of California at San Diego, La Jolla, California, USA. J. Anthony Movshon is at the Center for Neural Science, New York University, New York, New York, USA.
e-mail: terry@salk.edu

1440 VOLUME 17 | NUMBER 11 | NOVEMBER 2014 NATURE NEUROSCIENCE



may pay off handsomely. But this will require a deepened appreciation of comparative and evolutionary neurobiology.

It has been said that “nothing in neuroscience makes sense except in the light of behavior”2. Traditionally, neuroscientists have restricted the range and richness of behavioral measurements to keep the collection and interpretation of correlated data from neurons manageable. This strategy constrains our understanding of how the brain supports the full range of behaviors. Big data is making it possible to record from the same set of neurons while the subject engages in a much richer set of behaviors. Behavioral research will greatly benefit from the application of machine learning techniques that allow fully automated analysis of behavior in freely moving animals3–5. The challenge is to discover the causal relationships between big neural data and big behavioral data.

Third, as things stand in neuroscience, integration of functional data is mainly tackled by individual labs and by those with whom they collaborate. Such a strategy of ‘every tub on its own bottom’ depends on individuals to absorb information, communicate with others in the same subfield, and otherwise keep up. Meetings, lab visits, publications, review articles and so forth have been the mainstay of this form of integration. Although powerful and productive and a source of innovation, this style has limits. With increases in numbers of laboratories and publications, it is hard for individuals to keep up with the latest technology and harder still to keep data from slipping into oblivion, including data whose significance can be appreciated only later when the science catches up with the technology. Preventing such losses will require a cultural shift in the way that data are shared across labs.

Note too that this kind of integration is essentially vertical, in the sense that it is largely directed toward one particular problem, going up and down the organizational levels on that problem. Horizontal integration of data across a range of problems (for example, learning, decision-making, perception, emotion and motor control) is even harder to achieve in one laboratory. There is just too much data for one laboratory to get its collective head around.

A goal of the BRAIN Initiative6 is to record and manipulate a large number of neurons during extended behavioral experiments, to identify the neurons recorded from, to reconstruct the circuit that gave rise to the activity, and to relate the combined data to behavior, all in the same individual. Although this may seem like a pie-in-the-sky experiment, it is within reach in some species, such as the nematode worm Caenorhabditis elegans, whose neuronal connectivity is already known, and the transparent larval zebrafish, in which it is possible to record simultaneously from most of its 100,000 or so neurons. Accomplishing these ambitious goals will take teams of closely coordinated researchers with complementary expertise.

Fourth, as data sets grow and become more complex, it will become more and more difficult to analyze them and extract conclusions. In the worst-case scenario, the data may not be reducible to simpler descriptions. Here we need to rely on new approaches to analyzing data in high-dimensional spaces using pattern-searching algorithms that have been developed in statistics and machine learning. To illustrate, consider the project of Vogelstein et al.7, whose aim was to understand in Drosophila larvae the causal role of each of 10,000 neurons in producing a simple behavior in the animal’s repertoire, such as turning or going backwards. Drawing on over 1,000 genetic lines and using optogenetic techniques to stimulate individual neurons in each line, they generated a basic data set consisting of correlations between stimulated identified neurons and a behavioral output. (Notice that the data set would have been far more massive had they stimulated neurons two or three or more at a time.) To find patterns in their huge accumulation of correlational data, they fed the data to an unsupervised learning program, which yielded a potential understanding of links between neurons and behavior. Correlational data could enhance understanding of the connectional structure to address questions of circuitry. Nevertheless, the methodological significance of the project is that it shows how new tools can be put to work to find patterns in data obtained from networks of neurons, patterns that emerge only from using new analytic tools on very large data sets. The statistical design of these experiments will be critical to ensure that data sets are carefully calibrated, are of sufficient power to admit analysis, and can be used by other researchers who want to ask different questions. This is not an easy process and requires a level of planning and quality control that goes beyond that of the exploratory experiments undertaken in most laboratories8. Here again, a modest cultural change can make a large impact.

Fifth, at some point along the Baconian rise of ever larger and more complex data sets, a deeper understanding should emerge from the accumulated knowledge, as it has in other areas of science. What we have today is a lot of small models that encompass limited data sets. These models are more descriptive than explanatory. Theory has been slow in coming. One obstacle is that sometimes theorists do not clearly convey what they propose, perhaps because they seek safety in needlessly complex mathematics or because they are too remote from the experimental base to undergird their theoretical ideas. Any of these issues can detract from productive ideas. This can change.

What we contemplate are modest cultural changes, wherein some neuroscientists are mainly theorists, with appropriate grant support to make the research feasible. The term “theorist” enjoys an uneven reputation in neuroscience, but serious scholars with this portfolio do now exist, although they tend to be in short supply. We need to cultivate a new generation of computationally trained researchers who are aware of the richness of data and can draw on knowledge from many laboratories, courageous enough to make judicious simplifications and to have their ideas tested, and imaginative enough to generate interesting, testable large-scale ideas.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Bock, D.D. et al. Nature 471, 177–182 (2011).
2. Shepherd, G.M. Neurobiology. 8 (Oxford Univ. Press, 1988).
3. Dankert, H., Wang, L., Hoopfer, E.D., Anderson, D.J. & Perona, P. Nat. Methods 6, 297–303 (2009).
4. Falkner, A.L., Dollar, P., Perona, P., Anderson, D.J. & Lin, D. J. Neurosci. 34, 5971–5984 (2014).
5. Wu, T. et al. IEEE Trans. Syst. Man Cybern. B Cybern. 42, 1027–1038 (2012).
6. National Institutes of Health. BRAIN 2025: a scientific vision. http://www.nih.gov/science/brain/2025/ (2014).
7. Vogelstein, J.T., Park, Y. & Ohyama, T. Science 344, 386–392 (2014).
8. Mountain, M. Phys. Today 67, 8–10 (2014).
9. Churchland, P.S. & Sejnowski, T.J. Science 242, 741–745 (1988).

