
VolPy: automated and scalable analysis pipelines for voltage imaging datasets
Changjia Cai1, Johannes Friedrich2, Eftychios A Pnevmatikakis2, Kaspar Podgorski3*, Andrea Giovannucci1,4*

1 Joint Department of Biomedical Engineering at University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
2 Flatiron Institute, Simons Foundation, New York, NY, USA
3 Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
4 Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC,
USA

* agiovann@email.unc.edu (AG); podgorskik@janelia.hhmi.org (KP)

Abstract
Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular
scale, unlocking the study of subthreshold activity, synchrony, and network dynamics
with unprecedented spatio-temporal resolution. However, high data rates (>800 MB/s)
and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we
present VolPy, an automated and scalable pipeline to pre-process voltage imaging
datasets. VolPy features motion correction, memory mapping, automated segmentation,
denoising and spike extraction, all built on a highly parallelizable, modular, and
extensible framework optimized for memory and speed. To aid automated segmentation,
we introduce a corpus of 24 manually annotated segmentation datasets from different
preparations and voltage indicators. We benchmark VolPy against ground truth
segmentation and electrophysiology recordings, demonstrating VolPy’s performance in
neuron localization, spike extraction, and scalability.

Author summary
Roughly 290 million electrical action potentials occur every second in the human brain,
facilitating the propagation of signals among cells in the nervous system and driving
most of our daily operations. New methods in brain imaging are emerging that have the
speed and resolution to capture events in the brain at the pace at which neurons
typically communicate. These methods measure voltage in neurons by using light, and
therefore can access very detailed brain signaling patterns. However, the adoption of
these methods by a larger community, and not a restricted set of experts, is limited by
the lack of computational tools, thereby greatly hindering progress in this field. In this
paper, we present a set of tools, which we call VolPy, that greatly facilitate the
preprocessing of these new imaging datasets. This pipeline incorporates very efficient and
optimized algorithms that can identify neurons and extract their activity with great
accuracy. The presented software will make this new imaging modality accessible to a
wide audience.



Introduction

While methods have been developed to process voltage imaging data at mesoscopic scale and multi-unit resolution [1–3], to date there is no established pipeline for large-scale cellular-resolution analyses. Voltage indicators have only recently become sensitive enough to generate large-scale recordings at cellular resolution [4–10]. Approaches developed for calcium imaging fail with this valuable but different new imaging modality: dataset sizes have increased by one or two orders of magnitude (tens of GBs vs. TBs per hour), and assumptions about underlying signals made by calcium imaging analysis methods can be violated. For instance, established non-negative matrix factorization (NMF) methods [11, 12] are not efficient for voltage imaging [13]: (i) segmentation approaches developed for somatic calcium imaging fail for other modalities; (ii) voltage imaging signals contribute little variance and are not well isolated by existing NMF approaches; and (iii) voltage signals display both positive and negative fluctuations, and significant multiplicative noise may arise from light absorption in one-photon voltage recordings, which violates the non-negativity assumptions the NMF framework makes in both the spatial and temporal domains. To address these challenges, efficient algorithms must be developed and validated for multiple aspects of voltage imaging data processing, including cell segmentation, denoising, and spike detection. Ideally, these methods should form a user- and developer-friendly, modular, and computationally efficient pipeline, to facilitate optimization of algorithms and future development.

Algorithms specific to processing voltage imaging datasets are beginning to populate the literature. For instance, [5] introduces iterative methods for spike detection, which detect lower-amplitude spikes using filters generated from easily detected high-amplitude spikes (SpikePursuit). However, this method requires manual or semi-manual selection of neurons. Second, the algorithm is not optimized for parallel processing. Finally, it is not embedded into a reusable and well-documented data format, an important requirement for broad community use [14]. A more standardized approach is provided by [6, 13, 15]. These methods employ penalized matrix decomposition to denoise and demix signals, while relying on local NMF to perform an initial segmentation. However, these frameworks do not embed an adaptive and automated mechanism for spike extraction and are not integrated into a robust, scalable and multi-platform framework. Further, while these approaches have been validated with simulated data and small electrophysiological ground truth datasets, the lack of both electrophysiological and segmentation ground truth has hindered the validation and comparison of all these approaches. In summary, no validated, complete, scalable and automatic analysis pipeline for voltage imaging data exists to date.

To address these shortcomings, we established objective performance evaluation benchmarks and a new analysis pipeline for pre-processing voltage imaging data, which we named VolPy. First, in order to establish a common validation framework and automate neuron segmentation, we created a corpus of annotated datasets with manually segmented neurons. Second, we trained a supervised algorithm to automatically localize and segment cells via convolutional networks [16]. Third, we developed an improved algorithm to denoise fluorescence traces and extract spikes, built upon SpikePursuit [5]. Fourth, we quantitatively evaluated the performance of VolPy in neuron segmentation, spike extraction and scalability. Fifth, we integrated our methods within the CaImAn ecosystem [12], a popular suite of tools for single-cell resolution brain imaging analysis. In addition to the computational advantages provided by CaImAn's optimized tools, this enables immediate adoption by research labs already relying on the CaImAn ecosystem.



Materials and methods

Creation of a corpus of annotated datasets

To date there are no established metrics to compare the practical performance of algorithms for single cell localization and/or segmentation in cellular-resolution voltage imaging datasets. Toward establishing such a benchmark, and with the goal of developing new supervised algorithms, we generated a corpus of annotated datasets (ground truth, GT) in which neurons were manually segmented. GT was constructed by human labelers from two summary images (mean and local correlation images, Fig 1b,c) and a pre-processed movie that highlights active neurons (local correlation movie, S1 Vid). More specifically, after motion correction, we generated a mean image, a correlation image and a correlation video as follows:

[Fig 1 graphic: (a) pipeline schematic (motion correction, memory mapping, segmentation, trace denoising and spike extraction); (b–d) summary images and segmentation/trace outputs; (e,f) example contours and traces; scale bars 60 um, 1 au, 1 s, 400 ms.]
Fig 1. Analysis pipeline for voltage imaging data. (a) Four pre-processing steps for segmenting neurons and extracting spikes from voltage imaging movies. (b) Correlation image (front) and mean image (back) as the input of the segmentation step. (c) The segmentation step outputs class probabilities, bounding boxes and contours. The results are overlaid on the correlation image in (b). (d) Results of trace denoising and spike extraction. The gray dashed horizontal line represents the inferred spike threshold. (e) Correlation image (left) and mean image (right) of one mouse layer 1 (L1) neocortex dataset with contours detected by VolPy. (f) Temporal traces with detected spikes corresponding to neurons in panel (e), extracted by VolPy (left). The dashed gray portion of the traces is magnified on the right.

Mean image: To compute the mean image, we average the movie across time for each pixel and normalize by the pixel-wise z-score.

Correlation image: The correlation image is a variation of that implemented in [17]. After removing the baseline of the movie by low-pass filtering, we average the temporal correlation of each pixel with its eight neighboring pixels. We also normalize the correlation image by z-scoring when it is fed to the neural network.

Correlation movie: We introduced a novel representation, the correlation movie, which is essentially a running version of the correlation image computed over overlapping chunks of video frames. This new type of visualization significantly improves the visibility of spikes in voltage imaging movies (see S1 Vid).
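For concreteness, a minimal NumPy/SciPy sketch of these three summary representations is given below. It assumes an in-memory movie of shape (frames, y, x) and omits the baseline removal step; the chunk and stride parameters are illustrative assumptions, not VolPy's published defaults.

```python
# Minimal sketch of the three summary representations (NumPy/SciPy).
# Assumes mov has shape (frames, y, x); baseline removal is omitted.
import numpy as np
from scipy.ndimage import uniform_filter

def mean_image(mov):
    """Temporal mean per pixel, z-scored across pixels."""
    m = mov.mean(axis=0)
    return (m - m.mean()) / m.std()

def correlation_image(mov):
    """Average temporal correlation of each pixel with its 8 neighbors."""
    z = (mov - mov.mean(axis=0)) / (mov.std(axis=0) + 1e-12)  # z-score in time
    # sum over the 8 neighbors = 3x3 box sum minus the center pixel
    neigh_sum = uniform_filter(z, size=(1, 3, 3)) * 9 - z
    return (z * neigh_sum).mean(axis=0) / 8.0

def correlation_movie(mov, chunk=1000, step=100):
    """Running correlation image over overlapping chunks of frames."""
    return np.stack([correlation_image(mov[i:i + chunk])
                     for i in range(0, len(mov) - chunk + 1, step)])
```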

Guided by these three visual cues, two annotators marked the contours of neurons using the ImageJ Cell Magic Wand tool plugin [18] (see S1 Fig). For neurons to be included in the corpus, both annotators had to agree on the selection, which fulfilled the following criteria: (i) neurons were very clear in at least one of the three cues; (ii) neurons were moderately clear in one of the summary images and exhibited a spatial footprint in selected frames of the local correlation movie. Summary information about the annotated datasets is reported in Tab 1.

Table 1. Properties of three heterogeneous types of datasets. For each type of dataset the name, organism, brain region, source, imaging rate, voltage indicator, and the total number of neurons selected by the manual annotators are given.
Name Organism Brain region Source Rate (Hz) Indicator # neurons
L1 Mouse L1 cortex [5] 400 Voltron 523
TEG Zebrafish Tegmental [5] 300 Voltron 107
HPC Mouse Hippocampus [6] 1000 paQuasAr3-s 41

A novel analysis pipeline for voltage imaging

We propose a novel scalable pipeline for automated analysis that performs the preprocessing steps required to extract spikes and sub-threshold activity from voltage imaging movies (Fig 1). First, input data are processed to remove motion artifacts with parallelized algorithms, and saved into a memory-mapped file format that enables efficient concurrent access. In a second stage, VolPy segments candidate neurons using supervised algorithms (Fig 1a,c) or a manual annotation tool. Finally, VolPy denoises fluorescence traces, infers spatial footprints, detects single spikes, and extracts the subthreshold activity of each neuron (Fig 1a,d). In Fig 1e,f, we report the result of running the full pipeline on an example mouse L1 neocortex voltage imaging dataset. S2 Vid shows the reconstructed movie based on the result of VolPy on the same mouse L1 neocortex dataset. In what follows we present each stage of the VolPy pipeline in detail.

Motion correction and memory mapping

First, movies are corrected for sample movement and stored in an efficient format by relying on the infrastructure provided by CaImAn [12, 19]. CaImAn provides routines to simultaneously register frames to a template and create memory-mapped files. Memory mapping provides the ability to quickly read arbitrary portions of the movie along any direction without loading the full movie into memory. In turn, this allows parallelization of all the operations required to generate summary images and denoise the signal.
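The snippet below sketches this access pattern with a plain NumPy memmap, where only the pages backing the requested slices are read from disk. The file name, shape, and array order are hypothetical; CaImAn's own F-ordered .mmap format differs in detail.

```python
# Sketch of memory-mapped access with NumPy; file name, shape, and array
# order are assumptions, not CaImAn's actual .mmap layout.
import numpy as np

mov = np.memmap('movie.mmap', dtype='float32', mode='r',
                shape=(20000, 512, 128))       # (frames, y, x)
patch = np.array(mov[:, 100:164, 40:104])      # one spatial patch, all frames
trace = np.array(mov[:, 120, 60])              # a single-pixel trace
```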



Segmentation

The low SNR of voltage imaging data hinders the applicability of the segmentation methods previously devised for calcium imaging data [11]. Here we propose to segment neurons with supervised learning approaches. While previous attempts at cell localization and segmentation have extended U-Net fully convolutional network architectures [20], in our hands this family of methods fails when facing datasets in which neurons overlap heavily (TEG datasets). Instead, we approach the problem with Mask R-CNN, a convolutional network for object localization and segmentation [16]. Mask R-CNN is a particularly promising architecture as it enables separation of overlapping objects in a specific area by providing each object with a unique bounding box (Fig 2a). In more detail, Mask R-CNN provides simultaneous object localization and instance segmentation via a combination of two network portions: backbone and head. The backbone features a pre-trained convolutional network (such as VGG, ResNet, Inception or others [21]) for feature extraction. Mask R-CNN also exploits another effective backbone: feature pyramid networks [22], a top-down architecture with lateral connections, which enables the network to extract features at multiple scales from the feature maps. In the head, based on the extracted features, a Region Proposal Network proposes initial bounding boxes for each candidate object, which are fed to two downstream branches. One branch is trained to predict a class label and a bounding box offset that refines the initial bounding box, while the other branch outputs a binary mask for each candidate object. An example of the network inference by VolPy on a validation dataset is shown in Fig 1e.

We adapted Mask R-CNN to our purpose as follows. We chose a combination of ResNet-50 pre-trained on the COCO dataset and feature pyramid networks as the backbone. The input of the network is a three-channel image: two channels for the mean image and one for the correlation image. Three channels are required to match the input to the first few layers pre-trained on the COCO dataset. The network was trained to predict only one class, neuron or not neuron (background), instead of a multi-label output.

During training, we randomly cropped the input image into 128×128 crops and applied the following data augmentation techniques using the imgaug package [23]: flip, rotation, multiply (adjust brightness), Gaussian noise, shear, scale and translation. Each mini-batch contained six cropped images. We trained the head of the network (the whole network except the ResNet) on one GPU for 2000 iterations with learning rate 0.01, and then trained the layers after the first three stages of the ResNet (28 layers) for another 2000 iterations with learning rate 0.001. We used stochastic gradient descent as our optimizer with a constant momentum of 0.9. The weight decay was 0.0001.
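A sketch of such an augmentation pipeline with imgaug follows; the specific parameter ranges are illustrative assumptions, not the values used to train VolPy.

```python
# Sketch of the described augmentations with imgaug; parameter ranges are
# illustrative assumptions, not VolPy's training configuration.
import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                                    # horizontal flip
    iaa.Flipud(0.5),                                    # vertical flip
    iaa.Multiply((0.8, 1.2)),                           # brightness adjustment
    iaa.AdditiveGaussianNoise(scale=(0.0, 0.05 * 255)), # Gaussian noise
    iaa.Affine(rotate=(-45, 45), shear=(-8, 8),         # rotation and shear
               scale=(0.9, 1.1),                        # scaling
               translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)}),
], random_order=True)

# images: uint8 array of shape (batch, H, W, C); the same geometric
# transforms must also be applied to the corresponding instance masks.
# images_aug = augmenter(images=images)
```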

For validation, images were padded with zeros to make their width and height multiples of 64, so that feature maps could be smoothly scaled for the feature pyramid network. We only selected neurons with a confidence level greater than or equal to 0.7.
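Such padding amounts to a few lines; the helper below is a hypothetical illustration, not VolPy's exact code.

```python
# Zero-pad an image so height and width become multiples of k (sketch).
import numpy as np

def pad_to_multiple(img, k=64):
    h, w = img.shape[:2]
    pad = ((0, (-h) % k), (0, (-w) % k)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad)  # constant zero padding on the bottom/right
```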

While VolPy's segmentation method achieves good performance on similarly collected datasets, we do not expect it to generalize to completely new datasets out of the box. To overcome this issue, we developed a manual annotation visual interface tool within VolPy. This enables one-click semi-automatic neuron segmentation based on the Python cell magic wand tool [24]. See S3 Vid for more detail.

Trace denoising and spike extraction

Classical algorithms for denoising calcium imaging movies and extracting spikes from the corresponding fluorescence traces fail when applied to voltage imaging movies. On the one hand, the low signal-to-noise ratio and the complex background fluorescence require new methods for refining spatial footprints; on the other hand, substantially different biophysical models underlie the temporal dynamics of the fluorescence associated with spikes.


[Fig 2 graphic: (a) Mask R-CNN diagram (feature extraction, ROI align, and class/box/mask heads with example confidence scores); (b) eight-step trace denoising and spike extraction schematic; (c) example traces for three neurons; scale bar 0.5 s.]
Fig 2. Segmentation, trace denoising and spike extraction framework. (a) Mask R-CNN framework for neuron segmentation. The network predicts a class label, a bounding box and a binary mask for each candidate neuron, taking summary images (mean and correlation) as inputs. (b) Algorithm for fluorescence trace denoising and spike extraction. ①② Load and high-pass filter the signal in one context region of the movie. The initial temporal trace is computed either by averaging ROI pixels or by applying the spatial filter (if available) to the context region. Two steps are executed in a loop for three iterations: the former (③–⑥) estimates spike times, and the latter (⑦) refines the spatial filter. ③ Extract the first 8 principal components of the background pixels using SVD and then remove the background contamination via Ridge regression. ④ The background-removed trace (t) is high-pass filtered to obtain a zero-baseline trace t_s for further processing. Spikes with peaks larger than 3.5 times the noise level (gray dotted line) are selected. ⑤ Waveforms of these spikes (gray) are averaged to obtain a spike template (black line). ⑥ A whitened matched filter is used to enhance spikes akin to the template. For a second time, peaks are detected by thresholding t_s based on the noise level. The reconstructed signal (t_rec) is obtained by convolving the template waveform with the inferred spike train. ⑦ Refine the spatial filter through Ridge regression. The product of the context region across time with the refined spatial filter generates the temporal trace for the following iteration. ⑧ After three iterations, the subthreshold activity (t_sub) is extracted by applying a low-pass filter to the residual trace (t − t_rec). (c) Examples of t, t_rec, t_s, and t_sub traces, along with detected spikes.



To solve both problems, we built upon and extended the SpikePursuit (SP) algorithm [5]. We improved SP in the following directions: (i) while the original version of the algorithm required manual selection of candidate neurons, VolPy automatically initializes it using the output of the trained Mask R-CNN; (ii) a minimal amount of data is loaded into memory thanks to the memory mapping infrastructure, thereby reducing memory requirements; (iii) we increased the reliability of the algorithm by introducing a more robust estimate of the background, spikes and subthreshold activity; (iv) we scaled up the performance by improving the underlying optimization algorithms. In what follows, we introduce SP (see Fig 2b), along with our modifications to improve the algorithm. Pseudocode of the algorithm is shown in S1 Text.

ROI loading and preprocessing: As a result of segmentation, each candidate neuron has an associated binary mask which represents its spatial extent (ROI region R). The ROI is dilated to get a larger region centered on the neuron (context region C). Background pixels are defined as all the pixels in the context region at least n_B pixels (12 pixels by default) away from the ROI region (background region). As a first step, all pixels in the context region are efficiently retrieved (Y) from the memory-mapped file and high-pass filtered (Y_h) to compensate for photo-bleaching (①② in Fig 2b; see pseudocode in S1 Text for more detail). The initial temporal trace t associated with the neuron can be approximated either as the mean signal of the ROI pixels or, if a previously calculated spatial filter w is available, as the weighted average across all pixels in the context region:

t = \begin{cases} \frac{1}{n(r)} \sum_{x \in r} Y_h[:, x], & \text{if } w \text{ is not given} \\ Y_h w, & \text{if } w \text{ is given} \end{cases} \qquad (1)

where r represents the pixels in the ROI region R and n(r) the total number of pixels in the ROI region.
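In code, Eq (1) amounts to a mean over ROI pixels or a matrix-vector product; the sketch below assumes the context region has already been flattened to a (T, n_pixels) array.

```python
# Sketch of Eq (1): initial temporal trace of one neuron.
# Yh: (T, n_pixels) high-passed context region; roi: boolean (n_pixels,).
import numpy as np

def initial_trace(Yh, roi, w=None):
    if w is None:
        return Yh[:, roi].mean(axis=1)  # mean over ROI pixels
    return Yh @ w                       # weighted average with spatial filter w
```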

Afterwards, two steps are executed in a loop for three rounds: the former estimates spike times, and the latter refines the spatial filter.

Spike time estimation: In order to estimate spike times from the fluorescence traces (③–⑥ in Fig 2b), we first compute the singular value decomposition of the background pixels Y_b:

Y_b = U \Sigma V \qquad (2)

where U contains the temporal components. This is then used to remove background contamination via Ridge regression, in which U_b, the matrix of the first n_pc components of U (default is 8), is the regressor and the temporal trace is the predictor (③ in Fig 2b):

\beta = (U_b^T U_b + \lambda_b \|U_b\|_F^2 I)^{-1} U_b^T t \qquad (3)

t = t - U_b \beta \qquad (4)

The idea behind Equation 3 is to isolate from the trace the portion of signal which is background only. We added an L2 regularizer to penalize large regression coefficients caused by high signal-to-noise ratio neurons among the background pixels. This provided more reliable results than the original linear regression method.
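A direct translation of Eqs (2)–(4) is sketched below; the regularization strength lam is an assumed placeholder, since the default value is not stated here.

```python
# Sketch of Eqs (2)-(4): background removal via truncated SVD plus a
# closed-form ridge solve. lam is an assumed placeholder value.
import numpy as np

def remove_background(t, Yb, npc=8, lam=0.01):
    # SVD of the background pixels; U holds the temporal components (Eq 2)
    U, S, Vt = np.linalg.svd(Yb, full_matrices=False)
    Ub = U[:, :npc]
    # beta = (Ub^T Ub + lam * ||Ub||_F^2 I)^-1 Ub^T t   (Eq 3)
    reg = lam * np.linalg.norm(Ub, 'fro') ** 2
    beta = np.linalg.solve(Ub.T @ Ub + reg * np.eye(npc), Ub.T @ t)
    return t - Ub @ beta                                # Eq 4
```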

The background-removed trace t is then filtered with a high-pass filter (1 Hz cut-off) to obtain t_s, which is further processed for spike extraction (④ in Fig 2b). A Gaussian low-pass filter is then applied to t_s to prevent single spikes from being associated with multiple noisy peaks. Two rounds of spike detection are subsequently performed. The first round selects high-SNR spikes. To increase the robustness of spike detection when few spikes appear in the signal, we replaced the adaptive thresholding method in SP with a thresholding method that selects peaks larger than 3.5 times the noise level. The noise level is estimated by computing the standard deviation of the trace, but only considering values lower than its median (see pseudocode in S1 Text for more detail). A spike template z is computed by averaging all the peak waveforms:

z = \frac{1}{n_s} \sum_{i=1}^{n_s} t_s[s[i] - \tau : s[i] + \tau] \qquad (5)

where s is the list of spike times, n_s is the total number of spikes, and τ is the half-width of the template window (⑤ in Fig 2b). Subsequently, a whitened matched filter [25] is used to enhance spikes with shape similar to the template (⑥ in Fig 2b). In more detail, we first use the Welch method to approximate the spectral density of the noise in the fluorescence signal. Second, we scale the signal in the frequency domain to whiten the noise. Finally, we convolve with a time-flipped template, which is approximated using the peak-triggered average.
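The noise estimate, spike template, and whitened matched filter described above could be sketched as follows; the Welch segment length and the template half-width are assumptions.

```python
# Sketch of steps 4-6: robust noise estimate, spike template (Eq 5), and a
# whitened matched filter via Welch's PSD. Parameters are assumptions.
import numpy as np
from scipy.signal import find_peaks, welch

def noise_std(ts):
    """Std estimated only from values below the median (robust to spikes)."""
    below = ts[ts < np.median(ts)]
    return np.sqrt(np.mean((below - np.median(ts)) ** 2))

def spike_template(ts, spikes, tau=20):
    """Average waveform around detected peaks (Eq 5)."""
    return np.mean([ts[s - tau:s + tau] for s in spikes
                    if tau <= s < len(ts) - tau], axis=0)

def whitened_matched_filter(ts, template, fs):
    """Whiten by the estimated noise PSD, then correlate with the template."""
    f, pxx = welch(ts, fs=fs, nperseg=1024)
    psd = np.interp(np.fft.rfftfreq(len(ts), 1 / fs), f, pxx)
    ts_white = np.fft.irfft(np.fft.rfft(ts) / np.sqrt(psd), n=len(ts))
    return np.convolve(ts_white, template[::-1], mode='same')

# first round: peaks above 3.5x the noise level
# spikes, _ = find_peaks(ts, height=3.5 * noise_std(ts))
```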

The second round of spike detection selects spikes larger than 4 (adjustable parameter) times the new noise level. Then, a reconstructed and denoised trace is computed by convolving the inferred spike train q with the waveform template:

t_{rec} = q * z, \quad \text{where } q[t] = \begin{cases} 1 & \text{if there is a spike at time } t \\ 0 & \text{otherwise} \end{cases} \qquad (6)
Spatial filter refinement: The second step refines the spatial filter (⑦ in Fig 2b). The updated spatial filter is computed by Ridge regression, where the reconstructed and denoised trace t_rec is used to approximate the high-passed movie:

w = (Y_h^T Y_h + \lambda_w \|Y_h\|_F^2 I)^{-1} Y_h^T t_{rec} \qquad (7)

The Ridge regression problem is solved with an iterative and efficient algorithm [26] implemented in the Scikit-Learn package ('lsqr'). Subsequently, the weighted average of the movie with the refined spatial filter is used as the updated temporal trace for the following iteration:

t = Y_h w \qquad (8)
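Using scikit-learn, the refinement step can be sketched as below; the alpha scaling mirrors Eq (7), with lam an assumed placeholder value.

```python
# Sketch of Eqs (7)-(8) with scikit-learn's Ridge and the 'lsqr' solver
# referenced in the text. lam is an assumed placeholder value.
import numpy as np
from sklearn.linear_model import Ridge

def refine_spatial_filter(Yh, t_rec, lam=0.01):
    alpha = lam * np.linalg.norm(Yh, 'fro') ** 2  # lambda_w * ||Yh||_F^2
    ridge = Ridge(alpha=alpha, solver='lsqr', fit_intercept=False)
    ridge.fit(Yh, t_rec)        # Yh: (T, n_pixels), t_rec: (T,)
    w = ridge.coef_             # refined spatial filter, Eq (7)
    return w, Yh @ w            # updated temporal trace, Eq (8)
```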
After three iterations of spike time estimation and spatial filter refinement, the subthreshold activity is extracted (⑧ in Fig 2b). This is achieved by applying a low-pass filter (75 Hz cutoff by default) to the residual trace obtained by subtracting the reconstructed fluorescence trace from t.
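The residual low-pass step could look like this; the filter order is an assumption.

```python
# Sketch of step 8: subthreshold activity as the low-passed residual.
from scipy.signal import butter, filtfilt

def subthreshold(t, t_rec, fs, cutoff=75.0):
    b, a = butter(4, cutoff / (fs / 2), btype='low')  # 4th order assumed
    return filtfilt(b, a, t - t_rec)                  # zero-phase filtering
```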

Evaluation of VolPy

Precision/recall framework to measure segmentation performance

In order to measure the performance of VolPy segmentation, we compared the spatial footprints extracted by VolPy with our manual annotations (see the component registration in [12] for a detailed explanation). In summary, we computed the Jaccard distance (one minus the intersection over union) to quantify similarity among ROIs, and then solved a linear assignment problem with the Hungarian algorithm to determine matches and mismatches. Once those were identified, we adopted a precision/recall framework and defined True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) as follows:

April 22, 2020 8/15


TP = number of matched spatial footprints
FP = number of spatial footprints in VolPy but not in GT
FN = number of spatial footprints in GT but not in VolPy          (9)
TN = 0

Next, we computed the precision, recall and F1 score of the matching performance as follows:

Precision = TP/(TP + FP)
Recall = TP/(TP + FN)                                             (10)
F1 = 2 × Precision × Recall/(Precision + Recall)

The F1 score is a number between zero and one; better performance results in a higher F1 score.
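The matching and scoring steps can be sketched with SciPy's Hungarian solver; the Jaccard-distance cutoff for accepting a match is an assumed parameter.

```python
# Sketch of ROI matching (Jaccard distance + Hungarian assignment) and the
# F1 score of Eqs (9)-(10). The max_dist cutoff is an assumed parameter.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_rois(masks_a, masks_b, max_dist=0.7):
    """masks_*: boolean arrays of shape (n, H, W); returns matched pairs."""
    D = np.ones((len(masks_a), len(masks_b)))
    for i, a in enumerate(masks_a):
        for j, b in enumerate(masks_b):
            union = np.logical_or(a, b).sum()
            if union:
                D[i, j] = 1 - np.logical_and(a, b).sum() / union
    rows, cols = linear_sum_assignment(D)           # Hungarian algorithm
    return [(i, j) for i, j in zip(rows, cols) if D[i, j] < max_dist]

def f1_score(tp, fp, fn):
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```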

Cross-validation to evaluate the segmentation performance on limited datasets

In order to decrease the selection bias originating from the separation into training and validation datasets, and to better evaluate the Mask R-CNN model on our limited datasets (24 in total), we performed a stratified three-fold cross-validation. The reason we used a stratified rather than a normal three-fold cross-validation is that we wanted to maintain an equivalent proportion of training and evaluation samples for each dataset type. In more detail, we partitioned our datasets into three folds so that L1, TEG, and HPC would all maintain the same number of samples without repetition (the train/val column of Tab 2 shows one group of the partition). During cross-validation, two groups were used as training sets and the remaining one as the validation set. The cross-validation process was repeated three times, with each group used exactly once as the validation set.

Table 2. All annotated datasets for segmentation in VolPy. For each dataset the name, size, and number of labeled neurons (#) are given.
Name Size # Name Size #
L1.00.00 20000*512*128 84 HPC.29.04 20000*164*96 2
L1.01.00 20000*512*128 53 HPC.29.06 20000*228*96 2
L1.01.35 20000*512*128 69 HPC.32.01 20000*256*96 4
L1.02.00 20000*512*128 61 HPC.38.05 20000*176*92 4
L1.02.80 20000*512*128 43 HPC.38.03 20000*128*88 2
L1.03.00 20000*512*128 79 HPC.39.07 20000*264*96 5
L1.03.35 20000*512*128 57 HPC.39.03 20000*276*96 5
L1.04.00 20000*512*128 43 HPC.39.04 20000*336*96 4
L1.04.50 20000*512*128 34 HPC.48.01 20000*224*96 2
TEG.01.02 10000*364*320 33 HPC.48.05 20000*212*96 4
TEG.02.01 10000*360*256 29 HPC.48.07 20000*280*96 2
TEG.03.01 10000*508*288 45 HPC.48.08 20000*284*96 3

For each run of the cross-validation process, we trained a single network and tested it on both the training and validation sets. We then computed the mean and standard deviation of the F1 score for the different types of datasets, with training and validation sets treated separately.
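The stratified split can be reproduced with scikit-learn's StratifiedKFold; the dataset names below are illustrative placeholders (see Tab 2 for the real identifiers).

```python
# Sketch of the stratified three-fold split over dataset types using
# scikit-learn; dataset names are placeholders, not the real identifiers.
from sklearn.model_selection import StratifiedKFold

names = [f"L1.{i:02d}" for i in range(9)] \
      + [f"TEG.{i:02d}" for i in range(3)] \
      + [f"HPC.{i:02d}" for i in range(12)]
types = ["L1"] * 9 + ["TEG"] * 3 + ["HPC"] * 12

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(names, types)):
    train = [names[i] for i in train_idx]  # two groups for training
    val = [names[i] for i in val_idx]      # one group for validation
```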



Spike matching performance

In order to match spikes extracted from simultaneous voltage imaging and electrophysiology datasets, we performed the basic operations (spike deletion, spike insertion, spike shift) transforming one spike train into the other that are used to calculate the Victor-Purpura distance [27]. We used a cost M = 25 for adding/removing a spike and a cost of 1 per unit of time for shifting a spike. We minimized the total cost using the Hungarian algorithm. Spikes that ended up shifted to align with each other were considered a match. After identifying matches and mismatches, we proceeded similarly to what was explained above, defining TP, FP, FN, and TN as in Equation 9:

TP = number of matched spikes
FP = number of spikes in VolPy but not in GT
FN = number of spikes in GT but not in VolPy          (11)
TN = 0

Then we calculated the F1 score as in Equation 10.
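A Victor-Purpura-style assignment can be sketched as below: shifting a spike costs 1 per unit of time, capped at 2M (the cost of deleting it and re-inserting it), and each unmatched spike pays M; pairs whose shift cost stays below the cap count as matches. This is a sketch of the scheme described above, not VolPy's exact implementation.

```python
# Sketch of Victor-Purpura-style spike matching as a linear assignment
# (M = 25 from the text; SciPy's Hungarian solver).
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_spikes(s1, s2, M=25):
    """s1, s2: 1-D arrays of spike times (frames); returns matched pairs."""
    n1, n2 = len(s1), len(s2)
    C = np.zeros((n1 + n2, n1 + n2))
    C[:n1, n2:] = M                     # delete an unmatched spike of s1
    C[n1:, :n2] = M                     # insert an unmatched spike of s2
    # shifting costs 1 per unit of time, never more than delete + insert
    C[:n1, :n2] = np.minimum(np.abs(np.subtract.outer(s1, s2)), 2 * M)
    rows, cols = linear_sum_assignment(C)
    return [(i, j) for i, j in zip(rows, cols)
            if i < n1 and j < n2 and C[i, j] < 2 * M]
```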

Results

In what follows we report a systematic evaluation of VolPy against ground truth in terms of performance in identifying neurons, spike extraction, and scalability.

VolPy localizes neurons using a moderate amount of training data

We trained a modified version of the Mask R-CNN network architecture on three heterogeneous types of datasets (Tab 1) and evaluated its performance using three-fold cross-validation (see Tab 2 and Materials and methods for details). In Fig 3a, we compare the contours predicted by VolPy with manual annotations on three example datasets: VolPy was able to identify candidate neurons even in conditions of low signal-to-noise ratio and spatial overlap. We quantified VolPy's performance in detecting neurons using a precision/recall framework, which accounted for the amount of overlap between predicted and ground truth neurons when assigning matches and mismatches. In Fig 3b and Tab 3 we summarize the F1 scores for all the probed datasets. The results indicate that our segmentation approach performs well provided sufficient neurons are available to train the algorithm. Indeed, VolPy obtained F1 scores of 0.89 ± 0.01 on the L1 datasets (523 neurons in total), 0.71 ± 0.02 on the TEG datasets (107 neurons), and 0.46 ± 0.07 on the HPC datasets (39 neurons). In the case of TEG, the performance of VolPy is fair considering that the network was trained with only two datasets of this type. On the HPC datasets, however, the performance on both training and test sets is comparatively poor. We hypothesize that this is due to the fact that not enough data are available (see the # neurons column in Tab 1), possibly combined with the low signal-to-noise ratio typical of this dataset type.

VolPy detects single spikes from voltage imaging data with fidelity

We validated the VolPy spike extraction algorithm on three voltage imaging datasets in which electrophysiology was recorded simultaneously with voltage imaging. We automatically analyzed voltage imaging recordings from mouse L1 neocortex and the zebrafish tegmental area [5] with the VolPy pipeline. The outputs were the spatial footprints, the voltage traces, and the corresponding spike times (Fig 3e-g).
April 22, 2020 10/15


[Fig 3 graphic: (a) matched and mismatched contours (GT vs. Mask R-CNN) for L1, HPC and TEG; (b,c) F1 scores per dataset and per dataset type (train/val); (d-g) spike extraction F1 scores and example electrophysiology/VolPy traces for Fish1, Fish2 and Mouse (300-500 Hz); (h-j) processing time allocation, parallelization speed, and peak memory for a 10 GB movie with 75 neurons.]
Fig 3. Evaluation of VolPy performance. (a) Evaluation of segmentation against three manually annotated datasets including mouse sensory cortex, mouse hippocampus, and larval zebrafish. Matched and mismatched neurons between VolPy and ground truth are shown in the upper and bottom panels respectively. (b) F1 scores of VolPy for all evaluated datasets. (c) Average F1 score on training and validation sets grouped by dataset type. (d)-(g) Evaluation of VolPy spike extraction performance against simultaneous electrophysiology. For each dataset in (e), (f) and (g), electrophysiology (top, blue) and the fluorescence signal denoised by VolPy (bottom, orange) are reported. Spikes are matched between the two groups, with F1 scores computed in (d). (h)-(j) Evaluation of time and memory performance of VolPy. The processing time of VolPy for different numbers of frames and processors is shown in (h) and (i). The peak memory usage of VolPy for different numbers of processors is shown in (j).

April 22, 2020 11/15


Table 3. VolPy performance on segmentation. For each type of dataset, the number of datasets, number of neurons, recall, precision, and F1 score for training and validation, computed by stratified cross-validation, are provided.
Name #datasets #neurons recall(%) precision(%) F1 (%)
train/val train/val train/val train/val train/val
L1 6/3 349 ± 7/174 ± 7 86 ± 4/85 ± 3 94 ± 2/95 ± 1 90 ± 2/89 ± 1
TEG 2/1 71 ± 7/36 ± 7 70 ± 3/67 ± 2 83 ± 7/77 ± 3 76 ± 4/71 ± 2
HPC 8/4 26 ± 1/13 ± 1 88 ± 11/66 ± 7 55 ± 7/40 ± 12 65 ± 8/46 ± 7

Spikes for the electrophysiology recordings were obtained by thresholding. Spikes were matched against ground truth by solving a linear sum assignment problem using the Hungarian algorithm. The F1 score of each dataset (Fig 3d) was computed relying on a precision/recall framework based on matched and unmatched spikes. We observed that VolPy performed well on all datasets: the average F1 score across the three datasets was 0.95 ± 0.03, and the F1 score for each dataset was above 0.9. This confirms that single spikes can be automatically extracted from voltage imaging data with fidelity.

VolPy enables the analysis of large voltage imaging datasets on small and medium-sized machines

We examined the performance of VolPy in terms of processing time and peak memory on the datasets presented above. We ran our tests on a Linux-based desktop (Ubuntu 18.04) with 16 CPUs (Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz) and 64 GB of RAM. For segmentation, we used a GeForce RTX 2080 Ti GPU with 11 GB of memory.

Fig 3h reports the VolPy processing time as a function of the number of frames. The results show that the processing time scales linearly with the number of frames. Processing 75 candidate neurons in a 1.6-minute movie (512×128 pixel FOV) takes about 8 minutes. Spike extraction (red bar) accounts for most of the processing time.

In order to probe the benefits of parallelization, we ran VolPy four times on the same hardware while limiting the runs to 1, 2, 4 and 8 CPUs respectively (Fig 3i). We observed significant performance gains due to parallelization, especially in the motion correction and SpikePursuit phases, with a maximum speed-up of 3X. Simultaneously, we recorded the peak memory usage of VolPy while running on a different number of CPUs for each run. Fig 3j shows how the peak memory increases with the number of threads. VolPy therefore enables speed gains by trading increased memory usage for reduced execution time.

Discussion

Recording subthreshold voltage changes in populations of neurons is necessary to dissect the details of information processing in the brain. Voltage imaging is currently the only technique that promises to achieve this goal with high spatio-temporal resolution. Here we introduced VolPy, the first end-to-end, automatic, scalable, and open-source analysis pipeline for large-scale voltage imaging datasets. Compared to previous approaches, VolPy advances the state of the art in the following ways. First, VolPy automatically corrects movies for motion artifacts and generates memory-efficient data files. Second, it detects both active and inactive neurons in the FOV via a supervised algorithm. Third, it provides trace denoising and adaptive spike time estimation based on automatic, robust and scalable peak detection algorithms. Finally, VolPy is embedded into the popular open-source software CaImAn, and is therefore directly available to the existing community.

To evaluate the generalization performance of the segmentation algorithm, we trained a single neural network on three dataset types simultaneously. When provided with sufficient training data, we observed good generalization performance across the trained dataset types. However, segmentation of new dataset types will likely require additional training data to achieve similar performance. For this reason, we provide an interface within VolPy to manually label neurons and produce new annotations. In conjunction, we provide a step-by-step guide to train Mask R-CNN neural networks with newly labelled datasets (https://github.com/flatironinstitute/CaImAn/wiki/Training-Mask-R-CNN). As more data become available and more users adopt VolPy, we plan to develop a web-based graphical user interface for experimentalists to manually label datasets and transfer the resulting annotations to a distributed computing server, which will periodically retrain the network and improve the performance of our system. Currently, only a small number of labs have publicly shared cellular-resolution large-scale voltage recordings, and we have collected most of them in this corpus. However, we anticipate that these methods will rapidly become more widely adopted, resulting in a much wider variety of dataset types.

In the future, we plan to extend this framework in two algorithmic directions. First, similar to calcium imaging [28], we plan to develop methods appropriate to real-time scenarios, where the activity of neurons needs to be inferred on-the-fly and frame-by-frame. Second, we plan to include algorithms appropriate to quantify supra- and subthreshold signals in neuronal sub-compartments.

Availability

The code and datasets are available at https://github.com/flatironinstitute/CaImAn and https://www.dropbox.com/sh/5pu8g0vz5hl0mpq/AABHU3RC81hKnWAqu-e26xP8a?dl=0 (the link will be made available on Zenodo upon publication).

Supporting information

S1 Vid. Example of voltage imaging data on mouse neocortex. Left: movie after motion correction. Right: local correlation movie.

S2 Vid. Example of a reconstructed movie in VolPy. Left: movie after motion correction. Middle: movie with baseline removed. Right: reconstructed movie.

S3 Vid. Example of the manual annotation interface in VolPy. This is used to annotate datasets quickly when there are few neurons in the FOV.

S1 Fig. Manual annotations of voltage imaging datasets with ImageJ. We selected neurons based on the mean image (left), correlation image (mid-left) and local correlation movie (mid-right). Two annotators marked the contours of neurons using the ImageJ Cell Magic Wand tool plugin and saved selections in the ROI manager (right).

S1 Text. Algorithmic details for trace denoising and spike extraction in VolPy.



Acknowledgments

We thank K Svoboda, A Singh, M Ahrens, T Kawashima, Y Shuai, A Cohen, and M Xie for providing voltage imaging datasets. We thank H Eybposh for useful discussions. We thank A Singh for the initial version of the Python code, and for insightful comments and discussions.

References
1. Marshall JD, Li JZ, Zhang Y, Gong Y, St-Pierre F, Lin MZ, et al. Cell-type
specific optical recording of membrane voltage dynamics in freely moving mice.
Cell. 2016;167(6):1650–1662.e15. doi:10.1016/j.cell.2016.11.021.
2. Carandini M, Shimaoka D, Rossi LF, Sato TK, Benucci A, Knöpfel T. Imaging
the Awake Visual Cortex with a Genetically Encoded Voltage Indicator. Journal
of Neuroscience. 2015;35(1):53–63. doi:10.1523/JNEUROSCI.0594-14.2015.

3. Akemann W, Mutoh H, Perron A, Park YK, Iwamoto Y, Knöpfel T. Imaging


neural circuit dynamics with a voltage-sensitive fluorescent protein. Journal of
Neurophysiology. 2012;108(8):2323–2337. doi:10.1152/jn.00452.2012.
4. Knöpfel T, Song C. Optical voltage imaging in neurons: moving from technology
development to practical tool. Nature Reviews Neuroscience. 2019; p. 1–9.

5. Abdelfattah AS, Kawashima T, Singh A, Novak O, Liu H, Shuai Y, et al. Bright


and photostable chemigenetic indicators for extended in vivo voltage imaging.
Science. 2019;365(6454):699–704. doi:10.1126/science.aav6416.
6. Adam Y, Kim JJ, Lou S, Zhao Y, Xie ME, Brinks D, et al. Voltage imaging and
optogenetics reveal behaviour-dependent changes in hippocampal dynamics.
Nature. 2019;569(7756):413.
7. Kannan M, Vasan G, Huang C, Haziza S, Li JZ, Inan H, et al. Fast, in vivo
voltage imaging using a red fluorescent indicator. Nature methods.
2018;15(12):1108.
8. Piatkevich KD, Bensussen S, Tseng Ha, Shroff SN, Lopez-Huerta VG, Park D,
et al. Population imaging of neural activity in awake behaving mice in multiple
brain regions. bioRxiv. 2019; p. 616094.
9. Piatkevich KD, Jung EE, Straub C, Linghu C, Park D, Suk HJ, et al. A robotic
multidimensional directed evolution approach applied to fluorescent voltage
reporters. Nature chemical biology. 2018;14(4):352.

10. Roome CJ, Kuhn B. Simultaneous dendritic voltage and calcium imaging and
somatic recording from Purkinje neurons in awake mice. Nature communications.
2018;9(1):3388.
11. Pnevmatikakis E, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, et al.
Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data.
Neuron. 2016;89(2):285–299. doi:10.1016/j.neuron.2015.11.037.
12. Giovannucci A, Friedrich J, Gunn P, Kalfon J, Brown BL, Koay SA, et al.
CaImAn an open source tool for scalable calcium imaging data analysis. eLife.
2019;8:e38173. doi:10.7554/eLife.38173.



13. Buchanan EK, Kinsella I, Zhou D, Zhu R, Zhou P, Gerhard F, et al. Penalized
matrix decomposition for denoising, compression, and improved demixing of
functional imaging data. arXiv:1807.06203 [q-bio, stat]. 2018;.
14. Teeters JL, Godfrey K, Young R, Dang C, Friedsam C, Wark B, et al. Neurodata
Without Borders: Creating a Common Data Format for Neurophysiology.
Neuron. 2015;88(4):629–634. doi:10.1016/j.neuron.2015.10.025.
15. Xie M, Adam Y, Fan L, Boehm UL, Kinsella IA, Zhou D, et al. High fidelity
estimates of spikes and subthreshold waveforms from 1-photon voltage imaging in
vivo. bioRxiv. 2020;.
16. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proceedings of the
IEEE international conference on computer vision; 2017. p. 2961–2969.
17. Smith SL, Häusser M. Parallel processing of visual space by neighboring neurons
in mouse visual cortex. Nature Neuroscience. 2010;13(9):1144–1149.
doi:10.1038/nn.2620.
18. Walker T. Cell magic wand tool. Cell Magic Wand Tool; 2014.
19. Pnevmatikakis EA, Giovannucci A. NoRMCorre: An online algorithm for
piecewise rigid motion correction of calcium imaging data. Journal of
Neuroscience Methods. 2017;291:83–94. doi:10.1016/j.jneumeth.2017.07.031.
20. Falk T, Mai D, Bensch R, Cicek O, Abdulkadir A, Marrakchi Y, et al. U-Net:
deep learning for cell counting, detection, and morphometry. Nature Methods.
2019;16(1):67–70. doi:10.1038/s41592-018-0261-2.
21. Agarwal T, Mittal H. Performance Comparison of Deep Neural Networks on
Image Datasets. In: 2019 Twelfth International Conference on Contemporary
Computing (IC3). IEEE; 2019. p. 1–6.
22. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid
networks for object detection. In: Proceedings of the IEEE conference on
computer vision and pattern recognition; 2017. p. 2117–2125.
23. Jung AB, Wada K, Crall J, Tanaka S, Graving J, Yadav S, et al. imgaug; 2019.
Available from: https://github.com/aleju/imgaug.
24. Apthorpe N, Riordan A, Aguilar R, Homann J, Gu Y, Tank D, et al. Automatic
neuron detection in calcium imaging data using convolutional networks. In:
Advances in Neural Information Processing Systems; 2016. p. 3270–3278.
25. Franke F, Quian Quiroga R, Hierlemann A, Obermayer K. Bayes optimal
template matching for spike sorting – combining fisher discriminant analysis with
optimal filtering. Journal of Computational Neuroscience. 2015;38(3):439–459.
doi:10.1007/s10827-015-0547-7.
26. Paige CC, Saunders MA. LSQR: An algorithm for sparse linear equations and
sparse least squares. ACM Transactions on Mathematical Software (TOMS).
1982;8(1):43–71.
27. Victor JD, Purpura KP. Metric-space analysis of spike trains: theory, algorithms
and application. Network: computation in neural systems. 1997;8(2):127–164.
28. Giovannucci A, Friedrich J, Kaufman M, Churchland A, Chklovskii D, Paninski
L, et al. Onacid: Online analysis of calcium imaging data in real time. In:
Advances in Neural Information Processing Systems; 2017. p. 2381–2391.

