
VolPy: automated and scalable analysis pipelines for voltage imaging datasets
Changjia Cai1, Johannes Friedrich2, Eftychios A Pnevmatikakis2, Kaspar Podgorski3*, Andrea Giovannucci1,4*

1 Joint Department of Biomedical Engineering at University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
2 Flatiron Institute, Simons Foundation, New York, NY, USA
3 Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
4 Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC,
USA

* agiovann@email.unc.edu (AG); podgorskik@janelia.hhmi.org (KP)

Abstract
Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular
scale, unlocking the study of subthreshold activity, synchrony, and network dynamics
with unprecedented spatio-temporal resolution. However, high data rates (>800 MB/s)
and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we
present VolPy, an automated and scalable pipeline to pre-process voltage imaging
datasets. VolPy features motion correction, memory mapping, automated segmentation,
denoising and spike extraction, all built on a highly parallelizable, modular, and
extensible framework optimized for memory and speed. To aid automated segmentation,
we introduce a corpus of 24 manually annotated segmentation datasets from different
preparations and voltage indicators. We benchmark VolPy against ground truth
segmentation and electrophysiology recordings, demonstrating VolPy’s performance in
neuron localization, spike extraction, and scalability.

Author summary
Roughly 290 million electrical action potentials occur every second in the human brain,
facilitating the propagation of signals among cells in the nervous system and driving
most of our daily operations. New methods in brain imaging are emerging that have the
speed and resolution to capture events in the brain at the pace at which neurons
typically communicate. These methods measure voltage in neurons by using light, and
therefore can access very detailed brain signaling patterns. However, the adoption of
these methods by a larger community, and not a restricted set of experts, is limited by
the lack of computational tools, thereby greatly hindering progress in this field. In this
paper, we present a set of tools, which we call VolPy, that greatly facilitate the
preprocessing of these new imaging datasets. This pipeline incorporates very efficient and
optimized algorithms that can identify neurons and extract their activity with great
accuracy. The presented software will make this new imaging modality accessible to a
wide audience.



Introduction

While methods have been developed to process voltage imaging data at mesoscopic scale and multi-unit resolution [1–3], to date there is no established pipeline for large-scale cellular-resolution analyses. Voltage indicators have only recently become sensitive enough to generate large-scale recordings at cellular resolution [4–10]. Approaches developed for calcium imaging fail with this valuable but different new imaging modality: dataset sizes have increased by one or two orders of magnitude (tens of GBs vs. TBs per hour), and assumptions about underlying signals made by calcium imaging analysis methods can be violated. For instance, established non-negative matrix factorization (NMF) methods [11, 12] are not efficient for voltage imaging [13]: (i) segmentation approaches developed for somatic calcium imaging fail for other modalities; (ii) voltage imaging signals contribute little variance and are not well isolated by existing NMF approaches; and (iii) voltage signals display both positive and negative fluctuations, and significant multiplicative noise may arise from light absorption in one-photon voltage recordings, which violates the non-negativity assumptions the NMF framework makes in both the spatial and temporal domains. To address these challenges, efficient algorithms must be developed and validated for multiple aspects of voltage imaging data processing, including cell segmentation, denoising, and spike detection. Ideally, these methods should form a user- and developer-friendly, modular, and computationally efficient pipeline, to facilitate optimization of algorithms and future development.

Algorithms specific to processing voltage imaging datasets are beginning to populate the literature. For instance, [5] introduces iterative methods for spike detection, which detect lower-amplitude spikes using filters generated from easily detected high-amplitude spikes (SpikePursuit). However, this method requires manual or semi-manual selection of neurons. Second, the algorithm is not optimized for parallel processing. Finally, it is not embedded into a reusable and well-documented data format, an important requirement for broad community use [14]. A more standardized approach is provided by [6, 13, 15]. These methods employ penalized matrix decomposition to denoise and demix signals, while relying on local NMF to perform an initial segmentation. However, these frameworks do not embed an adaptive and automated mechanism for spike extraction and are not integrated into a robust, scalable and multi-platform framework. Further, while these approaches have been validated with simulated data and small electrophysiological ground truth datasets, the lack of both electrophysiological and segmentation ground truth has hindered the validation and comparison of all these approaches. In summary, no validated, complete, scalable and automatic analysis pipeline for voltage imaging data exists to date.

To address these shortcomings, we established objective performance evaluation benchmarks and a new analysis pipeline for pre-processing voltage imaging data, which we named VolPy. First, in order to establish a common validation framework and automate neuron segmentation, we created a corpus of annotated datasets with manually segmented neurons. Second, we trained a supervised algorithm to automatically localize and segment cells via convolutional networks [16]. Third, we developed an improved algorithm to denoise fluorescence traces and extract spikes, built upon SpikePursuit [5]. Fourth, we quantitatively evaluated the performance of VolPy in neuron segmentation, spike extraction and scalability. Fifth, we integrated our methods within the CaImAn ecosystem [12], a popular suite of tools for single-cell resolution brain imaging analysis. In addition to the computational advantages provided by CaImAn's optimized tools, this enables immediate adoption by research labs already relying on the CaImAn ecosystem.



Materials and methods

Creation of a corpus of annotated datasets

To date there are no established metrics to compare the practical performance of algorithms for single cell localization and/or segmentation in cellular-resolution voltage imaging datasets. Toward establishing such a benchmark, and with the goal of developing new supervised algorithms, we generated a corpus of annotated datasets (ground truth, GT) in which neurons were manually segmented. GT was constructed by human labelers from two summary images (mean and local correlation images, Fig 1b,c) and a pre-processed movie that highlights active neurons (local correlation movie, S1 Vid). More specifically, after motion correction, we generated a mean image, a correlation image and a correlation video as follows:

[Fig 1 graphic: (a) pipeline schematic (motion correction, memory mapping, segmentation, trace denoising and spike extraction); (b–d) summary images and segmentation/trace outputs; (e,f) example contours and traces; scale bars 60 um, 1 au, 1 s, 400 ms.]
Fig 1. Analysis pipeline for voltage imaging data. (a) Four pre-processing steps for segmenting neurons and extracting spikes from voltage imaging movies. (b) Correlation image (front) and mean image (back) as the input of the segmentation step. (c) The segmentation step outputs class probabilities, bounding boxes and contours. The results are overlaid on the correlation image in (b). (d) Results of trace denoising and spike extraction. The gray dashed horizontal line represents the inferred spike threshold. (e) Correlation image (left) and mean image (right) of one mouse layer 1 (L1) neocortex dataset with contours detected by VolPy. (f) Temporal traces with detected spikes corresponding to neurons in panel (e), extracted by VolPy (left). The dashed gray portion of the traces is magnified on the right.

Mean image: To compute the mean image, we average the movie across time for each pixel and normalize by the pixel-wise z-score.

Correlation image: The correlation image is a variation of that implemented in [17]. After removing the baseline of the movie by low-pass filtering, we average the temporal correlation of each pixel with its eight neighboring pixels. We also normalize the correlation image by z-scoring when it is fed to the neural network.

Correlation movie: We introduced a novel representation, the correlation movie, which is essentially a running version of the correlation image computed over overlapping chunks of video frames. This new type of visualization significantly improves the visibility of spikes in voltage imaging movies (see S1 Vid).
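For concreteness, a minimal NumPy/SciPy sketch of these three summary representations is given below. It assumes an in-memory movie of shape (frames, y, x) and omits the baseline removal step; the chunk and stride parameters are illustrative assumptions, not VolPy's published defaults.

```python
# Minimal sketch of the three summary representations (NumPy/SciPy).
# Assumes mov has shape (frames, y, x); baseline removal is omitted.
import numpy as np
from scipy.ndimage import uniform_filter

def mean_image(mov):
    """Temporal mean per pixel, z-scored across pixels."""
    m = mov.mean(axis=0)
    return (m - m.mean()) / m.std()

def correlation_image(mov):
    """Average temporal correlation of each pixel with its 8 neighbors."""
    z = (mov - mov.mean(axis=0)) / (mov.std(axis=0) + 1e-12)  # z-score in time
    # sum over the 8 neighbors = 3x3 box sum minus the center pixel
    neigh_sum = uniform_filter(z, size=(1, 3, 3)) * 9 - z
    return (z * neigh_sum).mean(axis=0) / 8.0

def correlation_movie(mov, chunk=1000, step=100):
    """Running correlation image over overlapping chunks of frames."""
    return np.stack([correlation_image(mov[i:i + chunk])
                     for i in range(0, len(mov) - chunk + 1, step)])
```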

Guided by these three visual cues, two annotators marked the contours of neurons using the ImageJ Cell Magic Wand tool plugin [18] (see S1 Fig). For neurons to be included in the corpus, both annotators had to agree on the selection, which fulfilled the following criteria: (i) neurons were very clear in at least one of the three cues; (ii) neurons were moderately clear in one of the summary images and exhibited a spatial footprint in selected frames of the local correlation movie. Summary information about the annotated datasets is reported in Tab 1.

Table 1. Properties of three heterogeneous types of datasets. For each type of dataset the name, organism, brain region, source, imaging rate, voltage indicator, and the total number of neurons selected by the manual annotators are given.
Name Organism Brain region Source Rate (Hz) Indicator # neurons
L1 Mouse L1 cortex [5] 400 Voltron 523
TEG Zebrafish Tegmental [5] 300 Voltron 107
HPC Mouse Hippocampus [6] 1000 paQuasAr3-s 41

A novel analysis pipeline for voltage imaging

We propose a novel scalable pipeline for automated analysis that performs the preprocessing steps required to extract spikes and sub-threshold activity from voltage imaging movies (Fig 1). First, input data are processed to remove motion artifacts with parallelized algorithms, and saved into a memory-mapped file format that enables efficient concurrent access. In a second stage, VolPy segments candidate neurons using supervised algorithms (Fig 1a,c) or a manual annotation tool. Finally, VolPy denoises fluorescence traces, infers spatial footprints, detects single spikes, and extracts the subthreshold activity of each neuron (Fig 1a,d). In Fig 1e,f, we report the result of running the full pipeline on an example mouse L1 neocortex voltage imaging dataset. S2 Vid shows the reconstructed movie based on the result of VolPy on the same mouse L1 neocortex dataset. In what follows we present each stage of the VolPy pipeline in detail.

Motion correction and memory mapping

First, movies are corrected for sample movement and stored in an efficient format by relying on the infrastructure provided by CaImAn [12, 19]. CaImAn provides routines to simultaneously register frames to a template and create memory-mapped files. Memory mapping provides the ability to quickly read arbitrary portions of the movie along any direction without loading the full movie into memory. In turn, this allows parallelization of all the operations required to generate summary images and denoise the signal.
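The snippet below sketches this access pattern with a plain NumPy memmap, where only the pages backing the requested slices are read from disk. The file name, shape, and array order are hypothetical; CaImAn's own F-ordered .mmap format differs in detail.

```python
# Sketch of memory-mapped access with NumPy; file name, shape, and array
# order are assumptions, not CaImAn's actual .mmap layout.
import numpy as np

mov = np.memmap('movie.mmap', dtype='float32', mode='r',
                shape=(20000, 512, 128))       # (frames, y, x)
patch = np.array(mov[:, 100:164, 40:104])      # one spatial patch, all frames
trace = np.array(mov[:, 120, 60])              # a single-pixel trace
```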



Segmentation

The low SNR of voltage imaging data hinders the applicability of the segmentation methods previously devised for calcium imaging data [11]. Here we propose to segment neurons with supervised learning approaches. While previous attempts at cell localization and segmentation have extended U-Net fully convolutional network architectures [20], in our hands this family of methods fails when facing datasets in which neurons overlap heavily (TEG datasets). Instead, we approach the problem with Mask R-CNN, a convolutional network for object localization and segmentation [16]. Mask R-CNN is a particularly promising architecture as it enables separation of overlapping objects in a specific area by providing each object with a unique bounding box (Fig 2a). In more detail, Mask R-CNN provides simultaneous object localization and instance segmentation via a combination of two network portions: backbone and head. The backbone features a pre-trained convolutional network (such as VGG, ResNet, Inception or others [21]) for feature extraction. Mask R-CNN also exploits another effective backbone: feature pyramid networks [22], a top-down architecture with lateral connections, which enables the network to extract features at multiple scales from the feature maps. In the head, based on the extracted features, a Region Proposal Network proposes initial bounding boxes for each candidate object, which are fed to two downstream branches. One branch is trained to predict a class label and a bounding box offset that refines the initial bounding box, while the other branch outputs a binary mask for each candidate object. An example of the network inference by VolPy on a validation dataset is shown in Fig 1e.

We adapted Mask R-CNN to our purpose as follows. We chose a combination of ResNet-50 pre-trained on the COCO dataset and feature pyramid networks as the backbone. The input of the network is a three-channel image: two channels for the mean image and one for the correlation image. Three channels are required to match the input to the first few layers pre-trained on the COCO dataset. The network was trained to predict only one class, neuron or not neuron (background), instead of a multi-label output.

During training, we randomly cropped the input image into 128×128 crops and applied the following data augmentation techniques using the imgaug package [23]: flip, rotation, multiply (adjust brightness), Gaussian noise, shear, scale and translation. Each mini-batch contained six cropped images. We trained the head of the network (the whole network except the ResNet) on one GPU for 2000 iterations with learning rate 0.01, and then trained the layers after the first three stages of the ResNet (28 layers) for another 2000 iterations with learning rate 0.001. We used stochastic gradient descent as our optimizer with a constant momentum of 0.9. The weight decay was 0.0001.
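A sketch of such an augmentation pipeline with imgaug follows; the specific parameter ranges are illustrative assumptions, not the values used to train VolPy.

```python
# Sketch of the described augmentations with imgaug; parameter ranges are
# illustrative assumptions, not VolPy's training configuration.
import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                                    # horizontal flip
    iaa.Flipud(0.5),                                    # vertical flip
    iaa.Multiply((0.8, 1.2)),                           # brightness adjustment
    iaa.AdditiveGaussianNoise(scale=(0.0, 0.05 * 255)), # Gaussian noise
    iaa.Affine(rotate=(-45, 45), shear=(-8, 8),         # rotation and shear
               scale=(0.9, 1.1),                        # scaling
               translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)}),
], random_order=True)

# images: uint8 array of shape (batch, H, W, C); the same geometric
# transforms must also be applied to the corresponding instance masks.
# images_aug = augmenter(images=images)
```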

For validation, images were padded with zeros to make their width and height multiples of 64, so that feature maps could be smoothly scaled for the feature pyramid network. We only selected neurons with a confidence level greater than or equal to 0.7.
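Such padding amounts to a few lines; the helper below is a hypothetical illustration, not VolPy's exact code.

```python
# Zero-pad an image so height and width become multiples of k (sketch).
import numpy as np

def pad_to_multiple(img, k=64):
    h, w = img.shape[:2]
    pad = ((0, (-h) % k), (0, (-w) % k)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad)  # constant zero padding on the bottom/right
```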

While VolPy's segmentation method achieves good performance on similarly collected datasets, we do not expect it to generalize to completely new datasets out of the box. To overcome this issue, we developed a manual annotation visual interface tool within VolPy. This enables one-click semi-automatic neuron segmentation based on the Python cell magic wand tool [24]. See S3 Vid for more detail.

Trace denoising and spike extraction

Classical algorithms for denoising calcium imaging movies and extracting spikes from the corresponding fluorescence traces fail when applied to voltage imaging movies. On the one hand, the low signal-to-noise ratio and the complex background fluorescence require new methods for refining spatial footprints; on the other hand, substantially different biophysical models underlie the temporal dynamics of the fluorescence associated with spikes.


[Fig 2 graphic: (a) Mask R-CNN diagram (feature extraction, ROI align, and class/box/mask heads with example confidence scores); (b) eight-step trace denoising and spike extraction schematic; (c) example traces for three neurons; scale bar 0.5 s.]
Fig 2. Segmentation, trace denoising and spike extraction framework. (a) Mask R-CNN framework for neuron segmentation. The network predicts a class label, a bounding box and a binary mask for each candidate neuron, taking summary images (mean and correlation) as inputs. (b) Algorithm for fluorescence trace denoising and spike extraction. ①② Load and high-pass filter the signal in one context region of the movie. The initial temporal trace is computed either by averaging ROI pixels or by applying the spatial filter (if available) to the context region. Two steps are executed in a loop for three iterations: the former (③–⑥) estimates spike times, and the latter (⑦) refines the spatial filter. ③ Extract the first 8 principal components of the background pixels using SVD and then remove the background contamination via Ridge regression. ④ The background-removed trace (t) is high-pass filtered to obtain a zero-baseline trace t_s for further processing. Spikes with peaks larger than 3.5 times the noise level (gray dotted line) are selected. ⑤ Waveforms of these spikes (gray) are averaged to obtain a spike template (black line). ⑥ A whitened matched filter is used to enhance spikes akin to the template. For a second time, peaks are detected by thresholding t_s based on the noise level. The reconstructed signal (t_rec) is obtained by convolving the template waveform with the inferred spike train. ⑦ Refine the spatial filter through Ridge regression. The product of the context region across time with the refined spatial filter generates the temporal trace for the following iteration. ⑧ After three iterations, the subthreshold activity (t_sub) is extracted by applying a low-pass filter to the residual trace (t − t_rec). (c) Examples of t, t_rec, t_s, and t_sub traces, along with detected spikes.



To solve both problems, we built upon and extended the SpikePursuit (SP) algorithm [5]. We improved SP in the following directions: (i) while the original version of the algorithm required manual selection of candidate neurons, VolPy automatically initializes it using the output of the trained Mask R-CNN; (ii) a minimal amount of data is loaded into memory thanks to the memory mapping infrastructure, thereby reducing memory requirements; (iii) we increased the reliability of the algorithm by introducing a more robust estimate of the background, spikes and subthreshold activity; (iv) we scaled up the performance by improving the underlying optimization algorithms. In what follows, we introduce SP (see Fig 2b), along with our modifications to improve the algorithm. Pseudocode of the algorithm is shown in S1 Text.

ROI loading and preprocessing: As a result of segmentation, each candidate neuron has an associated binary mask which represents its spatial extent (ROI region R). The ROI is dilated to get a larger region centered on the neuron (context region C). Background pixels are defined as all the pixels in the context region at least n_B pixels (12 pixels by default) away from the ROI region (background region). As a first step, all pixels in the context region are efficiently retrieved (Y) from the memory-mapped file and high-pass filtered (Y_h) to compensate for photo-bleaching (①② in Fig 2b; see pseudocode in S1 Text for more detail). The initial temporal trace t associated with the neuron can be approximated either as the mean signal of the ROI pixels or, if a previously calculated spatial filter w is available, as the weighted average across all pixels in the context region:

t = \begin{cases} \frac{1}{n(r)} \sum_{x \in r} Y_h[:, x], & \text{if } w \text{ is not given} \\ Y_h w, & \text{if } w \text{ is given} \end{cases} \qquad (1)

where r represents the pixels in the ROI region R and n(r) the total number of pixels in the ROI region.
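In code, Eq (1) amounts to a mean over ROI pixels or a matrix-vector product; the sketch below assumes the context region has already been flattened to a (T, n_pixels) array.

```python
# Sketch of Eq (1): initial temporal trace of one neuron.
# Yh: (T, n_pixels) high-passed context region; roi: boolean (n_pixels,).
import numpy as np

def initial_trace(Yh, roi, w=None):
    if w is None:
        return Yh[:, roi].mean(axis=1)  # mean over ROI pixels
    return Yh @ w                       # weighted average with spatial filter w
```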

Afterwards, two steps are executed in a loop for three rounds: the former estimates spike times, and the latter refines the spatial filter.

Spike time estimation: In order to estimate spike times from the fluorescence traces (③–⑥ in Fig 2b), we first compute the singular value decomposition of the background pixels Y_b:

Y_b = U \Sigma V \qquad (2)

where U contains the temporal components. This is then used to remove background contamination via Ridge regression, in which U_b, the matrix of the first n_pc components of U (default is 8), is the regressor and the temporal trace is the predictor (③ in Fig 2b):

\beta = (U_b^T U_b + \lambda_b \|U_b\|_F^2 I)^{-1} U_b^T t \qquad (3)

t = t - U_b \beta \qquad (4)

The idea behind Equation 3 is to isolate from the trace the portion of signal which is background only. We added an L2 regularizer to penalize large regression coefficients caused by high signal-to-noise ratio neurons among the background pixels. This provided more reliable results than the original linear regression method.
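A direct translation of Eqs (2)–(4) is sketched below; the regularization strength lam is an assumed placeholder, since the default value is not stated here.

```python
# Sketch of Eqs (2)-(4): background removal via truncated SVD plus a
# closed-form ridge solve. lam is an assumed placeholder value.
import numpy as np

def remove_background(t, Yb, npc=8, lam=0.01):
    # SVD of the background pixels; U holds the temporal components (Eq 2)
    U, S, Vt = np.linalg.svd(Yb, full_matrices=False)
    Ub = U[:, :npc]
    # beta = (Ub^T Ub + lam * ||Ub||_F^2 I)^-1 Ub^T t   (Eq 3)
    reg = lam * np.linalg.norm(Ub, 'fro') ** 2
    beta = np.linalg.solve(Ub.T @ Ub + reg * np.eye(npc), Ub.T @ t)
    return t - Ub @ beta                                # Eq 4
```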

The background-removed trace t is then filtered with a high-pass filter (1 Hz cut-off) to obtain t_s, which is further processed for spike extraction (④ in Fig 2b). A Gaussian low-pass filter is then applied to t_s to prevent single spikes from being associated with multiple noisy peaks. Two rounds of spike detection are subsequently performed. The first round selects high-SNR spikes. To increase the robustness of spike detection when few spikes appear in the signal, we replaced the adaptive thresholding method in SP with a thresholding method that selects peaks larger than 3.5 times the noise level. The noise level is estimated by computing the standard deviation of the trace, but only considering values lower than its median (see pseudocode in S1 Text for more detail). A spike template z is computed by averaging all the peak waveforms:

z = \frac{1}{n_s} \sum_{i=1}^{n_s} t_s[s[i] - \tau : s[i] + \tau] \qquad (5)

where s is the list of spike times, n_s is the total number of spikes, and τ is the half-width of the template window (⑤ in Fig 2b). Subsequently, a whitened matched filter [25] is used to enhance spikes with shape similar to the template (⑥ in Fig 2b). In more detail, we first use the Welch method to approximate the spectral density of the noise in the fluorescence signal. Second, we scale the signal in the frequency domain to whiten the noise. Finally, we convolve with a time-flipped template, which is approximated using the peak-triggered average.
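The noise estimate, spike template, and whitened matched filter described above could be sketched as follows; the Welch segment length and the template half-width are assumptions.

```python
# Sketch of steps 4-6: robust noise estimate, spike template (Eq 5), and a
# whitened matched filter via Welch's PSD. Parameters are assumptions.
import numpy as np
from scipy.signal import find_peaks, welch

def noise_std(ts):
    """Std estimated only from values below the median (robust to spikes)."""
    below = ts[ts < np.median(ts)]
    return np.sqrt(np.mean((below - np.median(ts)) ** 2))

def spike_template(ts, spikes, tau=20):
    """Average waveform around detected peaks (Eq 5)."""
    return np.mean([ts[s - tau:s + tau] for s in spikes
                    if tau <= s < len(ts) - tau], axis=0)

def whitened_matched_filter(ts, template, fs):
    """Whiten by the estimated noise PSD, then correlate with the template."""
    f, pxx = welch(ts, fs=fs, nperseg=1024)
    psd = np.interp(np.fft.rfftfreq(len(ts), 1 / fs), f, pxx)
    ts_white = np.fft.irfft(np.fft.rfft(ts) / np.sqrt(psd), n=len(ts))
    return np.convolve(ts_white, template[::-1], mode='same')

# first round: peaks above 3.5x the noise level
# spikes, _ = find_peaks(ts, height=3.5 * noise_std(ts))
```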

The second round of spike detection selects spikes larger than 4 (adjustable parameter) times the new noise level. Then, a reconstructed and denoised trace is computed by convolving the inferred spike train q with the waveform template:

t_{rec} = q * z, \quad \text{where } q[t] = \begin{cases} 1 & \text{if there is a spike at time } t \\ 0 & \text{otherwise} \end{cases} \qquad (6)
Spatial filter refinement: The second step refines the spatial filter (⑦ in Fig 2b). The updated spatial filter is computed by Ridge regression, where the reconstructed and denoised trace t_rec is used to approximate the high-passed movie:

w = (Y_h^T Y_h + \lambda_w \|Y_h\|_F^2 I)^{-1} Y_h^T t_{rec} \qquad (7)

The Ridge regression problem is solved with an iterative and efficient algorithm [26] implemented in the Scikit-Learn package ('lsqr'). Subsequently, the weighted average of the movie with the refined spatial filter is used as the updated temporal trace for the following iteration:

t = Y_h w \qquad (8)
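Using scikit-learn, the refinement step can be sketched as below; the alpha scaling mirrors Eq (7), with lam an assumed placeholder value.

```python
# Sketch of Eqs (7)-(8) with scikit-learn's Ridge and the 'lsqr' solver
# referenced in the text. lam is an assumed placeholder value.
import numpy as np
from sklearn.linear_model import Ridge

def refine_spatial_filter(Yh, t_rec, lam=0.01):
    alpha = lam * np.linalg.norm(Yh, 'fro') ** 2  # lambda_w * ||Yh||_F^2
    ridge = Ridge(alpha=alpha, solver='lsqr', fit_intercept=False)
    ridge.fit(Yh, t_rec)        # Yh: (T, n_pixels), t_rec: (T,)
    w = ridge.coef_             # refined spatial filter, Eq (7)
    return w, Yh @ w            # updated temporal trace, Eq (8)
```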
After three iterations of spike time estimation and spatial filter refinement, the subthreshold activity is extracted (⑧ in Fig 2b). This is achieved by applying a low-pass filter (75 Hz cutoff by default) to the residual trace obtained by subtracting the reconstructed fluorescence trace from t.
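The residual low-pass step could look like this; the filter order is an assumption.

```python
# Sketch of step 8: subthreshold activity as the low-passed residual.
from scipy.signal import butter, filtfilt

def subthreshold(t, t_rec, fs, cutoff=75.0):
    b, a = butter(4, cutoff / (fs / 2), btype='low')  # 4th order assumed
    return filtfilt(b, a, t - t_rec)                  # zero-phase filtering
```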

Evaluation of VolPy

Precision/recall framework to measure segmentation performance

In order to measure the performance of VolPy segmentation, we compared the spatial footprints extracted by VolPy with our manual annotations (see the component registration in [12] for a detailed explanation). In summary, we computed the Jaccard distance (one minus the intersection over union) to quantify similarity among ROIs, and then solved a linear assignment problem with the Hungarian algorithm to determine matches and mismatches. Once those were identified, we adopted a precision/recall framework and defined True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) as follows:

April 22, 2020 8/15


TP = number of matched spatial footprints
FP = number of spatial footprints in VolPy but not in GT
FN = number of spatial footprints in GT but not in VolPy          (9)
TN = 0

Next, we computed the precision, recall and F1 score of the matching performance as follows:

Precision = TP/(TP + FP)
Recall = TP/(TP + FN)                                             (10)
F1 = 2 × Precision × Recall/(Precision + Recall)

The F1 score is a number between zero and one; better performance results in a higher F1 score.
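The matching and scoring steps can be sketched with SciPy's Hungarian solver; the Jaccard-distance cutoff for accepting a match is an assumed parameter.

```python
# Sketch of ROI matching (Jaccard distance + Hungarian assignment) and the
# F1 score of Eqs (9)-(10). The max_dist cutoff is an assumed parameter.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_rois(masks_a, masks_b, max_dist=0.7):
    """masks_*: boolean arrays of shape (n, H, W); returns matched pairs."""
    D = np.ones((len(masks_a), len(masks_b)))
    for i, a in enumerate(masks_a):
        for j, b in enumerate(masks_b):
            union = np.logical_or(a, b).sum()
            if union:
                D[i, j] = 1 - np.logical_and(a, b).sum() / union
    rows, cols = linear_sum_assignment(D)           # Hungarian algorithm
    return [(i, j) for i, j in zip(rows, cols) if D[i, j] < max_dist]

def f1_score(tp, fp, fn):
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```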

Cross-validation to evaluate the segmentation performance on limited datasets

In order to decrease the selection bias originating from the separation into training and validation datasets, and to better evaluate the Mask R-CNN model on our limited datasets (24 in total), we performed a stratified three-fold cross-validation. The reason we used a stratified rather than a normal three-fold cross-validation is that we wanted to maintain an equivalent proportion of training and evaluation samples for each dataset type. In more detail, we partitioned our datasets into three folds so that L1, TEG, and HPC would all maintain the same number of samples without repetition (the train/val column of Tab 2 shows one group of the partition). During cross-validation, two groups were used as training sets and the remaining one as the validation set. The cross-validation process was repeated three times, with each group used exactly once as the validation set.

Table 2. All annotated datasets for segmentation in VolPy. For each dataset the name, size, and number of labeled neurons (#) are given.
Name Size # Name Size #
L1.00.00 20000*512*128 84 HPC.29.04 20000*164*96 2
L1.01.00 20000*512*128 53 HPC.29.06 20000*228*96 2
L1.01.35 20000*512*128 69 HPC.32.01 20000*256*96 4
L1.02.00 20000*512*128 61 HPC.38.05 20000*176*92 4
L1.02.80 20000*512*128 43 HPC.38.03 20000*128*88 2
L1.03.00 20000*512*128 79 HPC.39.07 20000*264*96 5
L1.03.35 20000*512*128 57 HPC.39.03 20000*276*96 5
L1.04.00 20000*512*128 43 HPC.39.04 20000*336*96 4
L1.04.50 20000*512*128 34 HPC.48.01 20000*224*96 2
TEG.01.02 10000*364*320 33 HPC.48.05 20000*212*96 4
TEG.02.01 10000*360*256 29 HPC.48.07 20000*280*96 2
TEG.03.01 10000*508*288 45 HPC.48.08 20000*284*96 3

For each run of the cross-validation process, we trained a single network and tested it on both the training and validation sets. We then computed the mean and standard deviation of the F1 score for the different types of datasets, with training and validation sets treated separately.
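The stratified split can be reproduced with scikit-learn's StratifiedKFold; the dataset names below are illustrative placeholders (see Tab 2 for the real identifiers).

```python
# Sketch of the stratified three-fold split over dataset types using
# scikit-learn; dataset names are placeholders, not the real identifiers.
from sklearn.model_selection import StratifiedKFold

names = [f"L1.{i:02d}" for i in range(9)] \
      + [f"TEG.{i:02d}" for i in range(3)] \
      + [f"HPC.{i:02d}" for i in range(12)]
types = ["L1"] * 9 + ["TEG"] * 3 + ["HPC"] * 12

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(names, types)):
    train = [names[i] for i in train_idx]  # two groups for training
    val = [names[i] for i in val_idx]      # one group for validation
```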



Spike matching performance

In order to match spikes extracted from simultaneous voltage imaging and electrophysiology datasets, we performed the basic operations (spike deletion, spike insertion, spike shift) transforming one spike train into the other that are used to calculate the Victor-Purpura distance [27]. We used a cost M = 25 for adding/removing a spike and a cost of 1 per unit of time for shifting a spike. We minimized the total cost using the Hungarian algorithm. Spikes that ended up shifted to align with each other were considered a match. After identifying matches and mismatches, we proceeded similarly to what was explained above, defining TP, FP, FN, and TN as in Equation 9:

TP = number of matched spikes
FP = number of spikes in VolPy but not in GT
FN = number of spikes in GT but not in VolPy          (11)
TN = 0

Then we calculated the F1 score as in Equation 10.
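A Victor-Purpura-style assignment can be sketched as below: shifting a spike costs 1 per unit of time, capped at 2M (the cost of deleting it and re-inserting it), and each unmatched spike pays M; pairs whose shift cost stays below the cap count as matches. This is a sketch of the scheme described above, not VolPy's exact implementation.

```python
# Sketch of Victor-Purpura-style spike matching as a linear assignment
# (M = 25 from the text; SciPy's Hungarian solver).
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_spikes(s1, s2, M=25):
    """s1, s2: 1-D arrays of spike times (frames); returns matched pairs."""
    n1, n2 = len(s1), len(s2)
    C = np.zeros((n1 + n2, n1 + n2))
    C[:n1, n2:] = M                     # delete an unmatched spike of s1
    C[n1:, :n2] = M                     # insert an unmatched spike of s2
    # shifting costs 1 per unit of time, never more than delete + insert
    C[:n1, :n2] = np.minimum(np.abs(np.subtract.outer(s1, s2)), 2 * M)
    rows, cols = linear_sum_assignment(C)
    return [(i, j) for i, j in zip(rows, cols)
            if i < n1 and j < n2 and C[i, j] < 2 * M]
```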

Results

In what follows we report a systematic evaluation of VolPy against ground truth in terms of performance in identifying neurons, spike extraction, and scalability.

VolPy localizes neurons using a moderate amount of training data

We trained a modified version of the Mask R-CNN network architecture on three heterogeneous types of datasets (Tab 1) and evaluated its performance using three-fold cross-validation (see Tab 2 and Materials and methods for details). In Fig 3a, we compare the contours predicted by VolPy with manual annotations on three example datasets: VolPy was able to identify candidate neurons even in conditions of low signal-to-noise ratio and spatial overlap. We quantified VolPy's performance in detecting neurons using a precision/recall framework, which accounted for the amount of overlap between predicted and ground truth neurons when assigning matches and mismatches. In Fig 3b and Tab 3 we summarize the F1 scores for all the probed datasets. The results indicate that our segmentation approach performs well provided sufficient neurons are available to train the algorithm. Indeed, VolPy obtained F1 scores of 0.89 ± 0.01 on the L1 datasets (523 neurons in total), 0.71 ± 0.02 on the TEG datasets (107 neurons), and 0.46 ± 0.07 on the HPC datasets (39 neurons). In the case of TEG, the performance of VolPy is fair considering that the network was trained with only two datasets of this type. On the HPC datasets, however, the performance on both training and test sets is comparatively poor. We hypothesize that this is due to the fact that not enough data are available (see the # neurons column in Tab 1), possibly combined with the low signal-to-noise ratio typical of this dataset type.

VolPy detects single spikes from voltage imaging data with fidelity

We validated the VolPy spike extraction algorithm on three voltage imaging datasets in which electrophysiology was recorded simultaneously with voltage imaging. We automatically analyzed voltage imaging recordings from mouse L1 neocortex and the zebrafish tegmental area [5] with the VolPy pipeline. The outputs were the spatial footprints, the voltage traces, and the corresponding spike times (Fig 3e-g).
April 22, 2020 10/15


[Fig 3 graphic: (a) matched and mismatched contours (GT vs. Mask R-CNN) for L1, HPC and TEG; (b,c) F1 scores per dataset and per dataset type (train/val); (d-g) spike extraction F1 scores and example electrophysiology/VolPy traces for Fish1, Fish2 and Mouse (300-500 Hz); (h-j) processing time allocation, parallelization speed, and peak memory for a 10 GB movie with 75 neurons.]
Fig 3. Evaluation of VolPy performance. (a) Evaluation of segmentation against three manually annotated datasets including mouse sensory cortex, mouse hippocampus, and larval zebrafish. Matched and mismatched neurons between VolPy and ground truth are shown in the upper and bottom panels respectively. (b) F1 scores of VolPy for all evaluated datasets. (c) Average F1 score on training and validation sets grouped by dataset type. (d)-(g) Evaluation of VolPy spike extraction performance against simultaneous electrophysiology. For each dataset in (e), (f) and (g), electrophysiology (top, blue) and the fluorescence signal denoised by VolPy (bottom, orange) are reported. Spikes are matched between the two groups, with F1 scores computed in (d). (h)-(j) Evaluation of time and memory performance of VolPy. The processing time of VolPy for different numbers of frames and processors is shown in (h) and (i). The peak memory usage of VolPy for different numbers of processors is shown in (j).

April 22, 2020 11/15


Table 3. VolPy performance on segmentation. For each type of dataset, the number of datasets, number of neurons, recall, precision, and F1 score for training and validation, computed by stratified cross-validation, are provided.
Name #datasets #neurons recall(%) precision(%) F1 (%)
train/val train/val train/val train/val train/val
L1 6/3 349 ± 7/174 ± 7 86 ± 4/85 ± 3 94 ± 2/95 ± 1 90 ± 2/89 ± 1
TEG 2/1 71 ± 7/36 ± 7 70 ± 3/67 ± 2 83 ± 7/77 ± 3 76 ± 4/71 ± 2
HPC 8/4 26 ± 1/13 ± 1 88 ± 11/66 ± 7 55 ± 7/40 ± 12 65 ± 8/46 ± 7

Spikes for the electrophysiology recordings were obtained by thresholding. Spikes were matched against ground truth by solving a linear sum assignment problem using the Hungarian algorithm. The F1 score of each dataset (Fig 3d) was computed relying on a precision/recall framework based on matched and unmatched spikes. We observed that VolPy performed well on all datasets: the average F1 score across the three datasets was 0.95 ± 0.03, and the F1 score for each dataset was above 0.9. This confirms that single spikes can be automatically extracted from voltage imaging data with fidelity.

VolPy enables the analysis of large voltage imaging datasets on small and medium-sized machines

We examined the performance of VolPy in terms of processing time and peak memory on the datasets presented above. We ran our tests on a Linux-based desktop (Ubuntu 18.04) with 16 CPUs (Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz) and 64 GB of RAM. For segmentation, we used a GeForce RTX 2080 Ti GPU with 11 GB of memory.

Fig 3h reports the VolPy processing time as a function of the number of frames. The results show that the processing time scales linearly with the number of frames. Processing 75 candidate neurons in a 1.6-minute movie (512×128 pixel FOV) takes about 8 minutes. Spike extraction (red bar) accounts for most of the processing time.

In order to probe the benefits of parallelization, we ran VolPy four times on the same hardware while limiting the runs to 1, 2, 4 and 8 CPUs respectively (Fig 3i). We observed significant performance gains due to parallelization, especially in the motion correction and SpikePursuit phases, with a maximum speed-up of 3X. Simultaneously, we recorded the peak memory usage of VolPy while running on a different number of CPUs for each run. Fig 3j shows how the peak memory increases with the number of threads. VolPy therefore enables speed gains by trading increased memory usage for reduced execution time.

Discussion

Recording subthreshold voltage changes in populations of neurons is necessary to dissect the details of information processing in the brain. Voltage imaging is currently the only technique that promises to achieve this goal with high spatio-temporal resolution. Here we introduced VolPy, the first end-to-end, automatic, scalable, and open-source analysis pipeline for large-scale voltage imaging datasets. Compared to previous approaches, VolPy advances the state of the art in the following ways. First, VolPy automatically corrects movies for motion artifacts and generates memory-efficient data files. Second, it detects both active and inactive neurons in the FOV via a supervised algorithm. Third, it provides trace denoising and adaptive spike time estimation based on automatic, robust and scalable peak detection algorithms. Finally, VolPy is embedded into the popular open-source software CaImAn, and is therefore directly available to the existing community.

To evaluate the generalization performance of the segmentation algorithm, we trained a single neural network on three dataset types simultaneously. When provided with sufficient training data, we observed good generalization performance across the trained dataset types. However, segmentation of new dataset types will likely require additional training data to achieve similar performance. For this reason, we provide an interface within VolPy to manually label neurons and produce new annotations. In conjunction, we provide a step-by-step guide to train Mask R-CNN neural networks with newly labelled datasets (https://github.com/flatironinstitute/CaImAn/wiki/Training-Mask-R-CNN). As more data become available and more users adopt VolPy, we plan to develop a web-based graphical user interface for experimentalists to manually label datasets and transfer the resulting annotations to a distributed computing server, which will periodically retrain the network and improve the performance of our system. Currently, only a small number of labs have publicly shared cellular-resolution large-scale voltage recordings, and we have collected most of them in this corpus. However, we anticipate that these methods will rapidly become more widely adopted, resulting in a much wider variety of dataset types.

In the future, we plan to extend this framework in two algorithmic directions. First, similar to calcium imaging [28], we plan to develop methods appropriate to real-time scenarios, where the activity of neurons needs to be inferred on-the-fly and frame-by-frame. Second, we plan to include algorithms appropriate to quantify supra- and subthreshold signals in neuronal sub-compartments.

Availability

The code and datasets are available at https://github.com/flatironinstitute/CaImAn and https://www.dropbox.com/sh/5pu8g0vz5hl0mpq/AABHU3RC81hKnWAqu-e26xP8a?dl=0 (the link will be made available on Zenodo upon publication).

Supporting information

S1 Vid. Example of voltage imaging data on mouse neocortex. Left: movie after motion correction. Right: local correlation movie.

S2 Vid. Example of a reconstructed movie in VolPy. Left: movie after motion correction. Middle: movie with baseline removed. Right: reconstructed movie.

S3 Vid. Example of the manual annotation interface in VolPy. This is used to annotate datasets quickly when there are few neurons in the FOV.

S1 Fig. Manual annotations of voltage imaging datasets with ImageJ. We selected neurons based on the mean image (left), correlation image (mid-left) and local correlation movie (mid-right). Two annotators marked the contours of neurons using the ImageJ Cell Magic Wand tool plugin and saved selections in the ROI manager (right).

S1 Text. Algorithmic details for trace denoising and spike extraction in VolPy.



Acknowledgments

We thank K Svoboda, A Singh, M Ahrens, T Kawashima, Y Shuai, A Cohen, and M Xie for providing voltage imaging datasets. We thank H Eybposh for useful discussions. We thank A Singh for the initial version of the Python code, and for insightful comments and discussions.

References
1. Marshall JD, Li JZ, Zhang Y, Gong Y, St-Pierre F, Lin MZ, et al. Cell-type
specific optical recording of membrane voltage dynamics in freely moving mice.
Cell. 2016;167(6):1650–1662.e15. doi:10.1016/j.cell.2016.11.021.
2. Carandini M, Shimaoka D, Rossi LF, Sato TK, Benucci A, Knöpfel T. Imaging
the Awake Visual Cortex with a Genetically Encoded Voltage Indicator. Journal
of Neuroscience. 2015;35(1):53–63. doi:10.1523/JNEUROSCI.0594-14.2015.

3. Akemann W, Mutoh H, Perron A, Park YK, Iwamoto Y, Knöpfel T. Imaging


neural circuit dynamics with a voltage-sensitive fluorescent protein. Journal of
Neurophysiology. 2012;108(8):2323–2337. doi:10.1152/jn.00452.2012.
4. Knöpfel T, Song C. Optical voltage imaging in neurons: moving from technology
development to practical tool. Nature Reviews Neuroscience. 2019; p. 1–9.

5. Abdelfattah AS, Kawashima T, Singh A, Novak O, Liu H, Shuai Y, et al. Bright


and photostable chemigenetic indicators for extended in vivo voltage imaging.
Science. 2019;365(6454):699–704. doi:10.1126/science.aav6416.
6. Adam Y, Kim JJ, Lou S, Zhao Y, Xie ME, Brinks D, et al. Voltage imaging and
optogenetics reveal behaviour-dependent changes in hippocampal dynamics.
Nature. 2019;569(7756):413.
7. Kannan M, Vasan G, Huang C, Haziza S, Li JZ, Inan H, et al. Fast, in vivo
voltage imaging using a red fluorescent indicator. Nature methods.
2018;15(12):1108.
8. Piatkevich KD, Bensussen S, Tseng Ha, Shroff SN, Lopez-Huerta VG, Park D,
et al. Population imaging of neural activity in awake behaving mice in multiple
brain regions. bioRxiv. 2019; p. 616094.
9. Piatkevich KD, Jung EE, Straub C, Linghu C, Park D, Suk HJ, et al. A robotic
multidimensional directed evolution approach applied to fluorescent voltage
reporters. Nature chemical biology. 2018;14(4):352.

10. Roome CJ, Kuhn B. Simultaneous dendritic voltage and calcium imaging and
somatic recording from Purkinje neurons in awake mice. Nature communications.
2018;9(1):3388.
11. Pnevmatikakis E, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, et al.
Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data.
Neuron. 2016;89(2):285–299. doi:10.1016/j.neuron.2015.11.037.
12. Giovannucci A, Friedrich J, Gunn P, Kalfon J, Brown BL, Koay SA, et al.
CaImAn an open source tool for scalable calcium imaging data analysis. eLife.
2019;8:e38173. doi:10.7554/eLife.38173.



13. Buchanan EK, Kinsella I, Zhou D, Zhu R, Zhou P, Gerhard F, et al. Penalized
matrix decomposition for denoising, compression, and improved demixing of
functional imaging data. arXiv:1807.06203 [q-bio, stat]. 2018;.
14. Teeters JL, Godfrey K, Young R, Dang C, Friedsam C, Wark B, et al. Neurodata
Without Borders: Creating a Common Data Format for Neurophysiology.
Neuron. 2015;88(4):629–634. doi:10.1016/j.neuron.2015.10.025.
15. Xie M, Adam Y, Fan L, Boehm UL, Kinsella IA, Zhou D, et al. High fidelity
estimates of spikes and subthreshold waveforms from 1-photon voltage imaging in
vivo. bioRxiv. 2020;.
16. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proceedings of the
IEEE international conference on computer vision; 2017. p. 2961–2969.
17. Smith SL, Häusser M. Parallel processing of visual space by neighboring neurons
in mouse visual cortex. Nature Neuroscience. 2010;13(9):1144–1149.
doi:10.1038/nn.2620.
18. Walker T. Cell magic wand tool. Cell Magic Wand Tool; 2014.
19. Pnevmatikakis EA, Giovannucci A. NoRMCorre: An online algorithm for
piecewise rigid motion correction of calcium imaging data. Journal of
Neuroscience Methods. 2017;291:83–94. doi:10.1016/j.jneumeth.2017.07.031.
20. Falk T, Mai D, Bensch R, Cicek O, Abdulkadir A, Marrakchi Y, et al. U-Net:
deep learning for cell counting, detection, and morphometry. Nature Methods.
2019;16(1):67–70. doi:10.1038/s41592-018-0261-2.
21. Agarwal T, Mittal H. Performance Comparison of Deep Neural Networks on
Image Datasets. In: 2019 Twelfth International Conference on Contemporary
Computing (IC3). IEEE; 2019. p. 1–6.
22. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid
networks for object detection. In: Proceedings of the IEEE conference on
computer vision and pattern recognition; 2017. p. 2117–2125.
23. Jung AB, Wada K, Crall J, Tanaka S, Graving J, Yadav S, et al. imgaug; 2019.
Available from: https://github.com/aleju/imgaug.
24. Apthorpe N, Riordan A, Aguilar R, Homann J, Gu Y, Tank D, et al. Automatic
neuron detection in calcium imaging data using convolutional networks. In:
Advances in Neural Information Processing Systems; 2016. p. 3270–3278.
25. Franke F, Quian Quiroga R, Hierlemann A, Obermayer K. Bayes optimal
template matching for spike sorting – combining fisher discriminant analysis with
optimal filtering. Journal of Computational Neuroscience. 2015;38(3):439–459.
doi:10.1007/s10827-015-0547-7.
26. Paige CC, Saunders MA. LSQR: An algorithm for sparse linear equations and
sparse least squares. ACM Transactions on Mathematical Software (TOMS).
1982;8(1):43–71.
27. Victor JD, Purpura KP. Metric-space analysis of spike trains: theory, algorithms
and application. Network: computation in neural systems. 1997;8(2):127–164.
28. Giovannucci A, Friedrich J, Kaufman M, Churchland A, Chklovskii D, Paninski
L, et al. Onacid: Online analysis of calcium imaging data in real time. In:
Advances in Neural Information Processing Systems; 2017. p. 2381–2391.

