1
Department of Cell Biology and Anatomy, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Abstract
Cryo-electron microscopy is a powerful method for determining the structures of biological macromolecules, not
only of single particles but also of filamentous molecules. In the structural analysis of filaments, it is necessary to
trace filaments in hundreds or thousands of micrographs in order to prepare a particle dataset for the
subsequent three-dimensional reconstruction. Though some fully-automated filament tracers have been developed
and widely used in recent years, their accuracy remains limited, especially for datasets with low signal-to-noise
ratio, heavy contamination or conformational heterogeneity. For such non-ideal datasets, manual
tracing is still one of the favored approaches owing to its interactivity, although it is tedious, time-consuming
and particularly unsuitable for flexible filaments. Here, we introduce PyFilamentPicker, an interactive,
semi-automated, template-free filament tracing tool. Taking advantage of its interactive design and noise-robust
filament detection algorithm, it achieves efficient, accurate and well-centered filament tracing regardless of dataset
characteristics.
1. Introduction
Single-particle cryo-electron microscopy (cryo-EM) analysis is a rapidly developing method for determining
structures of biological macromolecules. A sample solution of purified target molecules is applied to a carbon-coated
EM grid, blotted with filter paper, and vitrified in liquid ethane cooled by liquid nitrogen. The frozen sample
grid is placed on a transmission electron microscope (TEM) stage kept at liquid nitrogen temperature. Then,
hundreds or thousands of electron micrographs are recorded. Individual images of molecules extracted from
cryo-electron micrographs, called “particles”, are randomly oriented in vitreous ice. Each particle is computationally
Under such a procedure, the first step, particle picking, is one of the critical steps for three-dimensional
reconstruction. Since EM images are recorded with a low electron dose owing to the radiation-sensitive nature of
biological specimens, micrographs usually have a low signal-to-noise ratio (SNR). Particle picking is the process of
locating numerous particles in such noisy micrographs for the subsequent reconstruction step, and it is classically
performed manually. The practice of particle picking differs depending on the nature of the target
molecule, namely, whether the molecule is a “single particle” or constitutes a “filament”. When the target is a
single-particle molecule such as a ribosome, particle picking simply consists of detecting and extracting
randomly distributed particles from micrographs. In manual picking of single particles, the position of each
particle has to be specified by hand, which is extremely tedious and time-consuming. On the other hand, particle
picking for filamentous specimens such as actin filaments and microtubules begins with tracing each filament, so that the
traced filament can be segmented into overlapping boxes at constant intervals and extracted as particles. Therefore,
manual picking for filaments starts from manual tracing of filaments by designating pairs of start and end
coordinates, and then particles are extracted at constant intervals on the line segment connecting the start and end points.
For this reason, manual picking works best when the filament axis resembles a straight line. If a filament is
curved, it has to be divided into several portions so that the particles are well-centered on the filament axis, which
requires much more time and effort. As many particles can be obtained from a filament with a couple of operations,
manual picking for filaments requires less labor to collect the same number of particles than manual picking for
single-particle datasets. However, it is still laborious and time-consuming, especially when
picking from hundreds or thousands of micrographs or when the target filament is highly flexible.
Owing to software and hardware improvements including machine learning algorithms and GPU acceleration, many
tools for semi- or fully-automated particle picking, especially for single particles, have been developed in the last
decade. These tools have made it possible to prepare particle datasets in a short time with minimal human
intervention. For filaments, however, very few automated approaches exist at the moment, primarily due to
relatively low demand compared to single particles. RELION has a GPU-accelerated automated particle picker,
useful for both single particles and filaments, based on a template matching algorithm in which the cross-correlation of a
micrograph is calculated against pre-calculated template images of the target molecule. MicHelixTrace also adopts
a template matching-based approach, but it reduces computational cost by coarsely tiling micrographs and
separating the estimation of rotational and translational parameters, thereby realizing comparable speed to GPU-
Though such a cross-correlation-based approach is popular and widely used for filaments, it is error-prone,
especially when it runs on non-ideal datasets. In contrast to highly-ordered polymers such as tobacco
mosaic virus, many helical filaments including microtubules and amyloid fibrils are structurally heterogeneous.
For heterogeneous datasets, it is important to prepare particles of one specific conformation while excluding particles of
the other conformations, because using mixed particles for three-dimensional reconstruction results in a mixed
reconstruction of different conformations, which makes interpretation of the result quite difficult. Though such
heterogeneity can often be easily distinguished by the human eye, template-based auto-pickers mostly fail to
discriminate one conformation from another. Therefore, unless filaments are selected with the human eye,
performed subsequent to the particle picking step (Alushin, et al., 2015). In addition, even on homogeneous
datasets, automated tracers tend to be error-prone in some specific situations. For micrographs with low SNR, the
threshold value has to be set lower to detect as many true particles as possible, but this naturally increases
the number of false positives along with the improved detection of true positives. For heavily
contaminated micrographs, too, the template matching algorithm tends to detect false positives. In such non-ideal
cases as listed above, interactive manual picking is favored over automated picking in terms of accuracy
(xxx, et al., yyy; xxx, et al., yyy, xxx, et al., yyy), though the procedure itself is quite laborious.
Here, we introduce an interactive semi-automated template-free filament tracing tool: PyFilamentPicker, which
enables efficient, accurate and well-centered tracing of filaments. We demonstrate the feasibility of the tracing
approach for various heterogeneous datasets. This tool helps in quickly preparing a particle dataset for three-
filament characteristics.
2. Methods and results
Many algorithms have been utilized for automated cryo-EM particle picking of filamentous specimens as image
recognition techniques have developed, including classical cross-correlation-based template matching and recent deep
learning-based object detection. We placed importance on the following three points in designing a
productive but effortless filament tracer. First, we put emphasis on human intervention to increase the quality of
particle picking. The human eye is helpful in avoiding unfavorable results such as picking junk particles, missing true
particles, and tracing a filament with a gap or sidetracking. It also helps in tracing filaments with specific
conformational features in a conformationally heterogeneous dataset. Second, time-consuming tasks such as
preparation of a reference or training dataset and parameter optimization should be minimized. Existing software
generally requires template images or a training dataset, which are usually prepared by manual picking and
therefore need time and effort as well. In our framework, based on the idea that filaments can be
regarded as simple “ridges” (bright lines on a dark background), general characteristics of ridges are utilized to
trace filaments, thereby eliminating the need for template images and complicated parameter optimization. Third,
we simplified the program design so that users can handle the software intuitively. With a single mouse click to
specify a filament to be traced, the program automatically traces the full length of the filament, even if the filament is
strongly curved.
Unlike fully-automated procedures, our approach needs human intervention to define the point from which the search
starts. Filament tracing begins with the estimation of the in-plane rotation angle ψ at the user-specified point, followed
by determination of the shift ∆ to the filament’s central axis. As user-specified points are usually off the exact
center of the filament, the starting point is re-calculated to be centered. Then, a bidirectional search is performed
step-by-step at regular intervals, calculating ψ and ∆ for every search point, in order to determine exact coordinates on
the axis. Throughout the procedure, all the required calculations reduce to only two steps: estimating the
filament’s in-plane rotation angle ψ and the distance ∆ to the axis at each search point. In this way, multiple points
are placed on the central axis of the filament. Using them as control points, a spline curve is calculated so that it
passes through the center of the filament. Finally, particles are extracted from the micrograph along the spline
curve at the user-specified inter-box distance, and the coordinates of the particles are written to an output file. Of note,
our program pre-processes micrographs to reduce noise and enhance contrast by removing outlier pixels,
As our target can be simply considered a ridge, we utilized general features of ridges. Given a ridge of diameter $D$
and brightness value 1.0 with in-plane rotation angle $\psi_0$ on a zero-value background, we can extract a patch
$I_{0,\alpha}(x, y)$ of size $W \times H$ at extraction angle $\alpha$ (Fig. 1a). The one-dimensional (1-D)
projection $p_{0,\alpha}(x)$ along the y-axis of the extracted patch is defined as:
$$p_{0,\alpha}(x) = \frac{1}{H} \int I_{0,\alpha}(x, y)\, dy. \tag{1}$$
We can calculate the peak-to-peak value $D_0(\alpha)$ of the 1-D projection $p_{0,\alpha}(x)$ for multiple 1-D projections with
different $\alpha$ angles. The peak-to-peak value $D_0(\alpha)$ takes the maximum value 1.0 when the extraction angle $\alpha$ is
around the ridge’s true in-plane rotation angle $\psi_0$. Assuming that the patch $I_{0,\alpha}(x, y)$ is corrupted by zero-mean
white noise $w(x, y)$ with a variance of $\epsilon_w^2$, the variance $\epsilon_{w_{1D}}^2$ of the noise component $w_{1D}(x, y)$ on the 1-D
projection is calculated as:
$$\epsilon_{w_{1D}}^2 = V[w_{1D}(x, y)] = \frac{\epsilon_w^2}{H}. \tag{2}$$
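Equation (2) can be checked numerically with a short Monte Carlo sketch. The patch size, noise level and random seed below are arbitrary illustrative choices, not values used by PyFilamentPicker, and the function name is hypothetical.

```python
import numpy as np


def projected_noise_variance(height, width, eps_w, seed=0):
    """Variance of white noise after averaging a patch along its y-axis."""
    rng = np.random.default_rng(seed)
    # Zero-mean white noise patch w(x, y) with variance eps_w**2, shape (H, W).
    noise = rng.normal(0.0, eps_w, size=(height, width))
    # 1-D projection: average over y (axis 0), as in equation (1).
    w_1d = noise.mean(axis=0)
    return w_1d.var()


H, W, EPS_W = 64, 50_000, 3.0  # illustrative values only
measured = projected_noise_variance(H, W, EPS_W)
predicted = EPS_W**2 / H  # equation (2)
print(f"measured {measured:.4f}, predicted {predicted:.4f}")
```

Averaging over the patch height is what lowers the noise: doubling the height halves the variance of the projected noise.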
Therefore, the SNR of the 1-D projection of the patch with extraction angle $\psi_0$ is:
$$\mathrm{SNR}_{1D,\psi_0} = \frac{D_0(\psi_0)}{\epsilon_{w_{1D}}} = \frac{\sqrt{H}}{\epsilon_w}. \tag{3}$$
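The behaviour of the peak-to-peak value $D_0(\alpha)$ can be illustrated on a noise-free synthetic ridge. The sketch below is only an illustration under stated assumptions: the ridge is defined analytically instead of being read from a micrograph, the patch y-axis is taken to point along the extraction angle, and all function names and sizes are hypothetical.

```python
import numpy as np


def ridge_value(x, y, psi0_deg, diameter):
    """Ideal ridge through the origin: 1.0 inside, 0.0 outside."""
    psi = np.deg2rad(psi0_deg)
    dist = np.abs(-x * np.sin(psi) + y * np.cos(psi))  # distance to the ridge axis
    return (dist <= diameter / 2).astype(float)


def projection(alpha_deg, psi0_deg, diameter, width, height):
    """1-D projection along the patch y-axis at extraction angle alpha."""
    a = np.deg2rad(alpha_deg)
    u = np.arange(width) - width // 2    # across the patch (patch x-axis)
    v = np.arange(height) - height // 2  # along the patch (patch y-axis)
    uu, vv = np.meshgrid(u, v)           # shape (height, width)
    # Assumed convention: patch y-axis along alpha, patch x-axis perpendicular.
    x = uu * np.sin(a) + vv * np.cos(a)
    y = -uu * np.cos(a) + vv * np.sin(a)
    return ridge_value(x, y, psi0_deg, diameter).mean(axis=0)


def estimate_angle(psi0_deg, diameter=10, width=101, height=201, step=5):
    """Scan alpha over [0, 180) and keep the angle with the largest peak-to-peak."""
    angles = np.arange(0, 180, step)
    ptp = []
    for alpha in angles:
        p = projection(alpha, psi0_deg, diameter, width, height)
        ptp.append(p.max() - p.min())
    best = int(np.argmax(ptp))
    return int(angles[best]), float(ptp[best])


angle, value = estimate_angle(35.0)
print(angle, value)
```

A patch that is much taller than the ridge diameter keeps the plateau around the true angle narrow, which is why the scan singles out one angle here.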
Taking this feature of ridges into account, patches are extracted from a cryo-EM micrograph at the initial
user-specified point, changing the extraction angle $\alpha$ from 0° to 180°, and 1-D projections are calculated to estimate
the angle $\psi_0$. However, as electron micrographs have low SNR, estimating $\psi_0$ merely by these calculations is still
error-prone. Therefore, we additionally calculated the directional derivative of the patches in the x-axis or
0° direction by applying a first-order derivative of anisotropic Gaussian kernel (FDAGK), and improved the noise-
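$$\hat{G}_{\sigma,\theta,\rho}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{1}{2\sigma^2} \begin{bmatrix} x & y \end{bmatrix} M_{-\theta} \begin{bmatrix} 1 & 0 \\ 0 & \rho^{-2} \end{bmatrix} M_{\theta} \begin{bmatrix} x \\ y \end{bmatrix} \right), \tag{4}$$
with
$$M_{\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \tag{5}$$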
where $\sigma$ is the standard deviation or scale of the Gaussian function, $\theta$ is the orientation of the kernel and $\rho$ is the
so-called anisotropic factor. The AGK of this definition is an isotropic Gaussian kernel stretched $\rho$ times along
the major axis ($\theta + 90^\circ$) direction. The directional derivative of an image $I(x, y)$ in the direction of $\theta$ using
where $\nu_\theta$ is the unit vector in the direction of $\theta$ (Fig. 1b). The 0°-directional derivative image $I^{0^\circ}_{0,\alpha}(x, y)$ of the
patch $I_{0,\alpha}(x, y)$ is obtained by applying the 0°-FDAGK filter (Fig. 1c), and its 1-D projection $p_{0,\alpha}(x)$
(hereinafter referred to as “filtered 1-D projection”) is calculated. The peak-to-peak value $D_0(\alpha)$ of the filtered
1-D projection $p_{0,\alpha}(x)$ takes the maximum value at $\alpha = \psi_0$ (Fig. 1d), which is approximated as:
$$D_0(\psi_0) \approx \sqrt{\frac{2}{\pi}}\, \frac{1}{\sigma}. \tag{7}$$
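The approximation in equation (7) can be reproduced with a small numerical sketch. It rests on two assumptions that are not taken from the text: the Gaussian is normalized to unit integral (prefactor $1/(2\pi\rho\sigma^2)$), and the kernel is sampled out to four standard deviations. The function names and the pixel-unit parameters are hypothetical.

```python
import numpy as np


def fdagk_0deg(sigma, rho, extent=4.0):
    """0-degree FDAGK on an integer pixel grid (x-derivative of a Gaussian
    elongated rho times along y). Assumes a unit-integral Gaussian."""
    half_x = int(np.ceil(extent * sigma))
    half_y = int(np.ceil(extent * sigma * rho))
    x = np.arange(-half_x, half_x + 1)
    y = np.arange(-half_y, half_y + 1)
    xx, yy = np.meshgrid(x, y)  # shape (len(y), len(x))
    gauss = np.exp(-(xx**2 + (yy / rho) ** 2) / (2 * sigma**2))
    gauss /= 2 * np.pi * rho * sigma**2
    return -xx / sigma**2 * gauss  # derivative of the Gaussian along x


def filtered_projection(profile, kernel):
    """Filtered 1-D projection of a ridge that does not vary along y."""
    # For such an image, projecting the 2-D convolution along y equals
    # convolving the 1-D profile with the kernel summed over y.
    kernel_1d = kernel.sum(axis=0)
    return np.convolve(profile, kernel_1d, mode="same")


SIGMA, RHO, DIAMETER, WIDTH = 5.0, 2.0, 60, 200  # illustrative, in pixels
profile = np.zeros(WIDTH)
profile[WIDTH // 2 - DIAMETER // 2 : WIDTH // 2 + DIAMETER // 2] = 1.0
p = filtered_projection(profile, fdagk_0deg(SIGMA, RHO))
peak_to_peak = p.max() - p.min()
predicted = np.sqrt(2 / np.pi) / SIGMA  # equation (7)
print(f"peak-to-peak {peak_to_peak:.4f}, predicted {predicted:.4f}")
```

The approximation holds when the ridge diameter is several times larger than the scale, so that the responses at the two edges do not overlap.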
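On the other hand, the variance $\epsilon_{w^{0^\circ}}^2$ of the noise $w^{0^\circ}(x, y)$ on the 0°-directional derivative image $I^{0^\circ}_{0,\alpha}(x, y)$ is:
$$\epsilon_{w^{0^\circ}}^2 = V[w^{0^\circ}(x, y)] = V\left[ w(x, y) * \frac{\partial \hat{G}_{\sigma,0^\circ,\rho}}{\partial \nu_{0^\circ}} \right] = \frac{\epsilon_w^2}{8\pi\rho\sigma^4}. \tag{8}$$
Therefore, the variance $\epsilon_{w_{1D}}^2$ of the noise component $w_{1D}(x, y)$ on the filtered 1-D projection $p_{0,\alpha}(x)$ is obtained as
follows:
$$\epsilon_{w_{1D}}^2 = \frac{\epsilon_w^2}{8\pi H\rho\sigma^4}. \tag{9}$$
The SNR of the filtered 1-D projection at extraction angle $\psi_0$ is therefore:
$$\mathrm{SNR}_{1D,\psi_0} = \frac{D_0(\psi_0)}{\epsilon_{w_{1D}}} = \frac{4\sqrt{H\rho}\,\sigma}{\epsilon_w}, \tag{10}$$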
showing that the noise-robustness is determined by the height $H$ of the patch, the anisotropic factor $\rho$ and the
scale $\sigma$ of the FDAGK. In practice, the height $H$ is set as a user-defined parameter and the kernel size is set to
$4\sigma \times 4\sigma\rho$. As an overly large anisotropic factor or scale is computationally demanding despite the improved SNR, we
used $\rho = 2.0$ and $\sigma = 20$ Å as default values. Dividing equation (10) by equation (3) gives a factor of $4\sqrt{\rho}\,\sigma$, so the SNR of the
1-D projections is increased by ~113 times by the FDAGK filtering under the default parameter setting.
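The factor of ~113 can be reproduced both from the closed form and from a sampled kernel. This is a sketch under assumptions: the scale is treated as a plain number (20, as in the text), the Gaussian is assumed to be normalized to unit integral, and the kernel is sampled out to four standard deviations instead of the program's kernel size. Function names are hypothetical.

```python
import numpy as np


def snr_gain_closed_form(sigma, rho):
    """Ratio of equation (10) to equation (3): 4 * sqrt(rho) * sigma."""
    return 4.0 * np.sqrt(rho) * sigma


def snr_gain_numeric(sigma, rho, extent=4.0):
    """Same ratio from a sampled 0-degree FDAGK (unit-integral Gaussian assumed)."""
    half_x = int(np.ceil(extent * sigma))
    half_y = int(np.ceil(extent * sigma * rho))
    xx, yy = np.meshgrid(np.arange(-half_x, half_x + 1),
                         np.arange(-half_y, half_y + 1))
    gauss = np.exp(-(xx**2 + (yy / rho) ** 2) / (2 * sigma**2))
    gauss /= 2 * np.pi * rho * sigma**2
    kernel = -xx / sigma**2 * gauss
    # White noise filtered by a kernel has its variance scaled by sum(kernel**2),
    # which plays the role of 1 / (8 * pi * rho * sigma**4) in equation (8).
    noise_gain = np.sum(kernel**2)
    signal = np.sqrt(2 / np.pi) / sigma  # equation (7)
    # The patch height H cancels when equation (10) is divided by equation (3).
    return signal / np.sqrt(noise_gain)


closed = snr_gain_closed_form(20.0, 2.0)
numeric = snr_gain_numeric(20.0, 2.0)
print(f"closed form {closed:.1f}, numeric {numeric:.1f}")
```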
After estimating the filament’s in-plane rotation angle $\psi_0$, the distance $\Delta_0$ to the filament’s central axis is determined.
Taking into account that the 1-D projection $p_{0,\psi_0}(x)$ should take its maximum and minimum values at the left
and right edges of the filament, respectively, the program calculates the cross-correlation $CC_0(\delta x)$ of the filtered
1-D projection $p_{0,\psi_0}(x)$ against its point-symmetric image $-p_{0,\psi_0}(-x)$ with respect to the origin (Fig. 2a):
( 11 )
and $\Delta_0$ is determined to be the $\delta x$ value which gives the global maximum of $CC_0(\delta x)$ within the range of
$-r < \delta x < r$, where $r$ is the filament radius. In practice, it is important to limit the shift search range to within the
filament radius in order not to detect a wrong peak derived from contamination or adjacent filaments. Based on the
angle $\psi_0$, the shift value $\Delta_0$ and the coordinates of the initial user-specified point, the coordinates $\mathbf{x}_0$ of the search
starting point are determined to be on the exact center of the filament. For the subsequent filament search step, the
filtered 1-D projection $p_{0,\psi_0}(x - \Delta_0)$ shifted to the center is stored and called the “reference 1-D projection”.
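The centering step can be sketched as follows. Because the exact form of the cross-correlation is not reproduced here, the discrete definition in the code is an assumption: the projection is correlated with its point reflection about each candidate pixel, which peaks where the projection is antisymmetric, that is, on the filament axis. The function name and the synthetic test profile are hypothetical.

```python
import numpy as np


def centering_shift(p, radius):
    """Shift (in pixels) from the patch centre to the filament axis.

    Assumed definition: CC(k) = sum_i p[i] * (-p[2k - i]), i.e. p correlated
    with its point reflection about pixel k.
    """
    n = len(p)
    centre = n // 2
    # np.convolve(p, p)[2k] equals sum_i p[i] * p[2k - i]; keep the even lags.
    cc = -np.convolve(p, p)[::2]  # cc[k] for k = 0 .. n-1
    # Restrict the search to -radius < shift < radius around the centre.
    lo, hi = centre - radius + 1, centre + radius
    k_best = lo + int(np.argmax(cc[lo:hi]))
    return k_best - centre, float(cc[k_best])


# Synthetic filtered projection: positive peak at the left edge, negative
# peak at the right edge, with the axis 7 pixels right of the patch centre.
i = np.arange(201)
axis, half_width, blur = 107, 15, 4.0
p = (np.exp(-((i - (axis - half_width)) ** 2) / (2 * blur**2))
     - np.exp(-((i - (axis + half_width)) ** 2) / (2 * blur**2)))
shift, cc_max = centering_shift(p, radius=15)
print(shift, cc_max)
```

Limiting the search to the filament radius excludes the individual edges as candidate centres, where this correlation is negative.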
A bidirectional filament search is performed from the initial point $\mathbf{x}_0$ step-by-step at regular intervals, calculating
the in-plane rotation angle and distance to the center for every search point. Given the coordinates $\mathbf{x}_n$ and the
in-plane rotation angle $\psi_n$ of the filament at the n-th search point, the coordinates $\mathbf{x}_{n+1}$ of the (n+1)-th search point are
$$\mathbf{x}_{n+1} = \mathbf{x}_n + d \begin{bmatrix} \cos\psi_n \\ \sin\psi_n \end{bmatrix}, \tag{12}$$
where $d$ is the search interval. Filament patches are extracted at the point $\mathbf{x}_{n+1}$ with several extraction angles at
equal intervals in the range $\psi_n \pm \beta$, where $\beta$ is the local angular search range, and their directional derivative images are
calculated using the 0°-FDAGK (Fig. 2b). For each directional derivative image, the 1-D projection and its peak-to-peak
value are calculated. The (n+1)-th in-plane rotation angle $\psi_{n+1}$ is determined as the extraction angle which gives
the largest peak-to-peak value. Using $\psi_{n+1}$ as the extraction angle, a new filament patch is extracted and filtered
by the 0°-FDAGK. In order to determine the shift $\Delta_{n+1}$ to the filament axis, the cross-correlation $CC_{n+1}(\delta x)$ of the
filtered 1-D projection $p_{n+1,\psi_{n+1}}(x)$ against the reference 1-D projection $p_{0,\psi_0}(x - \Delta_0)$ is calculated as follows:
( 13 )
and the shift value $\Delta_{n+1}$ is determined as the $\delta x$ value which gives the maximum value of $CC_{n+1}(\delta x)$, limiting the
search range to within $\pm d \sin\beta$ (Fig. 2c). Based on the angle $\psi_{n+1}$, the shift value $\Delta_{n+1}$ and the estimated coordinates
$\mathbf{x}_{n+1}$, the coordinates $\mathbf{x}_{n+1}$ are re-calculated to be exactly on the central axis of the filament.
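The geometry of one search step can be sketched in a few lines. Angles are taken in degrees and coordinates in pixels; the direction in which a positive shift moves the point is an assumed sign convention, and the function names are hypothetical.

```python
import numpy as np


def next_search_point(x_n, psi_n_deg, d):
    """Equation (12): step a distance d along the current filament direction."""
    psi = np.deg2rad(psi_n_deg)
    return np.asarray(x_n, dtype=float) + d * np.array([np.cos(psi), np.sin(psi)])


def recenter(x, psi_deg, delta):
    """Move a point by delta perpendicular to the filament direction.

    Assumption: a positive shift points along (sin psi, -cos psi).
    """
    psi = np.deg2rad(psi_deg)
    normal = np.array([np.sin(psi), -np.cos(psi)])
    return np.asarray(x, dtype=float) + delta * normal


def shift_search_limit(d, beta_deg):
    """Largest lateral offset reachable in one step: d * sin(beta)."""
    return d * np.sin(np.deg2rad(beta_deg))


guess = next_search_point((100.0, 50.0), 0.0, 20.0)
centred = recenter(guess, 0.0, 3.0)
print(guess, centred, shift_search_limit(20.0, 30.0))
```

The limit of $d \sin\beta$ reflects how far the axis can drift sideways over one step if the filament bends by at most $\beta$.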
The program continues to search until the maximum cross-correlation value $CC_{n+1}(\Delta_{n+1})$ falls below the
threshold $R \times CC_0(\Delta_0)$, where $R$ is a user-specified value reflecting tracing sensitivity. When no filament exists
at the search point, the cross-correlation value $CC_{n+1}(\Delta_{n+1})$ becomes close to zero and the search stops at
this point. On the other hand, the filament search does not stop at a crossing of two filaments, because the
cross-correlation value does not decrease in the presence of another filament oblique to the filament being searched
(Fig. 2d). This results in traced filaments that include points of filament overlap, or branch points. In
existing auto-picking software, branch points are removed in principle. However, as such points can be eliminated
from the particle dataset by subsequent image analysis such as two-dimensional classification, our program does
not remove branch points. In addition, this feature reduces the number of manual operations on datasets
with intricately overlapping filaments.
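The stopping rule can be sketched as a loop over search points in one direction. The image-dependent part (angle estimation, shift and cross-correlation at a point) is hidden behind a hypothetical callback `measure`, so this is only an outline of the control flow under assumed conventions, not the program's implementation.

```python
import numpy as np


def trace_one_direction(x0, psi0_deg, cc0, d, sensitivity, measure, max_steps=10_000):
    """Follow a filament from x0 until the correlation drops below R * CC_0.

    measure(x, psi_deg) is a hypothetical callback returning
    (psi_new_deg, delta, cc_max) for a patch centred at x.
    """
    points = []
    x, psi = np.asarray(x0, dtype=float), float(psi0_deg)
    for _ in range(max_steps):
        rad = np.deg2rad(psi)
        x_guess = x + d * np.array([np.cos(rad), np.sin(rad)])  # equation (12)
        psi_new, delta, cc_max = measure(x_guess, psi)
        if cc_max < sensitivity * cc0:
            break  # no filament at this point: stop searching
        rad_new = np.deg2rad(psi_new)
        # Assumed sign convention for the shift perpendicular to the filament.
        x = x_guess + delta * np.array([np.sin(rad_new), -np.cos(rad_new)])
        psi = psi_new
        points.append(x)
    return np.array(points)


def fake_measure(x, psi_deg):
    """Toy stand-in: a straight filament on y = 0 that ends at x = 100."""
    cc = 1.0 if x[0] <= 100.0 else 0.0
    return 0.0, x[1], cc


traced = trace_one_direction((0.0, 0.0), 0.0, cc0=1.0, d=10.0,
                             sensitivity=0.5, measure=fake_measure)
print(len(traced), traced[-1])
```

A bidirectional search would call the same loop a second time with the starting angle rotated by 180° and join the two point lists.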
As the goal of the program is to extract particle images along the filament axis at the user-specified inter-box
distance, which is usually different from the filament search interval $d$, a curve passing through the filament axis
has to be calculated. We can calculate a spline curve using the points obtained by the bidirectional filament search
as control points, but such a curve often deviates from the exact axis, especially when the filament is
meandering. The degree of deviation is well visualized using a “shrink image”, which is the shortened image of
the filament straightened along the spline curve (Fig. 3a). In order to obtain a curve precisely on the axis, the
program calculates quadratic spline curves for multiple short segments of the filament and combines them into
one continuous curve (Fig. 3b). The shrink image is shown on the software interface to help users evaluate
whether the calculated curve along which particles are extracted is well-centered. It is also useful for selecting
filaments with a specific conformational feature, e.g. microtubules of a specific protofilament number, in a
conformationally heterogeneous dataset (Fig. 3c). In addition, it can be used to manually remove unnecessary
portions of the filament, in cases such as when only a certain region of the filament needs to be saved, or the traced
To save the particle picking results, the program extracts particle images along each filament axis at the inter-box
distance, saves them in a stack file, and outputs their coordinates to a text file. For each filament, the particle
stack file is created in the MRCS format, and the coordinates are saved in EMAN’s BOX format or RELION’s
STAR format. For a conformationally heterogeneous dataset composed of multiple different filament types,
PyFilamentPicker has a function to label each filament type at the time of saving. This function enables collecting
and saving particles of different conformations at the same time.
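Placing particle centres at a constant inter-box distance along a traced axis amounts to resampling the curve by arc length. The sketch below uses a piecewise-linear curve through the control points as a simple stand-in for the piecewise quadratic splines described above; the function name and the example coordinates are hypothetical, and no output file format is reproduced.

```python
import numpy as np


def sample_along_axis(control_points, inter_box):
    """Points spaced inter_box apart (by arc length) along a polyline."""
    pts = np.asarray(control_points, dtype=float)
    # Cumulative arc length at each control point.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Target arc lengths 0, inter_box, 2 * inter_box, ... up to the curve end.
    targets = np.arange(0.0, arc[-1] + 1e-9, inter_box)
    x = np.interp(targets, arc, pts[:, 0])
    y = np.interp(targets, arc, pts[:, 1])
    return np.column_stack([x, y])


axis_points = [(0.0, 0.0), (30.0, 40.0), (30.0, 90.0)]  # total length 100
centres = sample_along_axis(axis_points, inter_box=25.0)
print(centres)
```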
We first tested PyFilamentPicker on two datasets, which we named dataset #1 and #2. Dataset #1 is composed of
GMPCPP-stabilized microtubules decorated with the microtubule binding domain (MTBD) of cytoplasmic dynein,
and consists of microtubules of different protofilament numbers, from which we only needed 14-protofilament
microtubules. Though tracing microtubules of a specific protofilament number is still difficult for automated
tracers, our program made it possible to specifically select 14-protofilament microtubules by referring to the moiré
patterns of shrink images. We selected 35,636 particles from 621 micrographs, and obtained an asymmetric (C1)
reconstruction of the MTBD-microtubule complex from the final 32,666 particles whose seam location was
consistently determined (Fig. 4a and b; the details of the structural analysis are described in our previous paper)
(Ref). The 14-protofilament microtubule structure was well resolved including the seam location, suggesting the
usefulness of our interactive approach. On the other hand, dataset #2 is a structurally homogeneous dataset of
actin filaments. On this dataset, our approach was successful in tracing filaments even from ice-contaminated,
low-SNR micrographs recorded with low defocus values (Fig. 4d). The resultant 241,111 particles produced
high-resolution two-dimensional class averages in which secondary structures were clearly resolved (Fig. 4e). Taken
together, our program worked reliably on both datasets including non-ideal micrographs.
We further tested our program on amyloid datasets (dataset #3 and #4). As amyloid fibrils are formed of laterally
associated thin substructures, their EM images have unique features, i.e. the width of a fibril is not constant along
its length and the SNR tends to be relatively low, which make automated particle picking quite difficult. In
addition, amyloid datasets are often mixtures of multiple different conformations. Such characteristics of
amyloid are unsuitable for automated tracing, and therefore manual tracing tools are generally utilized for amyloid
datasets (Ref). Dataset #3 is a dataset of tau amyloid fibrils from Alzheimer’s disease brain (EMPIAR-10230),
containing two conformationally different filament types, namely paired helical filaments (PHFs) and straight filaments
(SFs). Dataset #4 is composed of heparin-induced 2N4R tau filaments (EMPIAR-10243) including four types of
filaments of different conformations, called snake, twister, hose and jagged in the original article. As the fibrils in
dataset #3 had relatively low contrast, we used a larger FDAGK standard deviation σ and a larger patch height H
(Table 1). For dataset #4, a smaller filament search interval d and a larger local angular search range β were used
considering the existence of very flexible filaments (Supplementary Fig. 1, Table 1). For both datasets,
PyFilamentPicker enabled tracing of every type of fibril while clearly discriminating one conformation from
another with the help of shrink images (Fig. 5a-d). These results demonstrate that our tracing approach is
applicable to amyloid datasets. The parameters used to trace the filaments are summarized in Table 1 and the
We compared PyFilamentPicker with RELION’s manual picker (Ref), using dataset #2. Whereas
PyFilamentPicker is designed to trace even a strongly curved filament with a single operation, the
manual picker assumes that a filament resembles a straight line, so curved filaments need to be segmented into
multiple straight portions in order to prepare well-centered particles. Therefore, when tracing with the manual
picker, we roughly segmented every filament into 150-300 nm segments. Not surprisingly, the required time for
particle picking was considerably shortened by using PyFilamentPicker. The total numbers of picked particles from
157 micrographs using the manual picker and PyFilamentPicker were 185,416 and 241,111, respectively. For each
particle dataset, we performed two-dimensional classification to determine the shift distance needed to center each particle,
which was estimated as the offset distance in the y-axis direction. The mean values of the shift were 4.97 and bbb,
respectively, demonstrating that the particles picked using PyFilamentPicker were well-centered. Furthermore,
this high localization accuracy of PyFilamentPicker contributed to faster convergence of the two-dimensional
We then evaluated the accuracy of our approach in comparison with the RELION auto-picker using dataset #2. For
automated particle picking using RELION, we manually collected 1,000 particles from 10 micrographs to prepare
3. Discussion
filament tracing tool. Assuming ridge-like features of filaments, this tool calculates directional derivative
images using a first-order derivative of anisotropic Gaussian kernel (FDAGK) to detect the in-plane rotation angle
and the central axis of a filament, enhancing noise-robustness by over a hundred times. With a single mouse click
to specify a filament, it traces the full length of the filament even in datasets with low signal-to-noise ratio, heavy contamination
or conformational heterogeneity. Though some fully-automated filament tracers have been developed
in recent years, interactive manual picking is still commonly performed for such non-ideal datasets as a
substitute for automated approaches to prepare a full particle dataset. However, manual picking for flexible
filaments is particularly laborious, as existing manual tracers assume that target filaments resemble straight lines.
Against this background, PyFilamentPicker can take the place of conventional manual picking. Our tool
successfully traced straight to very flexible filaments and output well-centered particles, even from non-ideal
datasets, owing to its interactivity. It was shown that our approach can be utilized for amyloid datasets, which have
been considered both laborious for manual pickers and difficult for auto-pickers.
Furthermore, even if the target dataset is ideal for auto-pickers, our tool is quite helpful in preparing an initial
template dataset, which has commonly been done by time-consuming manual tracing. Recently, SPHIRE
released a new fully-automated particle picker, crYOLO, based on the deep learning-based object detection system
You Only Look Once (YOLO), which is applicable to both single particles and filaments. Though crYOLO
greatly improved the accuracy of filament detection compared to template-based approaches, it requires a dataset
to train its network, which is still prepared by laborious manual tracing of hundreds of filaments.
PyFilamentPicker may help with the preparation of training datasets for such deep learning-based filament tracers.
Table 1.
Binning factor 4 4 4 4
Table 2.