
Work / Technology & tools

AKOS DIOSDI (IMAGING); TIMEA TOTH & FERENC KOVACS (ANALYSIS); ANDRAS KRISTON (VISUALIZATION); BIOMAG GROUP, HUN-REN/SZBK

Cancer-cell nuclei (green boxes) picked out by software using deep learning.

TEASING IMAGES APART, CELL BY CELL
Deep learning is powering the evolution of algorithms that can make sense of
imagery from a range of microscopy techniques. By Michael Eisenstein

Any biology student can pick a neuron out of a photograph. Training a computer to do the same thing is much harder. Jan Funke, a computational biologist at the Howard Hughes Medical Institute’s Janelia Research Campus in Ashburn, Virginia, recalls his first attempt 14 years ago. “I was arrogant, and I was thinking, ‘it can’t be too hard to write an algorithm that does it for us’,” he says. “Boy, was I wrong.”

People learn early in life how to ‘segment’ visual information, distinguishing individual objects even when they happen to be crowded together or overlapping. But our brains have evolved to excel at this skill over millions of years, says Anna Kreshuk, a computer scientist at the European Molecular Biology Laboratory in Heidelberg, Germany; algorithms must learn it from first principles. “Mimicking human vision is very hard,” she says.

But in life-science research, it’s increasingly required. As the scale and complexity of biological imaging experiments has grown, so too has the need for computational tools that can segment cellular and subcellular features with minimal human intervention. This is a big ask. Biological objects can assume a dizzying array of shapes, and be imaged in myriad ways. As a result, says David Van Valen, a systems biologist at the California Institute of Technology in Pasadena, it can take much longer to analyse a data set than to collect it. Until quite recently, he says, his colleagues might collect a data set in one month, “and then spend the next six months fixing the mistakes of existing segmentation algorithms”.

The good news is that the tide is turning, particularly as computational biologists tap into the algorithmic architectures known as deep learning, unlocking capabilities that drastically accelerate the process. “I think segmentation overall will be solved within the foreseeable future,” Kreshuk says. But the field must also find ways to extend these methods to accommodate the unstoppable evolution of cutting-edge imaging techniques.

A learning experience
The early days of computer-assisted segmentation required considerable hand-holding by biologists. For each experiment, researchers

Nature | Vol 623 | 30 November 2023 | 1095


would have to carefully customize their algorithms so that they could identify the boundaries between cells in a particular specimen.

Image-analysis tools such as CellProfiler, developed by imaging scientist Anne Carpenter and computer scientist Thouis Jones at the Broad Institute of MIT and Harvard in Cambridge, Massachusetts, and ilastik, developed by Kreshuk and her colleagues, simplify this process with machine learning. Users educate their software by example, marking up demonstration images and creating precedents for the software to follow. “You just point on the images and say, this is what I want,” Kreshuk explains. That strategy is limited in its generalizability, however, because each training process is optimized for a particular experiment: for example, detecting mouse liver cells labelled with a specific fluorescent dye.

Deep learning produced a seismic shift in this regard. The term describes algorithms that use neural network architectures that loosely mimic the organization of the brain, and that can extrapolate sophisticated patterns after training with large volumes of information. Applied to image data, these algorithms can derive a more robust and consistent definition of the features that represent cells and other biological objects, not just in a given set of images but across multiple contexts.

A deep learning framework known as U-Net, developed in 2015 by Olaf Ronneberger, a computer scientist at the University of Freiburg in Germany, and his colleagues, has proved particularly transformative. Indeed, it remains the underlying architecture behind most segmentation tools almost a decade later.

Many initial efforts in this space focused on identifying cell nuclei. These are large and ovoid, with little variation in appearance across cell types, and virtually every mammalian cell contains one. But they can still pose a challenge in cell-dense tissue samples, says Martin Weigert, a bioimaging specialist at the Swiss Federal Institute of Technology in Lausanne. “You can have very tightly packed nuclei. You don’t want a segmentation method to just say it’s a gigantic blob,” he says. In 2019, a team led by Peter Horvath, an imaging specialist at the Biological Research Centre in Szeged, Hungary, used U-Net to develop an algorithm called nucleAIzer. It performed better than hundreds of other tools that had competed in a ‘Data Science Bowl’ challenge for nuclear segmentation in light microscopy the year before¹.

Even if software can find the nucleus, it remains tricky to extrapolate the shape of the rest of the cell. Other algorithms aim for a more holistic strategy. For example, StarDist, developed by Weigert and his collaborator Uwe Schmidt, generates star-shaped polygons that can be used to segment nuclei while also extrapolating the more complex shape of the surrounding cytoplasm.

CellPose takes a more generalist approach. Developed by Marius Pachitariu, a computational neuroscientist at Janelia, and co-principal investigator Carsen Stringer in 2020, the software derives ‘flow fields’ that describe the intracellular diffusion of the molecular labels commonly used in light microscopy. “We came up with this representation of cells that’s basically described by these vector dynamics, with these flow fields kind of pushing all the pixels towards the centre of the cell,” says Pachitariu. This allows CellPose to confidently assign each pixel in a given image to one cell or another with high accuracy and, more importantly, with broad applicability across light microscopy methods and sample types.

“One of the magic things about CellPose was that … even with cells that are touching, it can split them apart just fine,” says Beth Cimini, a bioimaging specialist at the Broad who currently runs the CellProfiler project.

Basic training
These steady gains in deep-learning-based segmentation stem only partly from advances in algorithm design; most methods still riff on the same underlying U-Net foundation.

Instead, the key determinant of success is training. “Better data, better labels: that’s the secret,” says Van Valen, who led the development of a popular segmentation tool, called DeepCell. Labelling entails assembling a collection of microscopy images, delineating the nuclei and membranes and other structures of interest, and feeding those annotations into the software so that it can learn the features that define those elements. For CellPose, Pachitariu and Stringer spent half a year collecting and curating as many microscopy images as they could to build a large and broadly representative training data set, including non-cell images to provide clear counterexamples.

But building a vast, manually annotated training set can quickly become overwhelming, and deep-learning experts are developing strategies to work smarter rather than harder. Diversity is one priority. “Having a few of a lot of different things is better than having very much of the same,” says Weigert. For example, a collection of images of brain, muscle and liver tissue with a variety of staining and labelling approaches is more likely than are images of just one tissue type to produce results that generalize across experiments. Horvath also sees value in including imperfections (for example, out-of-focus images) that teach the algorithm to overcome such problems in real data.

Another increasingly popular strategy is to let algorithms do bulk annotation and then bring humans in to fact-check. Van Valen and his colleagues used this ‘human in the loop’ approach to develop the TissueNet image data set, which contains more than one million annotated cell-nucleus pairs. They tasked a crowdsourced community of novices and experts with correcting predictions by a deep-learning model that was trained on just 80 manually annotated images. Van Valen’s team subsequently developed a segmentation algorithm called Mesmer, and showed that this could match human segmentation performance after training it with TissueNet data². Pachitariu and Stringer likewise used this human-in-the-loop approach to retrain CellPose to achieve better performance on specific data sets³, using as few as 100 cells. “On a new data set, we think a user can probably train their model in the loop within an hour or so,” Pachitariu says.

Even so, retraining for new tasks can be a chore. To streamline the process, Kreshuk and Florian Jug, a computational biologist at the Human Technopole Foundation in Milan, Italy, have created the BioImage Model Zoo, a community repository of pre-trained deep-learning models. Users can search it to find a ready-to-use model for segmenting images, rather than struggling to train their own. “We are trying to make the zoo really usable by biologists who are not very deep into deep learning,” says Kreshuk.

Indeed, the unfamiliarity of many wet-lab scientists with the intricacies of deep-learning algorithms is a broad roadblock to deployment, says Cimini. But there are many avenues to accessibility. For example, she attributes CellPose’s success to its straightforward graphical user interface as well as its segmentation capabilities. “Putting the effort in to make stuff very friendly and approachable and non-scary is how you reach the majority of biologists,” she says. Many algorithms, including CellPose, StarDist and nucleAIzer, are also available as plug-ins for popular image-analysis tools including ImageJ/Fiji, napari (Nature 600, 347–348; 2021) and CellProfiler.

Pushing the boundaries
Just a few years ago, Cimini says, she never imagined such rapid progress. “At least for nuclei, and probably even for cells, too, we’re actually getting to the point that within a few years, this is going to be a solved problem.”

Researchers are also making headway with more-challenging types of images. For example, many spatial transcriptomics studies entail multiple rounds of tissue labelling and imaging, in which each label or collection of labels reveals the RNA transcripts for



a specific gene. These are then reconstructed alongside images of the cells themselves to create tissue-wide gene-expression profiles with cellular resolution.

CARSEN STRINGER & CELL IMAGE LIBRARY (CC-BY)

Cells segmented by the CellPose software (top) and the ‘flow fields’ that it used to do the task (bottom).

But identification and interpretation of gene-expression ‘spots’ is tough to automate. “When you actually open these raw images, you’ll see that there are just far too many spots for human beings to ever manually label,” says Van Valen. This, in turn, makes training difficult. Van Valen’s team has developed a deep-learning network that can confidently discern these spots with help from a classical computer-vision algorithm⁴. The researchers then integrated this network into a larger pipeline called Polaris, a generalizable solution that can be applied for end-to-end analysis of a wide range of spatial transcriptomics experiments.

By contrast, analysis of 3D volumes for light microscopy remains stubbornly difficult. Weigert says there is a profound shortage of publicly available 3D imaging data, and it is a chore to make such data useful for algorithmic training. “Annotating 3D data is such a pain; this is where people go to cry,” says Weigert. The variability in data quality and formats is also more extreme than for 2D microscopy, Pachitariu notes, necessitating larger and more-complicated training data sets.

There has, however, been remarkable progress in segmentation of 3D data generated by ‘volume electron microscopy’ methods. But interpreting electron micrographs throws up fresh challenges. “Whereas in light microscopy you have to learn what is signal and what is background, in electron microscopy you have to learn to distinguish what makes your signal different from all the other kinds of signal,” says Kreshuk. Volume electron microscopy heightens this challenge, requiring reconstruction of a series of thin sample sections that document cells and their environments with remarkable detail and resolution.

These capabilities are particularly important in connectomics studies that seek to generate neuronal ‘wiring maps’ of the brain. Here, the stakes are especially high in terms of accuracy. “If we make, on average, one mistake per micron of a neural fibre or per axon length, then the whole thing is useless,” says Funke. “This is one of the hardest problems you can have.” But the vast volumes of data also mean that algorithms need to be efficient to complete reconstruction in a reasonable time frame.

As with light microscopy, U-Net has delivered major dividends. In a preprint posted in June, the FlyWire Consortium (of which Funke is a member) described applying a U-Net-based algorithm to reconstruct the wiring of an adult fly brain, comprising roughly 130,000 neurons⁵. An assessment of 826 randomly chosen neurons found that the algorithm achieves 99.2% accuracy relative to human evaluators.

Segmentation algorithms for connectomics are now largely mature, Funke says, although fact-checking these circuit diagrams at whole-brain scale remains a daunting task. “The part that we’re nervous about now is, how do we proofread?”

A singular solution
Interoperability across imaging platforms also remains a challenge. An algorithm trained on samples labelled with the haematoxylin and eosin stains commonly used in histology might not perform well on confocal microscopy images, for example. Similarly, methods designed for segmentation in electron microscopy are generally incompatible with light microscopy data.

“For each technology, they capture biological specimens at significantly varying scales and emphasize distinct features due to the different staining, different treatment protocols,” says Bo Wang, an artificial-intelligence specialist at the University of Toronto in Canada. “So, when you think about designing or training a deep-learning model that works across the full spectrum of these methods, that means really simultaneously excelling in different tasks.”

Wang is bullish about ‘foundation models’ that can truly generalize across imaging data formats, and helped to coordinate a data challenge at last year’s Conference on Neural Information Processing Systems (NeurIPS) for various groups to test their mettle in developing solutions.

In addition to bigger and broader training sets, such models will almost certainly require computational architectures beyond the comfortable confines of U-Net. Wang is enthusiastic about transformers, algorithmic tools that make it easier for deep learning to discern subtle but important patterns in data. Transformers are a central component of both the large language model ChatGPT and the protein-structure-predicting algorithm AlphaFold, and the winning algorithm at the 2022 NeurIPS challenge leveraged transformers to achieve a decisive advantage over other approaches. “This helps the model focus on relevant cellular or tissue structures while ignoring some of the noise,” Wang says.

Numerous groups are now developing foundation models; this month, for example, Van Valen and his team posted a preprint describing their CellSAM algorithm⁶. Wang is optimistic that first-generation solutions will emerge in the next few years.

In the meantime, many researchers are moving on to more-interesting applications of the tools they’ve developed. For example, Funke is using segmentation-derived insights to classify the functional characteristics of neurons based on their morphology, discerning features of inhibitory versus excitatory cells in connectomic maps, for instance. And Horvath’s team collaborated on a method called deep visual proteomics, which leverages structural and functional insights from deep-learning algorithms to delineate specific cells in tissue samples that can then be precisely plucked out and subjected to deep transcriptomic and proteomic analysis⁷. This could offer a powerful tool for profiling the molecular pathology of cancers and identifying appropriate avenues for treatment.

These prospects excite Kreshuk as well. “I hope in the near future we can make morphology space really quantitative,” she says, “and analyse it together with the omics space and see how we can mix and match this stuff.”

Michael Eisenstein is a science writer in Philadelphia, Pennsylvania.

1. Hollandi, R. et al. Cell Syst. 10, 453–458 (2020).
2. Greenwald, N. F. et al. Nature Biotechnol. 40, 555–565 (2022).
3. Pachitariu, M. & Stringer, C. Nature Methods 19, 1634–1641 (2022).
4. Laubscher, E. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.09.03.556122 (2023).
5. Dorkenwald, S. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.06.27.546656 (2023).
6. Israel, U. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.11.17.567630 (2023).
7. Mund, A. et al. Nature Biotechnol. 40, 1231–1240 (2022).
