Review of Multi-Omics Integration in The Age of Million Single-Cell Data

The document (Multi-omics integration in the age of million single-cell data) reviews the current
state and challenges in integrating multi-omics data in single-cell biology. The key points from the
document include:
Advancements in Single-Cell Technologies: Single-cell technologies have significantly advanced,
revealing diverse cell types and novel cell-state associations. These techniques have extended
beyond transcriptome analyses to multi-omics approaches, enabling simultaneous measurement
of different data modalities and spatial cellular context. The challenge lies in integrating complex
multimodal data into coherent biological models.
Challenges of Multimodal Datasets: Multimodal datasets offer unprecedented insights into
biological systems but pose challenges due to modality-specific technical issues and the need for
common inference from varied information types.
Biological Data Integration: This term describes analytic methods combining information from
multiple sources into single biological inferences. It's particularly relevant in integrating
large-scale omics data at the single-cell level.
Integration Strategies: The document categorizes integration problems into those involving
matched data (measured on the same cell) and unmatched data (measured on different cells). It
discusses various strategies for each type, including joint latent space inference and consensus of
individual inferences for matched data, and annotated group matching for unmatched data.
Quantitative Causal Modeling: This approach models the actual biological processes that generate
the measurements, such as the relationship between chromatin states, RNA levels, and protein levels.
Computational tools like RNA velocity and protaccel extend this model to integrate various data
types into a single inference.
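As a rough illustration of the kind of process model involved, the sketch below assumes the simple linear kinetics commonly used to motivate RNA velocity, ds/dt = beta * u - gamma * s, where u is unspliced and s is spliced mRNA. The function name, the rate values, and the toy counts are hypothetical and are not taken from the reviewed document or from any specific tool.

```python
import numpy as np

def rna_velocity(unspliced, spliced, beta=1.0, gamma=0.5):
    """Return ds/dt = beta * u - gamma * s for each cell (assumed linear kinetics)."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return beta * u - gamma * s

# Toy counts for one gene in three hypothetical cells.
u = np.array([2.0, 1.0, 0.5])
s = np.array([2.0, 2.0, 2.0])

# Positive values suggest the gene is being induced, zero a steady state,
# and negative values that spliced mRNA is declining.
v = rna_velocity(u, s)
print(v)
```

A protein-level extension would add an analogous production and degradation term for protein abundance; the exact form used by any particular tool is not stated in the summary above.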
Statistical Modelling: Another method for data integration is to relate different measurement
modalities with a statistical model. For example, a model could relate RNA levels to protein levels
or chromatin state to RNA levels.
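One simple way to picture such a statistical model is an ordinary least-squares fit of protein abundance on RNA abundance. The sketch below uses simulated data with a made-up slope and intercept; it illustrates the general idea and is not the model used by any method in the reviewed document.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) RNA levels for 200 cells and protein levels
# generated from an assumed linear relationship plus noise.
rna = rng.gamma(shape=2.0, scale=1.0, size=200)
protein = 3.0 * rna + 1.0 + rng.normal(scale=0.1, size=200)

# Design matrix with an intercept column, then a least-squares fit.
X = np.column_stack([rna, np.ones_like(rna)])
coef, *_ = np.linalg.lstsq(X, protein, rcond=None)
slope, intercept = coef

# The fitted model can predict one modality from the other.
predicted_protein = X @ coef
print(slope, intercept)
```

Real RNA-to-protein relationships are usually noisier and often nonlinear, so a linear fit like this is only a starting point.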
Latent Space Modelling: This approach uses mathematical functions to model data from different
modalities as aspects of an abstract 'latent molecular state' of a cell. It assumes that the latent
space determines the observed multimodal values.
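A minimal linear sketch of this idea is shown below: two simulated modalities are generated from a shared two-dimensional latent state, and a truncated SVD of the concatenated, z-scored features (effectively a joint PCA) is used to recover a common embedding. All variable names and dimensions are assumptions for illustration. Published tools typically use more elaborate models, and this is not the method of any specific one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, k = 300, 2

# Hypothetical latent state and two modalities that depend on it linearly.
z_true = rng.normal(size=(n_cells, k))
rna = z_true @ rng.normal(size=(k, 20)) + 0.1 * rng.normal(size=(n_cells, 20))
atac = z_true @ rng.normal(size=(k, 10)) + 0.1 * rng.normal(size=(n_cells, 10))

def zscore(x):
    """Center each feature and scale it to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Concatenate the standardized modalities and take the top-k components.
joint = np.hstack([zscore(rna), zscore(atac)])
U, S, Vt = np.linalg.svd(joint, full_matrices=False)
latent = U[:, :k] * S[:k]  # one k-dimensional coordinate per cell

# Fraction of total variance captured by the shared latent space.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(latent.shape, explained)
```

Because both modalities were simulated from the same latent state, the first two components capture most of the variance; on real data the shared signal is usually much weaker.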
Late Integration Methods: Late integration doesn't attempt to relate measurements but uses
each data modality to infer a model unique to that data type. These models or results are then
integrated, for instance, creating a consensus network from independent gene regulatory
networks inferred from transcriptome and proteome data.
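The consensus step can be sketched as follows, assuming each modality has already produced its own directed gene regulatory network as a boolean adjacency matrix. The gene names and edges are invented for illustration, and keeping only edges found in both networks is just one possible consensus rule.

```python
import numpy as np

genes = ["A", "B", "C", "D"]

# Hypothetical networks inferred independently from each modality.
# Entry [i, j] is True when gene i is inferred to regulate gene j.
grn_rna = np.array([[0, 1, 1, 0],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]], dtype=bool)
grn_protein = np.array([[0, 1, 0, 0],
                        [0, 0, 1, 1],
                        [0, 0, 0, 1],
                        [0, 0, 0, 0]], dtype=bool)

# Late integration: keep only the edges supported by both networks.
consensus = grn_rna & grn_protein
edges = [(genes[i], genes[j]) for i, j in zip(*np.nonzero(consensus))]
print(edges)
```

A looser rule, such as taking the union of edges or weighting each edge by the number of supporting modalities, would fit the same pattern.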
Joint Profiling of Multi-omics Data: Techniques like joint snRNA-seq and snATAC-seq are used for
matched data analysis. These approaches measure single-cell phenotypes along with
transcriptomic data, adding an important dimension to profiling.
Naive Approaches for Integration: Basic methods to integrate matched multimodal data involve
transforming the data to have homogeneous statistical characteristics, though this often ignores
the biological context of different modalities.
Single-cell Aggregation and Integration (scAI): This tool uses a common latent space approach to
analyze heterogeneity in joint transcriptome and epigenome profiling data, addressing the
challenges posed by sparse epigenomic information.
Approaches for Independent Multimodal Data Integration: The integration of independently
collected datasets (unmatched data) is a significant challenge. Techniques aim to match groups of
cells at the level of distinct cell types or local ensembles, or statistically map one feature space to
another.
Matching by Annotated Cell Groups: This involves matching groups of cells between modalities,
either manually or using biologically informative features like the proximity of open chromatin to
expressed genes.
Matching with Shared Feature Sets: Methods like STvEA use common molecular bases, such as
protein abundance measurements, for feature set matching across different modalities.
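To make the shared-feature idea concrete, the sketch below matches cells from two simulated datasets by nearest neighbor in a common protein feature space and transfers labels across. This is a deliberately simplified stand-in and not STvEA's actual algorithm; the datasets, labels, and two-protein panel are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical cell types with distinct mean abundance of two shared proteins.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels_a = np.array([0] * 5 + [1] * 5)   # annotated dataset
labels_b = np.array([1] * 4 + [0] * 4)   # held back to check the transfer
data_a = centers[labels_a] + 0.3 * rng.normal(size=(10, 2))
data_b = centers[labels_b] + 0.3 * rng.normal(size=(8, 2))

# Pairwise Euclidean distances: rows are cells in B, columns are cells in A.
dist = np.linalg.norm(data_b[:, None, :] - data_a[None, :, :], axis=2)

# Match each cell in B to its nearest cell in A and copy that cell's label.
nearest_in_a = dist.argmin(axis=1)
transferred = labels_a[nearest_in_a]
print(transferred)
```

With well-separated cell types the transferred labels agree with the held-back ones; in practice batch effects between technologies usually need correcting before such matching works.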
Feature Conversion and Calibration: In the absence of a common molecular basis, one modality's
measurements may be connected to features of another using statistical models. Good
calibration after feature conversion is crucial for successful matching.
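One simple form of feature conversion, sketched below under stated assumptions, turns chromatin accessibility into a per-gene activity score by summing the peaks that lie within a fixed window of each gene's transcription start site. The peak positions, gene names, window size, and counts are invented, and this heuristic stands in for the statistical models mentioned above rather than reproducing any of them.

```python
import numpy as np

# Hypothetical peak midpoints and transcription start sites on one chromosome.
peak_pos = np.array([1_000, 4_500, 9_000, 20_000])
tss = {"geneA": 5_000, "geneB": 21_000}
window = 2_000  # assumed linking distance in base pairs

# Accessibility counts: rows are cells, columns are peaks.
accessibility = np.array([[1, 0, 1, 0],
                          [0, 2, 0, 1],
                          [1, 1, 0, 3]])

# Peak-by-gene indicator: 1 when a peak lies within the window of a gene's TSS.
genes = list(tss)
link = np.array([[abs(p - tss[g]) <= window for g in genes] for p in peak_pos],
                dtype=int)

# Converted features: one gene activity score per cell and gene.
gene_activity = accessibility @ link
print(gene_activity)
```

The resulting scores live in gene space and can be compared with expression data, but they would still need calibration (for example, rescaling to the expression distribution) before cells are matched.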
This comprehensive overview encapsulates the current landscape of single-cell multi-omics data
integration, highlighting various methodologies, their applications, and the challenges inherent in
this rapidly evolving field.
