
scRNA-seq methods and data analysis
A brief introduction…
by Silvia Giulia Galfrè, PhD
History

Molecular Biology and Genomics – a.a. 2021-2022


• Tang et al. 2009
• Early developmental stage embryo: just a few cells
• Number of publications last year: around 30,000
• Different methods: more than 50…

Plot of some important scRNA-seq experiments, reporting the number of cells analyzed and the year. From (Svensson, et al., 2018), with some additions and modifications.
La Sapienza University
Cell atlases



• There is now a great effort to build comprehensive single-cell atlases
• Examples:
• Human Cell Atlas
https://www.humancellatlas.org/
• Mouse: Tabula Muris
https://tabula-muris-senis.ds.czbiohub.org/
• Mouse Cell Atlas: >520,000 single cells from >10 mouse tissues
http://bis.zju.edu.cn/MCA/index.html
• Drosophila atlas: https://flycellatlas.org/
• … and many more for specific tissues

Why single cell?



Bulk RNA-seq
• Average information
• Information on rare cell types lost
• Easier to obtain
• Cheaper
• Quite standard workflow

scRNA-seq
• Precise single-cell information
• Identification of rare cell types
• More difficult
• More expensive
• Many methods
• No standard workflow

General workflow
1. Cell isolation
2. Library preparation
3. Data analysis
There are many methods...



• They differ in cell isolation protocols,
• in the reaction "chamber",
• and in library preparation protocols
• Note that FACS sorting can be used upstream of other protocols, such as Fluidigm or microfluidic protocols, or on its own to isolate and sort cells.

In-well methods
Advantage: easier control of the reactions
Drawback: small number of cells

Fluidigm C1



Fully automated commercial technology

• Analyzes only 96 cells per run
• Full-length RNA sequences
• SMARTer® chemistry for cDNA synthesis (Gong, et al., 2018)

Step 1: the poly-T primer hybridizes with the mRNA poly-A tail, so that the reverse transcriptase can synthesize the cDNA.

Step 2: when the end of the RNA is reached, the terminal transferase activity of the enzyme adds a few deoxycytidines. These can base-pair with the poly-G of another primer present in solution, which is then used as the template for a second round of polymerization.

Step 3: the yellow primers are used for the subsequent PCR amplification.

Tagmentation step: this is necessary to break the cDNA into pieces while keeping the positional information of the different pieces. It is done by a genetically modified Tn5 enzyme that cuts double-stranded DNA and inserts a specific adapter at both sides of the cut.

Then all the cDNA is pooled and sequenced.
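The tagmentation idea (random cuts, an adapter on both sides of each cut) can be illustrated with a toy simulation. This is a minimal sketch, not a model of real Tn5 chemistry: the adapter labels `[ADP1]`/`[ADP2]` and the function `tagment` are hypothetical, and end pieces that carry an adapter on only one side are simply dropped.

```python
import random

ADAPTER_LEFT = "[ADP1]"   # hypothetical adapter labels, not real Tn5 adapter sequences
ADAPTER_RIGHT = "[ADP2]"

def tagment(cdna, n_cuts, seed=0):
    """Cut the cDNA at n_cuts random positions; each piece lying between two cuts
    receives an adapter on both sides. Returns (start, end, tagged_fragment) tuples,
    so every fragment keeps track of where it came from in the original molecule.
    The two end pieces (adapter on one side only) are dropped in this sketch."""
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, len(cdna)), n_cuts))
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        fragments.append((start, end, ADAPTER_LEFT + cdna[start:end] + ADAPTER_RIGHT))
    return fragments

cdna = "ATGGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGGCTAACGT" * 3
for start, end, frag in tagment(cdna, n_cuts=6):
    print(start, end, frag)
```

With 6 distinct cut positions the sketch always returns 5 tagged fragments, and their inner sequences concatenate back to the region between the first and last cut.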
SMARTer total RNA



• For Fluidigm C1 instruments
• Derived from the previous method
• Library preparation after removing the rRNA (or not)
• Full-length coverage is recovered by using random primers for template switching
• Captures both polyadenylated and non-polyadenylated RNA molecules
• Preserves strand-of-origin information

From https://www.takarabio.com/
Smart-seq3



• Poly-A+ method
• Full length
• Allows direct allele-specific RNA counting through SNP detection
• A UMI is included in the barcode sequence added at the 5' end

From Hagemann-Jensen, et al., 2020

UMI



Unique Molecular Identifier

• First used for single-cell RNA-seq in 2014
• Short sequences used to uniquely tag each molecule in a sample library
• Fundamental to count each molecule only once, without PCR amplification bias

Figure: RNA molecules -> cDNA formation (UMI added) -> after PCR -> deduplication

Islam, S., Zeisel, A., Joost, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods 11, 163–166 (2014).
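The deduplication step can be sketched as collapsing reads that share the same (cell barcode, gene, UMI) and counting distinct UMIs per gene. This is a minimal sketch on hypothetical toy reads; it uses exact UMI matching only, whereas real pipelines usually also merge UMIs that differ by sequencing errors.

```python
from collections import defaultdict

# Each aligned read summarized as (cell_barcode, gene, umi). Toy data (hypothetical).
reads = [
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "TTGCA"),  # PCR duplicate of the read above
    ("AAAC", "GeneA", "GGATC"),  # a second molecule of the same gene
    ("AAAC", "GeneB", "CCTAG"),
    ("AAAC", "GeneB", "CCTAG"),  # PCR duplicates of one GeneB molecule
    ("AAAC", "GeneB", "CCTAG"),
]

def count_molecules(reads):
    """Collapse reads sharing (cell, gene, UMI); return distinct UMIs per (cell, gene)."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(s) for key, s in umis.items()}

# Naive read counting, biased by PCR amplification
raw_reads = defaultdict(int)
for cell, gene, _ in reads:
    raw_reads[(cell, gene)] += 1

print(dict(raw_reads))
print(count_molecules(reads))
# GeneA: 3 reads -> 2 molecules; GeneB: 3 reads -> 1 molecule
```

Both genes receive 3 reads, but UMI counting shows that GeneA comes from 2 molecules and GeneB from only 1.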

SPLiT-seq



Split-pool barcoding

• A suspension of formaldehyde-fixed cells or nuclei goes through four rounds of combinatorial barcoding
• The third-round barcode also carries the UMI
• Four rounds of combinatorial barcoding can yield 21,233,664 barcode combinations (three rounds of barcoding in 96-well plates, followed by a fourth round with 24 PCR reactions: 96 × 96 × 96 × 24)

Rosenberg, Alexander B., et al. Science 360.6385 (2018)
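The combinatorial arithmetic and the risk of two cells receiving the same barcode can be checked with a small simulation. This is a minimal sketch assuming each cell lands in a uniformly random well at every round; the function names and the number of simulated cells are hypothetical.

```python
import random
from collections import Counter

# Wells per round, as on the slide: three 96-well rounds, then 24 PCR reactions.
ROUNDS = [96, 96, 96, 24]

def n_combinations(rounds):
    """Number of distinct barcode combinations = product of wells per round."""
    total = 1
    for wells in rounds:
        total *= wells
    return total

def split_pool(n_cells, rounds, seed=0):
    """Assign each cell a random well per round; the tuple of wells is its barcode."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(w) for w in rounds) for _ in range(n_cells)]

def collision_fraction(barcodes):
    """Fraction of cells sharing their barcode with at least one other cell."""
    counts = Counter(barcodes)
    shared = sum(c for c in counts.values() if c > 1)
    return shared / len(barcodes)

print(n_combinations(ROUNDS))
barcodes = split_pool(100_000, ROUNDS)  # hypothetical number of cells
print(f"cells with a shared barcode: {collision_fraction(barcodes):.4f}")
```

Because the number of combinations far exceeds the number of cells, only a small fraction of cells end up sharing a barcode; the fraction grows as more cells are loaded.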
SPLiT-seq



• 156,049 single-nucleus transcriptomes
• From P2 and P11 mouse brains and spinal cords
• Around 100 cell types were identified


Droplet-based methods
Advantages: high number of cells, inexpensive
Drawbacks: no full-length coverage; poly-A RNA only
General features



• All high-throughput: from hundreds to thousands of cells
• The bead plays a fundamental role: primers are linked to the bead at their 5' end through a cleavable linker
• Each primer contains a PCR primer sequence, a cell barcode (the same for all the primers on a bead, but different between beads), a random UMI and a poly-T sequence
• Water-in-oil emulsion: each aqueous drop should contain one bead and a single cell (or nothing, or just a bead)

Zhang, Xiannian, et al. "Comparative analysis of droplet-based ultra-high-throughput single-cell RNA-seq systems." Molecular cell 73.1 (2019): 130-142.
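Why some drops contain nothing, or more than one cell, can be estimated with a simple model. This sketch assumes, as is common for limiting-dilution loading, that the number of cells per drop follows a Poisson distribution with mean `lam`; the `lam` values are hypothetical, and bead loading is not modelled.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def droplet_occupancy(lam):
    """Fractions of drops with 0, 1 and >=2 cells when cells per drop ~ Poisson(lam)."""
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return p0, p1, 1.0 - p0 - p1

for lam in (0.05, 0.1, 0.5):  # hypothetical loading densities
    empty, single, multiple = droplet_occupancy(lam)
    # Share of occupied drops that contain more than one cell (multiplets)
    multiplet_share = multiple / (single + multiple)
    print(f"lambda={lam}: empty={empty:.3f} single={single:.3f} "
          f"multiple={multiple:.4f} multiplets among occupied={multiplet_share:.3f}")
```

Diluting the cells (small `lam`) keeps multiplets rare, at the cost of most drops being empty; this is the trade-off behind the "(or nothing, or just a bead)" remark.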
General steps

• Inside the drop, the cell is lysed and the mRNA poly-A tails hybridize with the poly-T tails of the primers attached to the bead.
• Then reverse transcription occurs, and all the cDNAs are pooled and sequenced.
• Sequences coming from different cells are mixed together but, thanks to the cell barcode they carry, the data can be demultiplexed, separating the sequences coming from different cells.

Macosko, et al., 2015


Zhang, Xiannian, et al. "Comparative analysis of droplet-based ultra-high-throughput single-cell RNA-seq systems." Molecular cell 73.1 (2019): 130-142.
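Demultiplexing can be sketched as reading the cell barcode and UMI from the barcode read and grouping the cDNA reads by barcode. This sketch assumes a Drop-seq-like layout for read 1 (a 12-nt cell barcode followed by an 8-nt UMI); the lengths are parameters, the reads are hypothetical, and real pipelines also correct barcode errors and discard barcodes from empty drops.

```python
from collections import defaultdict

BARCODE_LEN = 12  # assumed Drop-seq-like layout: 12-nt cell barcode...
UMI_LEN = 8       # ...followed by an 8-nt UMI

def parse_read1(seq, bc_len=BARCODE_LEN, umi_len=UMI_LEN):
    """Split the barcode read (read 1) into (cell barcode, UMI)."""
    if len(seq) < bc_len + umi_len:
        raise ValueError("read 1 too short")
    return seq[:bc_len], seq[bc_len:bc_len + umi_len]

def demultiplex(read_pairs):
    """Group cDNA reads (read 2) by the cell barcode found on read 1."""
    by_cell = defaultdict(list)
    for read1, read2 in read_pairs:
        barcode, umi = parse_read1(read1)
        by_cell[barcode].append((umi, read2))
    return by_cell

# Hypothetical read pairs: barcode + UMI + start of poly-T, then the cDNA read
pairs = [
    ("ACGTACGTACGT" + "AAAACCCC" + "TTTTT", "GATTACAGATTACA"),
    ("ACGTACGTACGT" + "GGGGTTTT" + "TTTTT", "CCCGGGAAATTT"),
    ("TTTTGGGGCCCC" + "ACACACAC" + "TTTTT", "AGCTAGCTAGCT"),
]
cells = demultiplex(pairs)
for barcode, cell_reads in cells.items():
    print(barcode, len(cell_reads))
```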
inDrop

Klein, et al., 2015 (citations: 2229)
• Beads: hydrogel
• Link between primers and bead: UV-cleavable
• Reverse transcription: directly inside each drop
• cDNA amplification: uses the CEL-seq protocol, with linear in vitro transcription (quite time-consuming)

Figure (barcoded primer bead, droplet generation): reaction in droplets 2.5 h; reaction after demulsification 28 h
Drop-seq

Macosko, et al., 2015 (citations: 4303)
• Resin beads
• mRNA capture efficiency is quite low, because primers are attached only to the surface of the bead
• mRNA is captured inside the drop; reverse transcription happens later
• cDNA amplification: uses a direct template-switching protocol

Figure (barcoded primer bead, droplet generation): reaction in droplets 0.3 h; reaction after demulsification 9 h
10X Genomics (Chromium)

Zheng, et al., 2017 (citations: 2454)
• Hydrogel beads, which can be dissolved
• Reverse transcription: directly inside each drop
• cDNA amplification: uses a direct template-switching protocol
• Commercial

Figure (barcoded primer bead, droplet generation): reaction in droplets 1 h; reaction after demulsification 7 h
Methods comparative chart

Method              Type      Cell number      Full-length RNA  UMI  RNA species
Fluidigm C1         in wells  low throughput   yes              no   poly-A
SMARTer total RNA   in wells  low throughput   yes              no   total
Smart-seq3          in wells  low throughput   yes              yes  poly-A
SPLiT-seq           in wells  high throughput  no               yes  poly-A
inDrop              droplets  high throughput  no               yes  poly-A
Drop-seq            droplets  high throughput  no               yes  poly-A
10X                 droplets  high throughput  no               yes  poly-A
Workflow: part 1. Data pre-processing

READS QUALITY CONTROL

1. the number of counts per barcode (count depth)
2. the number of genes per barcode
3. the fraction of counts from mitochondrial genes per barcode

From the Seurat official website
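The three quality-control metrics can be computed directly from a count matrix. This is a minimal sketch on a hypothetical toy matrix (rows are barcodes, columns are genes); it assumes human gene symbols, where mitochondrial genes start with "MT-" (mouse symbols use "mt-"), and the filtering thresholds are arbitrary examples that must be tuned per dataset.

```python
import numpy as np

# Toy count matrix (hypothetical): rows = barcodes (cells), columns = genes.
genes = np.array(["MT-CO1", "MT-ND1", "ACTB", "GAPDH", "CD3E"])
counts = np.array([
    [10, 5, 40, 30, 15],   # typical cell
    [60, 30, 5, 4, 1],     # high mitochondrial fraction: possibly a damaged cell
    [0, 0, 2, 1, 0],       # very low depth: possibly an empty droplet
])

def qc_metrics(counts, genes, mito_prefix="MT-"):
    """Return count depth, number of detected genes and % mitochondrial counts per barcode."""
    depth = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    is_mito = np.char.startswith(genes, mito_prefix)
    pct_mito = 100 * counts[:, is_mito].sum(axis=1) / np.maximum(depth, 1)
    return depth, n_genes, pct_mito

depth, n_genes, pct_mito = qc_metrics(counts, genes)
# Arbitrary example thresholds
keep = (depth >= 50) & (n_genes >= 3) & (pct_mito < 20)
print(depth, n_genes, pct_mito.round(1), keep)
```

Only the first barcode passes: the second is flagged by its mitochondrial fraction, the third by its low depth and gene count.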



Practical part 1A
- file: 10X_Seurat.Rmd
Matrix examples



Figure panels: Drop-seq and 10x

Very sparse matrices
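Because most entries are zero, count matrices are usually stored in a compressed sparse format that keeps only the non-zero values. This is a minimal hand-written compressed sparse row (CSR) sketch on a hypothetical random matrix; in practice libraries provide this (e.g. `scipy.sparse`, or the column-oriented `dgCMatrix` of R's Matrix package used by Seurat).

```python
import numpy as np

def to_csr(dense):
    """Compressed sparse row (CSR): keep only the non-zero values, their column
    indices, and a row-pointer array marking where each row starts."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        nz = np.flatnonzero(row)
        data.extend(row[nz].tolist())
        indices.extend(nz.tolist())
        indptr.append(len(data))
    return np.array(data), np.array(indices), np.array(indptr)

def row_of(data, indices, indptr, i, n_cols):
    """Rebuild row i of the matrix as a dense vector."""
    out = np.zeros(n_cols, dtype=data.dtype)
    start, end = indptr[i], indptr[i + 1]
    out[indices[start:end]] = data[start:end]
    return out

rng = np.random.default_rng(0)
# Hypothetical UMI matrix: 200 cells x 1000 genes, roughly 5% non-zero entries
dense = rng.binomial(1, 0.05, size=(200, 1000)) * rng.poisson(3, size=(200, 1000))
data, indices, indptr = to_csr(dense)
sparsity = 1 - data.size / dense.size
print(f"zeros: {sparsity:.1%}, stored values: {data.size} of {dense.size}")
```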


Methods features



• Technical noise: variation caused by experimental randomness
• Precision: resolution of the system, measured as the correlation between a cell and its nearest neighbour among the other cells
• Sensitivity: detection of transcripts, including those expressed at low levels

Figure panels: 10x, Drop-seq, inDrop

Zhang, X. et al. Comparative Analysis of Droplet-Based Ultra-High-Throughput Single-Cell RNA-Seq Systems. Mol. Cell 73, 130-142.e5 (2019).
Workflow: part 1. Data normalization



1. Most common: "counts per million" (CPM). It assumes that all cells initially contain the same amount of RNA.
2. Same assumption: downsampling protocols.
3. Scran's pooling-based size factor estimation method: size factors are estimated by linear regression over genes, after cells are pooled to avoid technical dropout effects.
4. Non-linear normalization methods (for more complex unwanted variation). Many of these methods rely on parametric modelling of count data (negative binomial or Poisson-gamma).

No single normalization method can be expected to be appropriate for all types of scRNA-seq data.
Computational Methods for Single-Cell RNA Sequencing
Brian Hie, Joshua Peters, Sarah K. Nyquist, Alex K. Shalek, Bonnie Berger, Bryan D. Bryson
Annual Review of Biomedical Data Science 2020 3:1, 339-364
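The first two strategies, which share the "same RNA content per cell" assumption, can be sketched in a few lines. This is a minimal illustration on a hypothetical matrix (rows are cells); the downsampling draws counts without replacement with NumPy's multivariate hypergeometric sampler, and the target depth is an arbitrary example.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each cell (row) to a total of 1e6 counts.
    Assumes every cell started with the same amount of RNA."""
    depth = counts.sum(axis=1, keepdims=True)
    return counts / depth * 1e6

def downsample(counts, target, seed=0):
    """Randomly subsample each cell's counts, without replacement, to `target`
    total counts. Cells with at most `target` counts are left unchanged."""
    rng = np.random.default_rng(seed)
    out = counts.copy()
    for i, row in enumerate(counts):
        if row.sum() > target:
            out[i] = rng.multivariate_hypergeometric(row, target)
    return out

# Two hypothetical cells with the same profile but a 10-fold difference in depth
counts = np.array([[10, 30, 60], [100, 300, 600]])
print(cpm(counts))                          # both rows: about 1e5, 3e5, 6e5
print(downsample(counts, 100).sum(axis=1))  # every cell now has 100 counts
```

CPM rescales values (giving non-integer numbers), while downsampling discards counts to keep integer data at a common depth.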
Log transformation



Three effects:
1. distances between log-transformed expression values represent fold changes
2. it mitigates the mean–variance relationship in single-cell data
3. it reduces the skewness of the data (approximating a normal distribution)

However, log transformation of normalized data can introduce spurious differential expression effects into the data.
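The first effect (differences of log values are fold changes) and the role of the pseudocount can be checked numerically. This sketch uses base 2 and hypothetical values; pipelines differ in the log base and pseudocount they use.

```python
import numpy as np

def log_transform(x, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero counts finite."""
    return np.log2(x + pseudocount)

# Normalized expression of two genes in two cells (hypothetical values)
x_cell1 = np.array([100.0, 50.0])
x_cell2 = np.array([800.0, 50.0])

# Without a pseudocount the difference is exactly the log2 fold change
diff = log_transform(x_cell2, 0.0) - log_transform(x_cell1, 0.0)
print(diff)  # about [3, 0]: gene 1 changes 8-fold, gene 2 does not change

# With a pseudocount of 1, small values are compressed toward 0,
# so differences are only approximately log fold changes
print(log_transform(np.array([0.0, 1.0, 3.0])))  # [0, 1, 2]
```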
Gene scaling?

The Seurat tutorial scales genes; other workflows do not.

Should all genes be weighted equally in downstream analysis, or is the magnitude of a gene's expression important information?
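What scaling does, and why it raises this question, can be shown with a per-gene z-score. This is a minimal sketch on hypothetical log-normalized values; the clipping at 10 is an assumption chosen for illustration.

```python
import numpy as np

def scale_genes(expr, clip=10.0):
    """Z-score each gene (column) across cells: subtract the mean, divide by the
    standard deviation, then clip extreme values. Zero-variance genes become 0."""
    mean = expr.mean(axis=0)
    sd = expr.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    z = (expr - mean) / sd
    return np.clip(z, -clip, clip)

# Hypothetical log-normalized data: gene 0 is highly expressed, gene 1 lowly expressed
expr = np.array([[8.0, 0.1], [9.0, 0.2], [10.0, 0.3]])
scaled = scale_genes(expr)
print(scaled)  # both columns become (approximately) [-1, 0, 1]
```

After scaling, the highly and lowly expressed genes become indistinguishable, so every gene contributes equally to downstream steps such as PCA: this is exactly the trade-off raised by the question above.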

Practical part 1B
- file: in_wells_Seurat.Rmd
Droplet batch effect

Molecular Cell 2019, 73, 130-142.e5. DOI: 10.1016/j.molcel.2018.10.020


Different methods for dataset
integration


Practical part 2B
- file: in_wells_Seurat.Rmd
Workflow: part 3



Dimensionality reduction

1. To capture the underlying structure in the data in as few dimensions as possible
2. Identifying highly variable genes is primarily used to focus downstream dimensionality reduction and clustering
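Highly variable gene selection can be sketched as ranking genes by how much they vary relative to their mean. This is a simple dispersion (variance / mean) ranking on hypothetical data, not Seurat's default "vst" method, which instead models the mean–variance trend.

```python
import numpy as np

def highly_variable_genes(expr, n_top):
    """Rank genes by dispersion (variance / mean) across cells and return the
    indices of the n_top most variable ones. Genes with zero mean are ranked last."""
    mean = expr.mean(axis=0)
    var = expr.var(axis=0, ddof=1)
    safe_mean = np.where(mean > 0, mean, 1.0)
    dispersion = np.where(mean > 0, var / safe_mean, -np.inf)
    return np.argsort(dispersion)[::-1][:n_top]

rng = np.random.default_rng(1)
n_cells = 500
# Three "stable" genes: Poisson counts, so variance is close to the mean
stable = rng.poisson(5, size=(n_cells, 3))
# One hypothetical marker gene, highly expressed in only 10% of the cells
marker_lam = np.where(np.arange(n_cells) < 50, 40.0, 0.1)
marker = rng.poisson(marker_lam)[:, None]
expr = np.hstack([stable, marker]).astype(float)
print(highly_variable_genes(expr, n_top=1))  # the marker gene (index 3)
```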

Workflow: part 3. PCA (Principal Component Analysis)

Dimensionality reduction

1. To capture the underlying structure in the data in as few dimensions as possible
2. PCA is a linear decomposition
3. Identifying highly variable genes is primarily used to focus downstream dimensionality reduction and clustering
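As a linear decomposition, PCA can be computed from the singular value decomposition (SVD) of the gene-centred matrix. This is a minimal NumPy sketch on hypothetical data with two cell groups; single-cell toolkits typically run a truncated or approximate version on the highly variable genes only.

```python
import numpy as np

def pca(expr, n_components):
    """PCA by SVD of the gene-centred matrix (rows = cells, columns = genes).
    Returns cell coordinates (scores) and the fraction of variance per component."""
    centred = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(2)
# Hypothetical data: two cell groups differing in 5 of 50 genes, plus noise
group = np.repeat([0, 1], 100)
signal = np.outer(group, np.r_[np.full(5, 3.0), np.zeros(45)])
expr = signal + rng.normal(0, 1, size=(200, 50))
scores, explained = pca(expr, n_components=2)
print(explained.round(3))
```

The first component captures the difference between the two groups, so the group means of `scores[:, 0]` are far apart while later components mostly contain noise.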

Workflow: part 3. t-SNE (t-distributed stochastic neighbor embedding)

Dimensionality reduction

1. To capture the underlying structure in the data in as few dimensions as possible
2. PCA is a linear decomposition
3. Identifying highly variable genes is primarily used to focus downstream dimensionality reduction and clustering
4. Non-linear methods capture richer structural information and prevent dense overcrowding within a visualization, but may also introduce unrepresentative distortions
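The "t-distributed" part of t-SNE refers to the heavy-tailed kernel used for similarities in the low-dimensional embedding. This sketch only computes those embedding similarities q_ij; full t-SNE also computes perplexity-calibrated Gaussian similarities in the original space and moves the points by gradient descent to minimize the KL divergence between the two. The input coordinates are hypothetical.

```python
import numpy as np

def tsne_low_dim_affinities(y):
    """Joint probabilities q_ij used by t-SNE in the embedding: a Student-t kernel
    (1 + ||y_i - y_j||^2)^-1, set to 0 on the diagonal and normalized over all pairs."""
    sq_dist = np.sum((y[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()

# Heavy tails: at large distances the Student-t kernel decays much more slowly than
# a Gaussian, which lets moderately distant points spread out instead of crowding.
for d in (1.0, 3.0, 5.0):
    gaussian = float(np.exp(-d**2 / 2))
    student_t = 1 / (1 + d**2)
    print(d, round(gaussian, 6), round(student_t, 4))

y = np.random.default_rng(0).normal(size=(10, 2))  # hypothetical 2-D embedding
q = tsne_low_dim_affinities(y)
```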

Workflow: part 3. UMAP (Uniform Manifold Approximation and Projection)

Dimensionality reduction

1. To capture the underlying structure in the data in as few dimensions as possible
2. PCA is a linear decomposition
3. Identifying highly variable genes is primarily used to focus downstream dimensionality reduction and clustering
4. Non-linear methods capture richer structural information and prevent dense overcrowding within a visualization, but may also introduce unrepresentative distortions
Workflow: part 3



Cell clustering, identification and DE

• Grouping cells based on the similarity of their gene expression profiles
• Clustering techniques can be split into several general categories
• Clustering is a fundamental step before cell identification
• Different clustering methods can lead to different cell identifications (especially important when looking for rare cells)
• Clusters are assigned to cell types using marker genes
• After cluster identification: differential expression (DE) analysis
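Clustering followed by a first look at marker genes can be sketched with k-means, one of the general clustering categories. This is a minimal illustration on hypothetical data: Seurat's `FindClusters` uses graph-based community detection instead, and proper marker detection relies on a statistical test (Seurat's `FindMarkers` defaults to a Wilcoxon test) rather than the plain mean difference used here.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization: alternate assigning
    cells to the nearest centroid and recomputing centroids as cluster means."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next centroid: the cell farthest from all centroids chosen so far
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def top_marker(expr, labels, cluster):
    """Gene with the largest mean difference between a cluster and all other cells."""
    inside = expr[labels == cluster].mean(axis=0)
    outside = expr[labels != cluster].mean(axis=0)
    return int(np.argmax(inside - outside))

rng = np.random.default_rng(3)
expr = rng.normal(0, 0.3, size=(100, 4))  # hypothetical log-expression, 4 genes
expr[:50, 0] += 3.0   # hypothetical marker of population A
expr[50:, 1] += 3.0   # hypothetical marker of population B
labels = kmeans(expr, k=2)
print(top_marker(expr, labels, labels[0]), top_marker(expr, labels, labels[-1]))
```

Each recovered cluster is then annotated by its top marker gene: gene 0 for the first population and gene 1 for the second.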

Practical part 3A
- file: 10X_Seurat.Rmd
Workflow: part 4



Dataset exploration

Practical part 4A
- file: 10x_Seurat.Rmd
Cellular trajectories



Pseudotime approaches
• They aim to arrange cells along an axis of variation that depends on both cells and genes
• Often applied to cells undergoing differentiation
• Cell orderings can be meaningful in many experiments (e.g. spatial gradients or responses to external stimuli)
• Many different algorithms exist
Saelens, W., Cannoodt, R., Todorov, H. et al. A comparison of single-cell trajectory inference methods. Nat
Biotechnol 37, 547–554 (2019). https://doi.org/10.1038/s41587-019-0071-9
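The core idea of ordering cells along an axis of variation can be sketched with a deliberately naive pseudotime: the position of each cell on the first principal component, rescaled to [0, 1]. This assumes a single, unbranched trajectory that dominates the first component; it is not Monocle's algorithm nor any of the methods compared by Saelens et al. The simulated data and the choice of root cell are hypothetical.

```python
import numpy as np

def naive_pseudotime(expr, root):
    """Pseudotime = coordinate on PC1, rescaled to [0, 1]. The sign of a principal
    component is arbitrary, so the axis is flipped if needed to start near `root`."""
    centred = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pc1 = centred @ vt[0]
    t = (pc1 - pc1.min()) / (pc1.max() - pc1.min())
    if t[root] > 0.5:
        t = 1.0 - t
    return t

rng = np.random.default_rng(4)
true_time = rng.uniform(0, 1, 300)
expr = np.column_stack([
    5 * (1 - true_time),     # hypothetical progenitor gene, switched off over time
    5 * true_time,           # hypothetical differentiation gene, switched on
    rng.normal(0, 1, 300),   # uninformative gene
]) + rng.normal(0, 0.3, size=(300, 3))
root = int(np.argmin(true_time))  # assume an early (progenitor) cell is known
t = naive_pseudotime(expr, root)

# Spearman correlation between inferred and true ordering (ranks, then Pearson)
rho = np.corrcoef(np.argsort(np.argsort(t)), np.argsort(np.argsort(true_time)))[0, 1]
print(f"rank correlation with the true time: {rho:.3f}")
```

Real trajectory methods add what this sketch lacks, such as handling branching lineages and non-linear paths.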
Cellular trajectories



One of the simplest and most common: Monocle
Thank you

Silvia Giulia Galfrè
silvia.galfre@uniroma1.it