

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a type of DNA sequence found in the genomes of many microorganisms (Bacteria and Archaea). It consists of short repeats, each followed by a spacer, which is a short non-coding sequence that separates one repeat from the next. The repeats themselves are often partially palindromic. These sequences generally occur in clusters and play an important role in the prokaryotic defense mechanism against foreign genetic elements such as phages and plasmids (Sorek et al. 2008).

Brief history
CRISPR was first observed by Ishino and colleagues, who found 14 repeats of 29 bp each, separated by spacer DNA (Ishino et al. 1987). Further analysis by Mojica and colleagues revealed that CRISPR is found in nearly 40% of bacterial genomes and 90% of archaeal genomes (Mojica et al. 2000). Jansen et al. identified four genes located adjacent to the CRISPR array, called cas (CRISPR-associated) genes (Jansen et al. 2002). Haft et al. later described 45 families of Cas proteins encoded near the arrays (Haft et al. 2005).

Structural features

Figure 1 Typical structure of CRISPR locus (Sorek et al. 2008)

A CRISPR system consists of cas genes and a CRISPR array. The CRISPR array is made up of repeats separated by spacers. Many genomes have only one CRISPR locus, whereas others have as many as 18. Although these systems vary greatly among microbes, some characteristics are conserved and follow similar patterns (Sorek et al. 2008).
Repeats are usually 24-47 bp long and generally differ between species, but they can be divided into 12 major groups. Repeats in some groups contain short palindromic segments (5-7 bp), suggesting that they form a stem-loop ("lollipop") secondary structure when the array is transcribed into a long RNA. Repeats in the other groups are not palindromic and do not form any secondary structure. Within a single CRISPR array, the repeats are usually identical in both sequence and length. Despite their diversity, many repeats share a conserved sequence at their 3′ end, usually GAAA(G/C). Both features may serve as binding sites for Cas proteins (Sorek et al. 2008).
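The two repeat features described above, short palindromes that could fold into a stem-loop and a GAAA(G/C) motif near the 3′ end, can be screened with simple string scans. The Python sketch below is illustrative only: the repeat is a hypothetical sequence built for the example, the helper names are hypothetical, and the stem length (5 nt), minimum loop size (3 nt), and 10-nt window for the 3′ motif are assumed parameters rather than values from the cited work. Predicting real RNA secondary structure would call for an RNA folding tool, not exact inverted-repeat matching.

```python
import re
from typing import List, Tuple

# Watson-Crick complement table for uppercase DNA
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_stems(repeat: str, stem_len: int = 5, min_loop: int = 3) -> List[Tuple[int, int]]:
    """Find inverted-repeat (palindromic) segments that could pair into a stem.

    Returns (i, j) pairs where repeat[j:j+stem_len] is the reverse complement
    of repeat[i:i+stem_len] and at least `min_loop` nt separate the two arms.
    """
    hits = []
    n = len(repeat)
    for i in range(n - stem_len + 1):
        partner = reverse_complement(repeat[i:i + stem_len])
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if repeat[j:j + stem_len] == partner:
                hits.append((i, j))
    return hits


def has_3prime_motif(repeat: str, window: int = 10) -> bool:
    """Check whether GAAA(G/C) occurs within the last `window` nt of the repeat."""
    return re.search(r"GAAA[GC]", repeat[-window:]) is not None


# Hypothetical repeat: stem arm GCCGA, loop, arm TCGGC, then a GAAAG tail
repeat = "ATGCCGATTTTCGGCAAGAAAG"
print((2, 10) in find_stems(repeat))  # True
print(has_3prime_motif(repeat))       # True
```

Exact reverse-complement matching is the simplest definition of "palindromic" for double-stranded sequence; it ignores G-U wobble pairs and mismatches that a real stem can tolerate.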
Spacers are generally unique within each array. Many are probably acquired from phages, since sequence searches show that spacers often match phage sequences with high similarity. These matching sequences are distributed evenly across phage genomes and can derive from either the sense or the antisense strand, although some systems show a preference for one strand. Several reports have found a short motif in phage genomes located 1-2 nt downstream of the sequence that matches the spacer. This motif, called the PAM (protospacer adjacent motif), seems to act as a recognition site for the CRISPR system. For example, the S. thermophilus systems CRISPR1 and CRISPR3 have AGAA and GGNG, respectively, as their PAMs (Sorek et al. 2008).
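To make the role of the PAM concrete, the following Python sketch searches a phage sequence on both strands for exact matches to a spacer and accepts a match only if the PAM follows it. Assumptions: the sequences are hypothetical and much shorter than real ones, the PAM is read on the same strand as the spacer match, the function names are hypothetical, and the 2-nt gap before AGAA is an assumed value chosen to fit the "1-2 nt downstream" range given in the text.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# IUPAC codes needed for the PAMs discussed in the text (subset only)
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
          "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]"}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC PAM string (e.g. 'GGNG') into a regex fragment."""
    return "".join(_IUPAC[base] for base in pam.upper())


def find_protospacers(genome: str, spacer: str, pam: str, gap: int) -> List[Tuple[str, int]]:
    """Return (strand, start) for each exact protospacer match whose PAM
    begins `gap` nt after the protospacer's 3' end.

    Starts are 0-based forward-strand coordinates of the protospacer.
    """
    genome, spacer = genome.upper(), spacer.upper()
    # Lookahead so that overlapping matches are also reported
    pattern = re.compile("(?=" + re.escape(spacer) + "[ACGT]{%d}" % gap
                         + pam_to_regex(pam) + ")")
    hits = [("+", m.start()) for m in pattern.finditer(genome)]
    for m in pattern.finditer(reverse_complement(genome)):
        # Convert a minus-strand start back to forward-strand coordinates
        hits.append(("-", len(genome) - (m.start() + len(spacer))))
    return hits


spacer = "GACTGACTGACT"                     # hypothetical, shortened spacer
phage = "CCC" + spacer + "TT" + "AGAA" + "CCC"
print(find_protospacers(phage, spacer, pam="AGAA", gap=2))   # [('+', 3)]
escape = phage.replace("AGAA", "AGTA")      # single-base PAM mutant
print(find_protospacers(escape, spacer, pam="AGAA", gap=2))  # []
```

The second call shows why a point mutation in the PAM alone is enough for the target to go unrecognized, even though the protospacer itself is unchanged.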
The leader sequence is up to 550 bp long and usually AT-rich. It lies between the cas genes and the CRISPR array, directly upstream of the array. The leader contains no open reading frame, and its sequence varies among bacteria, except that the leaders of multiple CRISPR loci within the same chromosome tend to be similar. The leader is presumed to be the recognition site for adding new repeat-spacer units, since new units are almost always added directly downstream of the leader and upstream of the previously acquired unit. The leader probably also acts as the promoter for transcription of the array (Sorek et al. 2008).
Cas genes form large families encoding proteins associated with the CRISPR system. CRISPR systems are divided into 7-8 subtypes, each with 2-6 subtype-specific cas genes. In addition, six core cas genes (cas1 to cas6) occur in multiple subtypes. Among the core genes, cas1 is considered the universal marker of CRISPR systems because it is present in all CRISPR subtypes. Six functional domains have been identified in Cas proteins so far: endonuclease, exonuclease, helicase, DNA-binding, RNA-binding, and transcription-regulator domains (Sorek et al. 2008).

The CRISPR/Cas system plays a significant role in defense against phage infection. Barrangou et al. (2007) showed that a small fraction of bacteria survived phage infection and had acquired additional repeat-spacer units in their CRISPR1 locus. The new spacers were identical to sequences in the phage genome, indicating that they had been acquired from the phage itself. When these spacers were altered, the mutants lost their resistance to the phage. Conversely, when the new units were introduced into phage-sensitive bacteria, those bacteria became resistant, and deleting the introduced units abolished the resistance again (Sorek et al. 2008).
Phages, in turn, can escape this resistance by mutating their PAM sequence, which shows that the PAM plays a significant role in target recognition by CRISPR/Cas. To study the mechanism, Barrangou et al. also inactivated two subtype-specific cas genes. Inactivating the cas gene that encodes an endonuclease motif abolished resistance to infection, whereas inactivating the other cas gene abolished the ability to acquire new repeat-spacer units (Sorek et al. 2008).

The CRISPR/Cas mechanism is somewhat similar to the siRNA/RNAi system of eukaryotes. In eukaryotes, viral dsRNA is cleaved into siRNAs by Dicer. Each siRNA duplex is then loaded into the RNA-induced silencing complex (RISC), which retains a single guide strand. The guide-loaded RISC recognizes complementary viral RNA, which is then cleaved by the Slicer protein (Sorek et al. 2008).
The CRISPR/Cas system works in an analogous way. Its defense mechanism has three major stages: adaptation, expression (biogenesis), and interference (Sorek et al. 2008).
In the adaptation stage, new spacers are acquired from foreign genetic material. Fragments of the invading DNA, called protospacers, are processed into spacers and integrated into the CRISPR array immediately downstream of the leader sequence. The repeat is then duplicated and integrated upstream of the new spacer, preserving the repeat-spacer pattern of the array (Makarova et al. 2011).
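The polarized integration described above can be modeled as a list in which new spacers are always inserted at the leader end. This Python sketch is a toy model under stated assumptions: the class `CrisprArray` and its methods are hypothetical, all sequences are invented and far shorter than real repeats and spacers, and repeat duplication is represented implicitly by emitting one repeat per spacer when the locus is assembled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprArray:
    """Toy model of a CRISPR locus: leader, then repeat-spacer units.

    spacers[0] is the leader-proximal (most recently acquired) spacer.
    """
    leader: str
    repeat: str
    spacers: List[str] = field(default_factory=list)

    def acquire(self, new_spacer: str) -> None:
        # Polarized integration: the new spacer goes next to the leader.
        # Repeat duplication is implicit, because sequence() emits one
        # extra repeat for every spacer.
        self.spacers.insert(0, new_spacer)

    def sequence(self) -> str:
        """Assemble leader + repeat + (spacer + repeat) for each spacer."""
        parts = [self.leader, self.repeat]
        for spacer in self.spacers:
            parts.extend([spacer, self.repeat])
        return "".join(parts)


# Hypothetical short sequences, chosen only to keep the output readable
array = CrisprArray(leader="ATATAT", repeat="GTTTCA", spacers=["CCCC"])
array.acquire("GGGG")
print(array.spacers)      # ['GGGG', 'CCCC']
print(array.sequence())   # ATATATGTTTCAGGGGGTTTCACCCCGTTTCA
```

After each acquisition the assembled locus still has one more repeat than spacers, which is the repeat-spacer pattern the text describes.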
The expression (biogenesis) stage begins with transcription of the CRISPR array into a long precursor RNA (pre-crRNA). This transcript is then cleaved within the repeats, producing short crRNAs, each containing one spacer flanked by partial repeats at both ends. Each crRNA binds Cas proteins to form a functional crRNA-Cas complex (Makarova et al. 2011).
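The processing step can be sketched as cutting the transcript at the midpoint of every repeat, which is the simplification used in the text. In this hypothetical Python example the RNA is invented and the function names are hypothetical; the midpoint cleavage position is an assumption, since real Cas endoribonucleases cut at system-specific positions within the repeat rather than necessarily at its center.

```python
from typing import List


def cut_sites(pre_crrna: str, repeat: str) -> List[int]:
    """Return a cleavage position at the midpoint of every repeat copy."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    sites = []
    start = pre_crrna.find(repeat)
    while start != -1:
        sites.append(start + len(repeat) // 2)
        start = pre_crrna.find(repeat, start + len(repeat))
    return sites


def mature_crrnas(pre_crrna: str, repeat: str) -> List[str]:
    """Cut the pre-crRNA inside each repeat and keep the pieces between cuts.

    Each piece is the 3' half of one repeat + a spacer + the 5' half of the
    next repeat; the ends before the first cut and after the last are dropped.
    """
    sites = cut_sites(pre_crrna, repeat)
    return [pre_crrna[a:b] for a, b in zip(sites, sites[1:])]


# Hypothetical RNA: repeat GUUUCA separating two spacers (AAAA, CCCC)
pre = "GUUUCA" + "AAAA" + "GUUUCA" + "CCCC" + "GUUUCA"
print(mature_crrnas(pre, "GUUUCA"))  # ['UCAAAAAGUU', 'UCACCCCGUU']
```

Each product carries exactly one spacer, which is why one array can supply a separate guide against every phage it has previously encountered.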
In the interference stage, this complex recognizes and binds foreign sequences complementary to its crRNA, which are then degraded by another Cas protein with RecB-like nuclease activity. The detailed mechanism of interference differs between CRISPR/Cas types (Makarova et al. 2011).

Figure 2 Typical CRISPR/Cas mechanism (Sorek et al. 2008).

CRISPR/Cas systems are classified into three major types, each with several subtypes. The types and subtypes differ in their mechanisms and in the cas genes they contain.

Figure 3 Detailed CRISPR/Cas mechanism (Makarova et al. 2011).

Type I CRISPR-associated system

The type I CRISPR/Cas system is probably the most widely distributed among archaea and bacteria. It has six distinct subtypes (I-A to I-F). All subtypes encode the cas3 gene, whose protein has an N-terminal HD phosphohydrolase domain and a C-terminal DExH helicase domain. In some subtypes, the nuclease and helicase domains are encoded by separate genes, but the two proteins still work together to cleave and unwind the target dsDNA. Cas3 cannot recognize the target DNA by itself, however, so another protein complex is required.
Each subtype of the type I system has its own protein complex, called Cascade (CRISPR-associated complex for antiviral defense), which acts as a surveillance complex that recognizes foreign DNA. The first Cascade complex identified was from E. coli K12 (type I-E). It is a 405 kDa ribonucleoprotein consisting of 11 subunits of five functionally essential Cas proteins. One of these subunits is Cas6e, a CRISPR-specific endoribonuclease. Cas6e cleaves pre-crRNA into mature crRNA and stabilizes the other subunits to form a functional Cascade complex. The Cascade complex is organized so that the crRNA is protected from degradation but can still base-pair with the target sequence (Sorek et al. 2011).

Figure 4 Type I CRISPR/Cas mechanism (Sorek et al. 2011).

The type I mechanism is relatively simple. In the adaptation stage, a protospacer is selected by recognition of the PAM located upstream of it and is cleaved out of the foreign DNA. Cas1 and Cas2 integrate the new spacer into the CRISPR array directly upstream of the most recently acquired unit. The repeat is then duplicated and integrated upstream of the new spacer, maintaining the repeat-spacer pattern. In the expression stage, the array is transcribed into a long RNA called pre-crRNA. The Cascade complex binds the pre-crRNA at the spacer regions flanked by repeats, where the Cas6e subunit (or Cas6f in type I-F systems) cleaves the pre-crRNA into small mature crRNAs. The repeat-derived 3′ end of each crRNA often forms a stem-loop ("lollipop") structure. Cascade, with the crRNA bound within it, then binds the target DNA by recognizing its PAM. Another Cas protein, Cas3, then cleaves the target DNA, leading to its degradation (Sorek et al. 2011).
Type II CRISPR-associated system
So far, this system has been found only in bacteria. It contains four cas genes: cas9, cas1, cas2, and either csn2 (II-A) or cas4 (II-B). The system is unusual in that it involves an additional RNA, tracrRNA (trans-activating crRNA). tracrRNA is a small RNA encoded upstream of the CRISPR/Cas locus, on the opposite strand. It base-pairs with the repeat sequences in the pre-crRNA, directing the housekeeping RNase III to cleave the pre-crRNA within the repeats. The other key component is Cas9, a multifunctional protein that plays a central role in this system. Cas9 binds the pre-crRNA where tracrRNA hybridizes, and without Cas9, crRNA biogenesis is inhibited. Cas9 also acts as the nuclease that cleaves the target DNA, guided by the tracrRNA-crRNA hybrid. It has two nuclease domains: an HNH domain, which cleaves the DNA strand complementary to the crRNA guide, and a RuvC-like domain, which cleaves the non-complementary strand. Recognition of the target sequence also requires a PAM (Sorek et al. 2011).

Figure 5 Type II CRISPR/Cas mechanism (Sorek et al. 2011).

Type III CRISPR-associated system

The type III system is found most commonly in archaea. It is divided into two subtypes, III-A and III-B, which target different substrates: III-A targets DNA, and III-B targets RNA. Acquisition of new spacers does not depend on a PAM. Both subtypes encode Cas6 and Cas10. Cas6 appears to act as an endoribonuclease, cleaving pre-crRNA into crRNA. However, the crRNA does not remain in a complex with Cas6. Instead, it seems to be transferred to another complex, Csm (III-A) or Cmr (III-B), and in the Cmr system the 3′ end of the crRNA is further trimmed. Cas10 is probably involved in the interference stage (Sorek et al. 2011).

Figure 6 Type III CRISPR/Cas mechanism (Sorek et al. 2011).

Because the type II system has the simplest mechanism, requiring only a single Cas protein (Cas9) for interference, the CRISPR/Cas9 system has been the most studied and the most widely applied by researchers for various purposes.

Genome editing
CRISPR/Cas9 is a powerful tool for modifying specific sequences in target genomes. By introducing the CRISPR/Cas9 system into target cells, researchers can induce specific changes in the genome sequence. Siksnys et al. demonstrated that the CRISPR/Cas9 system of S. thermophilus can be transferred into E. coli, where it retained its interference activity even though it originated from a different bacterium. Further research showed that purified Cas9 guided by crRNA can cleave target DNA in vitro. To improve efficiency, crRNA and tracrRNA can be fused into a single guide RNA (sgRNA), which forms a functional complex with Cas9. To manipulate a target genome, three components are needed: Cas9, an sgRNA, and an editing template. The editing template serves as the template for homologous repair of the target site (Hsu et al. 2014).
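Designing an sgRNA starts with finding sequences that sit next to a PAM. The Python sketch below assumes the commonly used Streptococcus pyogenes Cas9 (SpCas9), which recognizes an NGG PAM after a 20-nt protospacer and makes a blunt cut about 3 bp upstream of the PAM. These specifics are not stated in this text and are included only as a labeled assumption; `find_spcas9_targets` and `Target` are hypothetical names, and the input sequence is invented.

```python
import re
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


class Target(NamedTuple):
    strand: str        # '+' or '-' relative to the input sequence
    protospacer: str   # 5'->3' on the strand that carries the PAM
    pam: str           # the NGG triplet
    cut: int           # forward-strand index of the predicted blunt cut


def find_spcas9_targets(dna: str, guide_len: int = 20) -> List[Target]:
    """List every guide_len-nt protospacer followed by an NGG PAM, on both strands.

    Assumes SpCas9-like behaviour: a blunt cut 3 nt upstream of the PAM.
    """
    dna = dna.upper()
    n = len(dna)
    # Lookahead keeps overlapping candidates
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    targets = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for m in pattern.finditer(seq):
            cut_on_strand = m.start() + guide_len - 3
            # Map a boundary on the reverse complement back to forward coordinates
            cut = cut_on_strand if strand == "+" else n - cut_on_strand
            targets.append(Target(strand, m.group(1), m.group(2), cut))
    return targets


dna = "AAAAA" + "ACGT" * 5 + "TGG" + "AAAAA"   # hypothetical sequence
for t in find_spcas9_targets(dna):
    print(t)  # Target(strand='+', protospacer='ACGTACGTACGTACGTACGT', pam='TGG', cut=22)
```

The protospacer string (with T replaced by U) is what the spacer part of the sgRNA would encode; scanning both strands matters because a PAM on either strand can be used.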
Genome editing with this technique has several advantages. The alteration is highly specific and precise, which makes it very useful for drug development and medical research. Furthermore, more than one target can be altered at once: by including more than one guide in the system, several changes can be made simultaneously, efficiently, and reliably. Mutations generated by the CRISPR/Cas9 system are also stable and heritable, as demonstrated by Bassett et al., who edited the Drosophila genome using an improved RNA injection-based CRISPR/Cas9 system. This heritability is useful for improving agricultural crops such as rice by introducing valuable traits into their genomes (Zhang et al. 2014).

Transcription regulation
CRISPR/Cas9 can be a useful tool for studying transcriptional regulation by altering functional sites involved in transcription. However, because the genome itself is altered, this approach is irreversible. One possible solution is to modify Cas9 to inactivate its ability to cleave DNA, so that it can only recognize and bind the target DNA without cutting it. This modified Cas9 represses transcription, and thus gene expression, in model bacteria, but it has not yet shown a similar effect in mammalian cells. The modified system makes it possible to manipulate gene expression while keeping the original genome sequence intact (Zhang et al. 2014).

Although the CRISPR/Cas9 system enables efficient and precise genome editing, several challenges limit its usability. The system carries a relatively high risk of off-target mutations, which in some cases could be dangerous. Organisms with large genomes tend to contain multiple identical or homologous DNA sequences, so Cas9 may alter unintended sequences. To make the most of the system, target sequences should be chosen to be unique, with as few potential off-target sites as possible. Another way to reduce off-target cleavage is to convert Cas9 into a nickase (Zhang et al. 2014).
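Choosing a unique target can be roughly approximated by scanning the genome for PAM-adjacent sites that resemble the protospacer. The Python sketch below is a deliberately crude, hypothetical screen: the NGG PAM, the 20-nt protospacer, and the threshold of three mismatches are assumptions not taken from this text; bulges and the position of each mismatch are ignored; and dedicated guide-design tools use more elaborate scoring. The function names and the genome are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(genome: str, protospacer: str,
                    max_mismatches: int = 3) -> List[Tuple[str, int, int]]:
    """Return (strand, start, mismatches) for every NGG-adjacent site within
    max_mismatches of the protospacer. Starts are on the scanned strand.
    """
    genome, protospacer = genome.upper(), protospacer.upper()
    L = len(protospacer)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - L - 2):
            if seq[i + L + 1:i + L + 3] != "GG":   # require the NGG PAM
                continue
            mm = hamming(seq[i:i + L], protospacer)
            if mm <= max_mismatches:
                hits.append((strand, i, mm))
    return hits


guide = "ACGT" * 5                                  # hypothetical protospacer
near_copy = "ACGT" * 4 + "ACAA"                     # same site with 2 mismatches
genome = "TTTTT" + guide + "TGG" + "TTTTT" + near_copy + "AGG" + "TTTTT"
print(candidate_sites(genome, guide))  # [('+', 5, 0), ('+', 33, 2)]
```

A guide whose only hit is its intended site (one perfect match and no near-matches) is the kind of unique sequence the text recommends; here the second hit flags a potential off-target site.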
The CRISPR/Cas9 system also depends heavily on the PAM to recognize the desired target. This dependence constrains its usability, because the target must have a PAM directly downstream of it. PAM sequences vary with the species from which Cas9 is isolated: a longer PAM leaves fewer target sequences available but gives more specific targeting than a shorter one (Zhang et al. 2014).
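The trade-off between PAM length and target availability follows from simple probability. Assuming a random genome with equal and independent base frequencies (an idealization that real genomes do not meet), each fully specified PAM position matches with probability 1/4, a two-base IUPAC code with probability 1/2, and N always, so the chance of finding the PAM at a given position is the product over its positions. The Python sketch below applies this to the two PAMs named earlier in the text (AGAA, GGNG) and to NGG, the PAM commonly reported for S. pyogenes Cas9, which is included as an assumption. Function names are hypothetical.

```python
from fractions import Fraction

# Number of bases each IUPAC code allows
_IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1,
               "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
               "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}


def pam_probability(pam: str) -> Fraction:
    """Chance that a random position on one strand matches the PAM,
    assuming equal and independent base frequencies."""
    p = Fraction(1)
    for base in pam.upper():
        p *= Fraction(_IUPAC_SIZE[base], 4)
    return p


def expected_pam_sites(pam: str, genome_length: int) -> Fraction:
    """Approximate PAM count on both strands (edge effects ignored)."""
    return 2 * genome_length * pam_probability(pam)


for pam in ("NGG", "GGNG", "AGAA"):
    print(pam, pam_probability(pam), float(expected_pam_sites(pam, 1_000_000)))
```

For an idealized 1 Mb genome this gives about 125,000 NGG sites, 31,250 GGNG sites, and 7,812.5 AGAA sites: each additional fully specified base cuts the number of candidate targets by a factor of four, which is the "fewer but more specific" effect described above.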

Sorek R, Kunin V, Hugenholtz P. 2008. CRISPR: a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat. Rev. Microbiol. 6: 181-186.
Makarova KS, et al. 2011. Evolution and classification of the CRISPR-Cas systems. Nat. Rev. Microbiol. 9: 467-477.
Sorek R, Lawrence CM, Wiedenheft B. 2013. CRISPR-mediated adaptive immune system in bacteria and archaea. Annu. Rev. Biochem. 82: 11.1-
Hsu PD, Lander ES, Zhang F. 2014. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157.
Zhang F, Wen Y, Guo X. 2014. CRISPR/Cas9 for genome editing: progress, implications and challenges. Hum. Mol. Genet.: R40-R46.
Ishino Y, Shinagawa H, Makino K, Amemura M, Nakata A. 1987. Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J. Bacteriol. 169: 5429-5433.
Mojica FJ, Díez-Villaseñor C, Soria E, Juez G. 2000. Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol. Microbiol. 36: 244-246.
Jansen R, van Embden JD, Gaastra W, Schouls LM. 2002. Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 43:
Haft DH, Selengut J, Mongodin EF, Nelson KE. 2005. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput. Biol. 1: e60.
Barrangou R, et al. 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315: 1709-1712.