
Identification of YMZpB by shotgun sequencing

Ehsan Khodapanahi
Feb 2015
Introduction
Much of the progress made in molecular biology in recent years has been due in large part to
advanced and accessible technology for studying DNA. The completion of the Human Genome
Project (HGP) in 2003, and of similar projects on other organisms thereafter, was a
breakthrough in the field, providing a wealth of information for gene studies. Today, with
the help of available tools, researchers can identify an unknown piece of DNA relatively
inexpensively and predict the function of a gene based on its similarity to genes of known
function.

In the first module, a variety of laboratory and bioinformatic tools were used to sequence
an unknown piece of genomic DNA. Comparative DNA analysis was conducted to characterize the
unknown DNA against gene databases. The sequencing method used in this module was shotgun
sequencing, which involves breaking an unknown piece of DNA into numerous segments that are
then used to determine the nucleotide sequence [1, 2]. A 5 kb unknown piece of genomic DNA
(YMZpB) was digested with two restriction enzymes, producing fragments of various sizes. The
fragments were then randomly cloned into linearized pBluescript-SK vector (pBS-SK(+); see
Appendix 1 for vector map) and transformed into E. coli for propagation. The inserts were
amplified by PCR using E. coli colonies harbouring pBS-SK-YMZpB fragments as templates and
pBS-SK-specific primers. After the presence of inserts in the vector was confirmed by
electrophoresis, samples were further processed to remove left-over primers and deactivate
the remaining nucleotides. This step is necessary before setting up the sequencing reaction;
otherwise these components interfere with the sequencing process and produce unreliable
data. Next, we set up sequencing reactions using the cleaned PCR products. The obtained
sequences were then re-assembled using computer algorithms. Since the sequenced PCR products
originate from a random insert library made from the unknown piece of genomic DNA, multiple
sequences are expected to overlap in such a way that the 5 kb genomic DNA can be put back
together. Comparative analysis using the Basic Local Alignment Search Tool (BLAST) and
phylogeny was used to further investigate the function of the encoded protein.

Nine putative ORFs were identified in YMZpB. They correspond to genes of Pseudomonas
syringae that are involved in the pathogenesis-related type III secretion system of the
bacterium [3].

Materials and methods


DNA materials
This study aimed to identify and characterize a 5 kb unknown piece of genomic DNA, YMZpB.
The random insert library was made by cloning fragments produced by enzymatic double
digestion of YMZpB into the linearized bacterial expression vector pBluescript-SK
(pBS-SK(+)). pBS-SK(+) is designed in such a way that an insertion at the multiple cloning
site (MCS) disrupts the lacZ coding sequence, so that bacterial cells can no longer convert
X-Gal into a blue precipitate [4].

Enzymatic digestion
100ng (2μl) of pBS-SK was linearized in a 10μl reaction in 200μl thin-wall 8-strip PCR
tubes containing 1μl of 10X FastDigest buffer, 0.5μl FastDigest SmaI restriction enzyme and
6.5μl ddH2O. The solution was then gently mixed and incubated in a 37°C heating block for 30
minutes. SmaI, the restriction enzyme used for this reaction, produces blunt ends, which
must be dephosphorylated to prevent self-ligation of the vector.
100ng (5.4μl) of YMZpB DNA was double digested in a 20μl reaction in 200μl PCR tubes
containing 2μl of 10X FastDigest buffer, 1μl FastDigest EheI, 1μl FastDigest MscI
restriction enzymes and 10.6μl ddH2O. The solution was then gently mixed and incubated in a
37°C heating block for 30 minutes. Both restriction enzymes used for this reaction, EheI and
MscI, produce blunt-ended fragments carrying the phosphate groups necessary for cloning into
the linearized, dephosphorylated vector.
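
The two digestion recipes above can be checked with a little arithmetic. The following is a minimal Python sketch, assuming only the volumes stated in the text; the dictionary and function names are hypothetical and chosen for illustration. It confirms that the listed components add up to the stated reaction volumes and that the 10X buffer ends up at 1X.

```python
# Component volumes (in µl) exactly as listed in the text for the two digests.
SMAI_DIGEST = {
    "pBS-SK DNA (100 ng)": 2.0,
    "10X FastDigest buffer": 1.0,
    "FastDigest SmaI": 0.5,
    "ddH2O": 6.5,
}
DOUBLE_DIGEST = {
    "YMZpB DNA (100 ng)": 5.4,
    "10X FastDigest buffer": 2.0,
    "FastDigest EheI": 1.0,
    "FastDigest MscI": 1.0,
    "ddH2O": 10.6,
}


def total_volume(recipe):
    """Sum of all component volumes in µl (rounded to avoid float noise)."""
    return round(sum(recipe.values()), 2)


def buffer_fold(recipe, stock_fold=10.0):
    """Final buffer concentration (in X) after dilution into the full reaction."""
    return stock_fold * recipe["10X FastDigest buffer"] / total_volume(recipe)


for name, recipe in (("SmaI digest", SMAI_DIGEST), ("EheI/MscI digest", DOUBLE_DIGEST)):
    print(f"{name}: {total_volume(recipe)} µl total, {buffer_fold(recipe):.1f}X buffer")
```

The same check can be reused when scaling a reaction: only the volumes in the dictionary need to change.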

Heat Inactivation
Restriction enzymes were heat-inactivated after digestion by transferring the tubes to a
65°C heating block for 5 minutes. Tubes were then placed on ice immediately afterward. The
phosphatase reaction (see Dephosphorylation) was inactivated for 10 minutes at 65°C.
Ligation reactions were similarly inactivated at 65°C for 10 minutes. The enzymatic PCR
cleaning reaction was deactivated at 85°C for 15 minutes.

Dephosphorylation
0.5μl Shrimp Alkaline Phosphatase (SAP) was added to the solution containing linearized
pBS-SK after inactivation of the digestion enzyme. The solution was gently mixed, incubated
at 37°C for 30 minutes and deactivated as previously explained.

Cloning
EheI/MscI-digested DNA fragments were randomly cloned into SmaI-digested, dephosphorylated
pBS-SK vector in 200μl PCR tubes containing 20ng of linearized vector, 75ng of digested
YMZpB insert fragments (1:3 ratio), 1X FastDigest buffer, 500nM ATP, 0.5 U T4 DNA Ligase and
ddH2O to bring the total reaction volume to 10μl. A negative control for background
contamination was set up by excluding the insert fragments from the reaction. The reactions
were then incubated at room temperature for 20 minutes and, immediately after heat
inactivation, used for transformation.

Bacterial Transformation
100μl of chemically competent, frozen DH5α E. coli cells were thawed on ice for about 2-3
minutes. Transformation reactions were prepared by adding 10μl of the ligation mixture with
digested DNA inserts, 10μl of the negative control mixture, or 0.5μl of undigested pBS-SK as
a positive control to a tube of competent cells. Tubes were then incubated on ice for 20
minutes, followed by heat shock at 42°C for 50 seconds. Tubes were put back on ice for
another 2 minutes. One ml of SOC medium was added to the mixture and tubes were incubated at
37°C with shaking at 250 RPM for 30 minutes.

Growth condition
100μl of transformation solution was plated on Luria Broth (LB) agar plates containing
ampicillin (100μg/ml), each spiked with 100μl of a master mix containing 160μl of a 20mg/ml
X-gal solution, 8μl of a 1M IPTG solution, and 232μl ddH2O. The transformation plates were
incubated overnight at 37°C. For the negative control transformation, all of the bacterial
culture was spread on selective medium. For the positive control transformation, 50μl of the
culture was spread on selective medium.
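
The amount of X-gal and IPTG that each plate receives follows directly from the master mix recipe above. A minimal Python sketch of that arithmetic is given below; it assumes only the stock concentrations and volumes stated in the text, and the variable names are hypothetical.

```python
# Master mix components as given in the text.
XGAL_STOCK_MG_PER_ML = 20.0
XGAL_VOLUME_UL = 160.0
IPTG_STOCK_M = 1.0
IPTG_VOLUME_UL = 8.0
WATER_VOLUME_UL = 232.0
SPIKE_PER_PLATE_UL = 100.0

# Total master mix volume in µl.
mix_volume_ul = XGAL_VOLUME_UL + IPTG_VOLUME_UL + WATER_VOLUME_UL

# Fraction of the master mix that ends up on one plate.
fraction_per_plate = SPIKE_PER_PLATE_UL / mix_volume_ul

# mg/ml * µl / 1000 = mg of X-gal in the whole mix.
xgal_mg_per_plate = XGAL_STOCK_MG_PER_ML * XGAL_VOLUME_UL / 1000.0 * fraction_per_plate

# mol/l * µl = µmol of IPTG in the whole mix.
iptg_umol_per_plate = IPTG_STOCK_M * IPTG_VOLUME_UL * fraction_per_plate

plates_per_mix = int(mix_volume_ul // SPIKE_PER_PLATE_UL)

print(f"Master mix volume: {mix_volume_ul:.0f} µl ({plates_per_mix} plates)")
print(f"X-gal per plate:   {xgal_mg_per_plate:.2f} mg")
print(f"IPTG per plate:    {iptg_umol_per_plate:.1f} µmol")
```

Under these figures one batch of master mix covers four plates.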

Colony-PCR
Colony PCR was carried out to amplify the cloned fragments using the pBS-SK-specific primers
T3 (5’-AATTAACCCTCACTAAAGGGAACAAAAG-3’) and T7 (5’-GTAATACGACTCACTATAGGGCGAATTG-3’). Seven
white colonies were picked and each was mixed with PCR solution containing 1X PCR buffer,
1.5mM MgCl2, 200μM dNTPs, 0.1µM each of T3 and T7 primers, 1.25U Taq DNA Polymerase and
ddH2O to a total volume of 25μl. PCR was run on a thermal cycler (Hybaid PXE 0.2 BioRad DNA
Engine) with the following program: 1x 94°C initial denaturation for 4 minutes; 35x (94°C
denaturation for 30 seconds, 56°C annealing for 30 seconds and 72°C extension for 1 minute);
and 1x 72°C final extension for 5 minutes.

Gel electrophoresis
The size of the PCR amplicons was confirmed by gel electrophoresis. Five µl of each PCR
product, mixed with loading dye to 1X, was loaded on a 1% (wt/vol) agarose gel in TBE and
run at 100V for 30 minutes. 5μl of 1 kb DNA ladder (Fermentas Life Sciences) was loaded into
the first well.

Enzymatic PCR product cleaning


The selected PCR products were each treated with 0.2µl of 20U/µl exonuclease and 0.2µl of
10U/µl calf intestinal alkaline phosphatase (CIAP), incubated at 37°C for 30 minutes and
deactivated as previously explained.

Sequencing PCR
3μl of each cleaned PCR reaction was mixed with 1.75μl of 5X sequencing buffer, 0.5μl of
10μM T7 primer, 0.5μl of BigDye v3.1 sequencing mix, and ddH2O to bring the total reaction
volume to 10μl. The sequencing PCR reactions were programmed on the thermal cycler (Hybaid
PXE 0.2 BioRad DNA Engine) as follows: 1x 96°C initial denaturation for 1 minute; 45x (96°C
denaturation for 10 seconds, 50°C annealing for 5 seconds and 60°C extension for 4 minutes);
and 1x 60°C final extension for 4 minutes. The reactions were run on an ABI 3730 Genomic
Analyser to determine the nucleotide sequence.

Sequence annotation
Obtained sequences were edited by removing the primer regions at the beginning and the
unevenly distributed and overlapping peaks at the end of each sequence. The overlapping
edited sequences, along with the provided fragments, were assembled into one contiguous
consensus sequence of ~5 kb using Sequencher® software. Potential open reading frames (ORFs)
were identified using NCBI ORF Finder (National Center for Biotechnology Information,
http://www.ncbi.nlm.nih.gov/projects/gorf/) and GeneMark
(http://exon.gatech.edu/GeneMark/). The putative ORFs from ORF Finder and GeneMark were
compared with the NCBI databases using BLASTp and BLASTn, respectively.
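
To illustrate what an ORF search does, the following is a minimal Python sketch of a forward-strand scan. It is not the algorithm used by NCBI ORF Finder or GeneMark: it assumes ATG as the only start codon, ignores the reverse strand, and the function name, the length cut-off and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_forward_orfs(seq, min_length_bp=150):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive, and the stop codon is counted in the
    ORF length. Frames are numbered 1 to 3 according to the position of the start.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length_bp:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: two leading bases, ATG, four GCT codons, a TAA stop, one trailing base.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_forward_orfs(toy, min_length_bp=18))
```

With this numbering, an ORF starting at position 757 falls in frame 1 and one starting at 2211 falls in frame 3, which is consistent with the forward-strand frames listed for those start positions in Table 2.A.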

Phylogenetic analysis
A putative ORF encoding the YscJ/HrcJ protein was selected for phylogenetic analysis. The 10
proteins most similar to YscJ/HrcJ in different organisms were identified using BLAST. A
phylogenetic tree was constructed in MEGA 6.06 using the neighbor-joining method with 500
bootstrap replicates.

Results
1-Shotgun cloning
1.1- E. coli transformation
One blue and 11 white colonies grew on the two transformation plates (Table 1). White
colonies originate from E. coli transformants carrying pBS-SK vector with a YMZpB fragment
insert. The negative control, transformed with digested pBS-SK only, and the positive
control, transformed with undigested pBS-SK only, produced 16 and ~800 colonies,
respectively.

Table 1: Transformant quantification. White colonies represent successful fragment cloning
into pBS-SK. Blue colonies represent the presence of empty (original) pBS-SK in transformants.

                                    Ligation of      Negative ligation   Positive transformation
                                    DNA fragments    control             control
No. white colonies                  11               0                   0
No. blue colonies                   1                16                  ~800
Percentage of blue colonies/total   9                N/A                 N/A

1.2- Colony PCR and Gel Electrophoresis


Eight PCR reactions were set up: 7 randomly selected white colonies each served as the
template in one PCR tube, and the remaining tube was a negative control with no template
(Figure 1).

Figure 1: Gel electrophoresis. DNA fragments produced by PCR using 7 independent colonies
as templates. Reactions 1, 3, 6 and 7 contained a band of approximately 2.2 kb, while
reactions 2, 4 and 5 predominantly produced a band of 3.5 kb.

2-Sequence of a 5kb genomic region
The sequence of YMZpB was assembled from a total of 69 DNA contigs in Sequencher® software,
using a 90% minimum match and a 20-nucleotide minimum overlap, yielding a 5.4 kb stretch of
DNA (Figure 2).

Figure 2: Generated 5.4 Kb DNA stretch using 69 overlapping contigs.
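
The two assembly parameters named above, minimum match and minimum overlap, can be illustrated with a toy pairwise merge. This is a minimal Python sketch and not Sequencher's algorithm, which is not described in this report; it handles ungapped suffix-prefix overlaps only, and the function names and example reads are hypothetical.

```python
def best_overlap(left, right, min_overlap=20, min_match=0.90):
    """Length of the longest suffix of `left` that matches a prefix of `right`
    with at least `min_match` identity; 0 if no overlap of `min_overlap` or more
    bases qualifies. Gaps are not considered."""
    shortest = max(min_overlap, 1)
    for size in range(min(len(left), len(right)), shortest - 1, -1):
        suffix, prefix = left[-size:], right[:size]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        if matches / size >= min_match:
            return size
    return 0


def merge_reads(left, right, min_overlap=20, min_match=0.90):
    """Join two reads on their best overlap, or return None if they do not overlap.
    The overlapping bases are taken from `left`."""
    size = best_overlap(left, right, min_overlap, min_match)
    if size == 0:
        return None
    return left + right[size:]


# Two hypothetical reads sharing a 24-base stretch.
shared = "ACGTTAGCCGATAGGCTTAACGGA"
read_a = "TTGACCA" + shared
read_b = shared + "GGATTC"

print(best_overlap(read_a, read_b))
print(merge_reads(read_a, read_b))
```

Raising `min_match` or `min_overlap` makes the merge stricter, so fewer reads are joined; this is the kind of parameter change discussed later for reducing the size of the assembled contig.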


3-Genomic organization of the 5kb region
To determine putative ORFs, the consensus sequence of the 5kb region was analysed in NCBI
ORF Finder and the resulting ORFs were blasted at the protein level. Forty-four potential
ORFs were found, nine of which were considered significant based on their E-values and
bitscores (Table 2.A). Next, GeneMark predicted nine ORFs. These putative ORFs were blasted
at the nucleotide level to annotate them (Table 2.B). The gene annotations, order and
directions were used to assemble the physical gene map presented in Figure 3.

Table 2: Putative ORFs using YMZ consensus sequence. A. NCBI ORF Finder, and B.
GeneMark. Hits were selected based on E-values and bitscores.

A: ORF Finder
ORF     Accession       Range (bp)  Strand/  bp    aa    E-value    Bitscore  Significant matches
number  number                      Frame    size  size
1       WP_007248472.1  757-1764    1+       1008  335   0          649       type III helper protein HrpZ1 [Pseudomonas syringae group genomosp.
2       WP_007248474.1  2211-2984   3+       774   257   0          519       type III secretion protein HrcJ [Pseudomonas syringae group genomosp.
3       WP_007248476.1  3554-4135   2+       582   193   2.00E-130  376       type III secretion protein HrpE [Pseudomonas syringae group genomosp.
4       AFH66566.1      2981-3517   2+       537   178   3.00E-123  357       HrpD [Pseudomonas cannabina]
5       AFH66569.1      4421-4861   2+       441   146   1.00E-100  296       HrpG [Pseudomonas cannabina]
6       WP_007248471.1  389-703     2+       315   104   4.00E-67   208       type III helper protein HrpA2 [Pseudomonas syringae group genomosp.
7       BAD20870.1      35-295      2+       261   86    5.00E-55   179       hrp regulatory protein HrpS [Pseudomonas syringae pv. maculicola]
8       WP_007248477.1  4219-4443   1+       225   74    4.00E-44   147       type III secretion protein HrpF [Pseudomonas syringae group genomosp.
9       WP_005760008.1  2011-2166   1+       156   51    1.00E-25   100       type III secretion protein [Pseudomonas amygdali]

B: GeneMark
ORF     Accession       Range (bp)  Strand   bp    aa    E-value    Bitscore  Significant matches
number  number                               size  size
1       AB795317.1      2-295       +        294   97    4.00E-151  538       hrp regulatory protein HrpS
2       AB795319.1      404-703     +        300   99    2.00E-154  555       hrp pili protein HrpA
3       AB795318.1      757-1764    +        1008  335   0          1862      harpin protein HrpZ
4       JQ517282.1      1792-2166   +        375   124   0          693       hrp associated protein HrpB
5       JQ517282.1      2175-2984   +        810   269   0          1496      type III secretion protein HrcJ
6       JQ517282.1      3116-3517   +        402   133   0          739       type III secretion protein HrpD
7       JQ517282.1      3554-4135   +        582   193   0          1075      type III secretion protein HrpE
8       JQ517282.1      4219-4443   +        225   74    7.00E-113  416       type III secretion protein HrpF
9       JQ517282.1      4430-4861   +        432   143   0          798       type III secretion protein HrpG

Figure 3. Physical map of genes encoded by YMZpB. Length of each arrow is proportional to
the size of the gene.

4-Phylogenetic analysis
The second putative ORF from NCBI ORF Finder was selected for phylogenetic analysis. The
gene encodes the type III secretion system protein HrcJ of Pseudomonas syringae and contains
a putative conserved FliF domain. The FliF protein ring is thought to be part of the export
apparatus for flagellar proteins, based on its similarity to YscJ proteins [5, 6]. As shown
in Figure 4, the phylogenetic tree suggests that the HrcJ protein has close orthologs in
Serratia marcescens (strain FGI94) and Pseudomonas parafulva (strain CRS01-1), as well as
more distant orthologs in Pseudomonas fluorescens (strain Pf29Arp) and Pseudomonas
brassicacearum (strain NFM421).

Figure 4. Generated phylogenetic tree for type III secretion protein HrcJ in Pseudomonas
syringae.

Discussion
In this study, an unknown piece of DNA, YMZpB, was analysed at the nucleotide level and
found to encode a set of genes involved in the type III secretion system of Pseudomonas
syringae. The module began with digestion of YMZpB into fragments of different sizes and
random cloning of the fragments into the bacterial expression vector pBS-SK by conventional
ligation. The vector carrying YMZpB fragments was then transformed into competent E. coli
cells for propagation, which resulted in 11 white colonies and one blue colony. The blue
colony on the ligation plates indicates that an empty pBS-SK (without a YMZpB fragment) was
transformed into the competent E. coli cells. This might be due to incomplete SmaI
digestion, which left uncut pBS-SK in the transformation solution. Another possibility is
incomplete dephosphorylation of the digested pBS-SK, which allowed the two blunt ends to
self-ligate and re-form the replicable original pBS-SK. The digestion and dephosphorylation
reaction times can be extended to improve reaction efficiency in future experiments. The
absence of white colonies on the positive and negative controls was as expected (see
Materials and methods). The 16 blue colonies on the negative control plate likewise point to
the presence of original pBS-SK in the linearized, dephosphorylated vector solution, as
discussed above. That 16 blue colonies grew on the negative control plate, compared with a
single blue colony on the two experimental plates, was due to the volume of cell culture
spread on the plates (more than 1 ml for the negative control vs a total of 200 μl for both
experimental plates).

Seven white colonies were selected from the ligation plates and served as
templates for individual PCR reactions. Although a greater variety of YMZpB fragments was
expected from 7 randomly selected colonies, they predominantly reflected the presence of
only two fragment sizes, ~2.4 and ~3.5 kb. This may be due to different combinations of
successful digestion by the two enzymes, since there are multiple recognition sites in YMZpB
for each enzyme used. Unsuccessful digestion at some locations can occur when DNA tertiary
structure makes the recognition sites inaccessible to the enzyme or, in the case of double
digestion, when the close proximity of two restriction sites inhibits the function of either
or both enzymes. This led to two fragments whose estimated sizes were not among the sizes
theoretically expected from the YMZpB map. One possibility, however, is that EheI cut at its
first (596) and last (4039) recognition sites, producing the ~3.4 kb band. For the 2.4 kb
band to be produced, EheI might have cut only at positions 1048 and 3424 instead. Adding to
this complexity, the distribution of the ladder bands differed from the legend provided by
the manufacturer, making it difficult to estimate band sizes accurately.

We set up 7 sequencing reactions, from which we obtained partially reliable reads for the
first 4 samples. One reason for this may be contamination of the reactions with primers and
nucleotides due to an inefficient PCR clean-up reaction. For future experiments, commercial
PCR clean-up kits or conventional DNA precipitation methods, such as ethanol or PEG
precipitation, are recommended to isolate the desired PCR product only. Another possibility
is the presence of a non-specific band in the PCR reaction that might have interfered with
sequencing of the main product. To solve this issue, we could have optimized the annealing
temperature of the PCR reactions or gel-purified the main PCR band, to make sure that no DNA
other than the main PCR product could interfere with the sequencing reactions.
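
The fragment sizes proposed above for a partial EheI digestion can be reproduced from the quoted site positions. The following minimal Python sketch assumes only the four EheI positions given in the text (MscI sites are not listed in this report and are therefore left out), treats each position as the cut coordinate, and uses a hypothetical function name. The colony PCR amplicons would additionally include a short stretch of vector sequence between the T3 and T7 primer sites, which is not modelled here.

```python
from itertools import combinations

# EheI recognition-site positions (bp) in YMZpB, as quoted in the text.
EHEI_SITES = [596, 1048, 3424, 4039]


def candidate_fragments(sites):
    """Map each pair of sites to the size (bp) of the fragment they would bound
    if every site between them were left uncut (partial digestion)."""
    return {(a, b): b - a for a, b in combinations(sorted(sites), 2)}


fragments = candidate_fragments(EHEI_SITES)
for (start, end), size in sorted(fragments.items(), key=lambda item: item[1]):
    print(f"{start:>5}-{end:<5} {size / 1000:.1f} kb")
```

Cutting at 596 and 4039 gives 3443 bp (~3.4 kb), and cutting at 1048 and 3424 gives 2376 bp (~2.4 kb), matching the two sizes discussed.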

Sixty-nine overlapping contigs generated from sequenced YMZpB fragments were used to
assemble the full-length sequence of YMZpB. The advantage of using overlapping contigs is
that each nucleotide is read multiple times, which reduces errors and strengthens the
consensus sequence. A contiguous piece of DNA of ~5.4 kb was generated by Sequencher as the
YMZpB consensus sequence. Its size differed from the expected size of ~4.9 kb by about 10%.
However, the difference was not considered significant, as instructed. One way to reduce the
discrepancy is to adjust the assembly parameters. For instance, increasing the minimum match
from 90% to 95% or more would probably remove the flanking contigs with lower similarity to
the consensus sequence at either end of the YMZpB contig.
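
The size discrepancy quoted above can be worked out explicitly. This is a minimal Python sketch using only the two sizes reported in the text; the function and constant names are hypothetical.

```python
def percent_difference(observed_kb, expected_kb):
    """Relative size difference, as a percentage of the expected size."""
    return 100.0 * (observed_kb - expected_kb) / expected_kb


ASSEMBLED_KB = 5.4  # length of the Sequencher consensus sequence
EXPECTED_KB = 4.9   # expected size of YMZpB

excess_kb = ASSEMBLED_KB - EXPECTED_KB
difference = percent_difference(ASSEMBLED_KB, EXPECTED_KB)
print(f"{excess_kb:.1f} kb excess, {difference:.1f}% over the expected size")
```

The excess of about 0.5 kb is the amount of flanking sequence that stricter assembly parameters would need to remove to bring the contig to the expected size.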

Putative ORFs found by NCBI ORF Finder and GeneMark were analysed to identify the gene
associated with each ORF. Nine putative ORFs were found by GeneMark, in agreement with the 9
ORFs selected from NCBI ORF Finder, for which the best BLAST hits had very low E-values and
high bitscores. The 9 putative ORFs from both tools consistently map to the same regions of
the YMZpB consensus sequence, and to an identical set of genes when blasted at the protein
and nucleotide levels (Table 2), which increases confidence in the predicted ORFs. The
consensus putative ORFs and their locations were used to generate the physical map presented
in Figure 3. Based on this study, YMZpB encodes an operon composed of a set of genes
involved in type III secretion in Pseudomonas syringae, which functions to suppress the host
plant's immune system [3]. The function of this operon highlights its significance for
pathogenicity and for inducing a host response; therefore, it is reasonable to expect
related species to encode orthologs of the genes involved in type III secretion in
Pseudomonas syringae. To test this hypothesis, the HrcJ (YscJ/FliF homolog) sequence and its
top 10 BLAST hits were used to generate a phylogenetic tree and investigate the evolutionary
history of the protein. The phylogenetic tree (Figure 4) clustered the related species into
two groups of homologs, and HrcJ of Pseudomonas syringae clustered with 4 HrcJ homologs from
different species. This supports the hypothesis of a conserved function of HrcJ in different
species.

Appendix 1. Vector map of pBluescript SK(+) used for bacterial expression. The vector
contains sequences compatible with the common T7 and T3 primers. E. coli growth is under
ampicillin (Amp) selection. The multiple cloning site (MCS) has been designed in such a way
that an insertion will disrupt β-galactosidase expression, resulting in white colonies of
transformants.
Source: "www.snapgene.com/resources"

References:

1. Anderson, S., Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic
Acids Research, 1981. 9(13): p. 3015-3027.
2. Staden, R., A strategy of DNA sequencing employing computer programs. Nucleic Acids
Research, 1979. 6(7): p. 2601-2610.
3. Hueck, C.J., Type III protein secretion systems in bacterial pathogens of animals and
plants. Microbiology and Molecular Biology Reviews, 1998. 62(2): p. 379-433.
4. Horwitz, J.P., et al., Substrates for Cytochemical Demonstration of Enzyme Activity. I.
Some Substituted 3-Indolyl-β-D-glycopyranosides. Journal of Medicinal Chemistry, 1964. 7(4):
p. 574-575.
5. Allaoui, A., P. Sansonetti, and C. Parsot, MxiJ, a lipoprotein involved in secretion of
Shigella Ipa invasins, is homologous to YscJ, a secretion factor of the Yersinia Yop
proteins. Journal of Bacteriology, 1992. 174(23): p. 7661-7669.
6. Suzuki, H., et al., A structural feature in the central channel of the bacterial
flagellar FliF ring complex is implicated in type III protein export. Journal of Structural
Biology, 1998. 124(2): p. 104-114.
