
Identification of YMZpB by shotgun sequencing

Ehsan Khodapanahi
Feb 2015

Introduction
Much of the recent progress in molecular biology is due in large part to advanced and
accessible technology for studying DNA. The completion of the Human Genome Project (HGP) in 2003,
and of similar projects on other organisms thereafter, was a breakthrough in the field, providing
a wealth of information for gene studies. Today, with the help of available tools, researchers can
identify an unknown piece of DNA relatively inexpensively and predict the function of a gene
based on its similarity to genes of known function.

In the first module, a variety of laboratory and bioinformatic tools were used to sequence
an unknown piece of genomic DNA. Comparative DNA analysis was conducted to characterize the
unknown DNA against gene databases. The sequencing method used in this module was shotgun
sequencing, which involves breaking an unknown piece of DNA into numerous segments that are
then used to determine the nucleotide sequence [1, 2]. A 5 kb unknown piece of genomic DNA
(YMZpB) was digested with two restriction enzymes, producing fragments of various sizes. The
fragments were then randomly cloned into linearized pBluescript-SK vector (pBS-SK(+); see
Appendix 1 for vector map) and transformed into E. coli for propagation. The inserts were
amplified by PCR using E. coli colonies harbouring pBS-SK-YMZpB fragments as templates and
pBS-SK-specific primers. After the presence of inserts in the vector was confirmed by
electrophoresis, samples were further processed to remove left-over primers and deactivate the
remaining nucleotides. This step is necessary before setting up the sequencing reaction;
otherwise these components interfere with the sequencing process and produce unreliable data.
Next, sequencing reactions were set up using the clean PCR products. The obtained sequences were
then re-assembled using computer algorithms. Since the sequenced PCR products originate from a
random insert library made from the unknown piece of genomic DNA, multiple sequences are
expected to overlap in such a way that the 5 kb genomic DNA can be pieced back together.
Comparative analysis using the Basic Local Alignment Search Tool (BLAST) and phylogeny was used
to further investigate the function of the encoded protein.

Nine putative ORFs were identified in YMZpB. They correspond to genes of Pseudomonas syringae
that are involved in the pathogenicity-related type III secretion system of the bacterium [3].

Materials and methods


DNA materials
This study aimed to identify and characterize a 5 kb unknown piece of genomic DNA, YMZpB.
The random insert library was made by cloning fragments produced by enzymatic double digestion
of YMZpB into a linearized bacterial expression vector, pBluescript-SK (pBS-SK(+)). pBS-SK(+)
is designed so that an insertion at the multiple cloning site (MCS) disrupts the intact LacZ,
preventing bacterial cells from converting X-Gal into a blue precipitate [4].

Enzymatic digestion
100ng (2μl) of pBS-SK was linearized in a 10μl reaction in 200μl thin-wall 8-strip PCR
tubes containing 1μl of 10X FastDigest buffer, 0.5μl FastDigest SmaI restriction enzyme and 6.5μl
ddH2O. The solution was then gently mixed and incubated in a 37°C heating block for 30 minutes.
The restriction enzyme used for this reaction, SmaI, produces blunt-ended fragments, which must
be dephosphorylated to prevent self-ligation.
100ng (5.4μl) of YMZpB DNA was double digested in a 20μl reaction in 200μl PCR tubes
containing 2μl of 10X FastDigest buffer, 1μl FastDigest EheI, 1μl FastDigest MscI restriction
enzymes and 10.6μl ddH2O. The solution was then gently mixed and incubated in a 37°C heating
block for 30 minutes. Both restriction enzymes used for this reaction, EheI and MscI, produce
blunt-ended fragments that retain the phosphate groups necessary for cloning into the linearized,
dephosphorylated vector.
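
The double digest described above can be mimicked in silico. The following is a minimal sketch, assuming the commonly listed recognition sites GGC^GCC for EheI and TGG^CCA for MscI (both blunt cutters); the function names and the toy sequence are illustrative and are not taken from YMZpB.

```python
import re

# Recognition site and blunt cut offset (bases of the site left of the cut).
# Assumed sites: EheI = GGC^GCC, MscI = TGG^CCA. Both are palindromic,
# so scanning one strand is sufficient.
ENZYMES = {
    "EheI": ("GGCGCC", 3),
    "MscI": ("TGGCCA", 3),
}


def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted 0-based cut coordinates on a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern reports overlapping sites as well.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq, enzymes=ENZYMES):
    """Split a linear sequence at every cut position and return the fragments."""
    seq = seq.upper()
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with one EheI site and one MscI site.
toy = "AAAA" + "GGCGCC" + "TTTT" + "TGGCCA" + "GGGG"
print([len(fragment) for fragment in digest(toy)])  # [7, 10, 7]
```

A complete digest of a real sequence would list every theoretically expected fragment; partial digestion, as considered in the Discussion, corresponds to using only a subset of the cut positions.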

Heat Inactivation
Restriction enzymes were heat inactivated after digestion by transferring the tubes to a 65°C
heating block for 5 minutes. Tubes were placed on ice immediately afterward. The phosphatase
reaction (see Dephosphorylation) was inactivated for 10 minutes at 65°C. Ligation reactions were
similarly inactivated at 65°C for 10 minutes. The enzymatic PCR clean-up reaction was deactivated
at 85°C for 15 minutes.

Dephosphorylation
0.5μl Shrimp Alkaline Phosphatase (SAP) was added to the solution containing linearized
pBS-SK after heat inactivation of the digestion enzyme. The solution was gently mixed, incubated
at 37°C for 30 minutes and deactivated as previously explained.

Cloning
EheI/MscI-digested DNA fragments were randomly cloned into SmaI-digested, dephosphorylated
pBS-SK vector in 200μl PCR tubes containing 20ng of linearized vector, 75ng of digested YMZpB
insert fragments (1:3 ratio), 1X FastDigest buffer, 500nM ATP, 0.5 U T4 DNA Ligase
and ddH2O to bring the total reaction volume to 10μl. A negative control for background
contamination was set up by excluding insert fragments from the reaction. The reactions were
incubated at room temperature for 20 minutes and, after heat inactivation, used immediately for
transformation.

Bacterial Transformation
100μl of chemically competent, frozen DH5α E. coli cells were thawed on ice for about 2-3
minutes. Transformation reactions were prepared by adding 10μl of the ligation mixture with
digested DNA inserts, 10μl of the negative control mixture, or 0.5μl of undigested pBS-SK
(positive control) to each tube of competent cells. Tubes were incubated on ice for 20 minutes,
followed by heat shock at 42°C for 50 seconds, and were returned to ice for another 2 minutes.
One ml of SOC medium was added to the mixture and tubes were incubated at 37°C with shaking at
250 RPM for 30 minutes.

Growth condition
100μl of transformation solution was plated on Luria Broth (LB) agar plates containing
ampicillin (100μg/ml), each spiked with 100μl of a master mix containing 160μl of a 20mg/ml X-gal
solution, 8μl of a 1M IPTG solution and 232μl ddH2O. The transformation plates were incubated
overnight at 37°C. For the negative transformation control, all of the bacterial culture was
spread on selective medium. For the positive transformation control, 50μl of the culture was
spread on selective medium.

Colony-PCR
Colony PCR was carried out to amplify the cloned fragments using the pBS-SK-specific T3
(5’-AATTAACCCTCACTAAAGGGAACAAAAG-3’) and T7 (5’-GTAATACGACTCACTATAGGGCGAATTG-3’) primers.
Seven white colonies were picked and mixed with PCR solution containing 1X PCR buffer, 1.5mM
MgCl2, 200μM dNTPs, 0.1µM each of T3 and T7 primers, 1.25U Taq DNA Polymerase and ddH2O to a
total volume of 25μl. PCR was run on a thermal cycler (Hybaid PXE 0.2 BioRad DNA Engine) with
the following program: 1x 94°C initial denaturation for 4 minutes; 35x 94°C denaturation for
30 seconds, 56°C annealing for 30 seconds and 72°C extension for 1 minute; and 1x 72°C final
extension for 5 minutes.

Gel electrophoresis
The size of PCR amplicons was confirmed by gel electrophoresis. Five µl of each PCR product
mixed with 1X loading dye was loaded on a 1% (wt/vol) TBE agarose gel and run at 100V for 30
minutes. 5μl of 1 Kb DNA ladder (Fermentas Life Sciences) was loaded into the first well.

Enzymatic PCR product cleaning
The selected PCR products were each treated with 0.2µl of 20U/µl exonuclease and 0.2µl of
10U/µl calf intestine alkaline phosphatase (CIAP), incubated at 37°C for 30 minutes and
deactivated as previously explained.

Sequencing PCR
3μl of each cleaned PCR reaction was mixed with 1.75μl of 5X sequencing buffer, 0.5μl of 10μM
T7 primer, 0.5μl of BigDye v3.1 sequencing mix, and ddH2O to bring the total reaction volume to
10μl. The sequencing reactions were programmed on the thermal cycler (Hybaid PXE 0.2 BioRad DNA
Engine) as follows: 1x 96°C initial denaturation for 1 minute; 45x 96°C denaturation for
10 seconds, 50°C annealing for 5 seconds and 60°C extension for 4 minutes; and 1x 60°C final
extension for 4 minutes. The reactions were run on an ABI 3730 Genomic Analyser to determine the
nucleotide sequence.

Sequence annotation
Obtained sequences were edited by excluding the primer regions at the beginning and the unevenly
distributed and overlapping peaks at the end of each sequence. The overlapping contigs from the
edited sequences, along with provided fragments, were assembled into one contiguous consensus
sequence of ~5 kb using Sequencher® software. Potential open reading frames (ORFs) were
identified using NCBI ORF Finder (National Center for Biotechnology Information,
http://www.ncbi.nlm.nih.gov/projects/gorfl) and GeneMark (http://exon.gatech.edu/GeneMark/).
Putative ORFs from ORF Finder and GeneMark were compared with the NCBI databases using Blastp
and Blastn, respectively.
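
To illustrate what an ORF search does, the sketch below scans the three forward reading frames of a sequence for ATG-to-stop stretches. It is a simplified illustration, not the algorithm of NCBI ORF Finder or GeneMark: it assumes ATG as the only start codon, ignores the reverse strand, and reports 1-based coordinates that include the stop codon. The function name and the `min_len` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_forward_orfs(seq, min_len=150):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 1-based and inclusive; `end` covers the stop codon.
    `min_len` is the minimum ORF length in bp, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical toy sequence: ATG, five GCT codons, then a TAA stop.
toy_orf = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_forward_orfs(toy_orf, min_len=9))  # [(3, 23, 3)]
```

Each ORF found this way would then be translated and submitted to BLAST, as described above.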

Phylogenetic analysis
A putative ORF encoding the YscJ/HrcJ protein was selected for phylogenetic analysis. The 10
most similar proteins to YscJ/HrcJ in different organisms were identified using BLAST.
A phylogenetic tree was constructed in MEGA 6.06 using the neighbor-joining method with 500
bootstrap replicates.
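
The idea behind bootstrap support can be sketched as follows. This is not MEGA's neighbor-joining implementation: it only resamples alignment columns with replacement and, for each replicate, records which sequence is closest to a chosen target by p-distance. The toy alignment, the sequence names and the function names are all hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def nearest_neighbour_support(alignment, target, replicates=500, seed=0):
    """Fraction of replicates in which each sequence is the closest to `target`."""
    rng = random.Random(seed)
    counts = {name: 0 for name in alignment if name != target}
    for _ in range(replicates):
        rep = bootstrap_alignment(alignment, rng)
        closest = min(counts, key=lambda name: p_distance(rep[target], rep[name]))
        counts[closest] += 1
    return {name: count / replicates for name, count in counts.items()}


# Hypothetical toy alignment (not real HrcJ sequences).
toy_alignment = {
    "target": "ACGTACGT",
    "near": "ACGTACGA",
    "far": "TTTTACGA",
}
print(nearest_neighbour_support(toy_alignment, "target"))
```

In a full analysis a tree is rebuilt for every replicate, and the support for a branch is the fraction of replicate trees that contain it.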

Results
1-Shotgun cloning
1-1- E. coli transformation
One blue and 11 white colonies grew on the two transformation plates (Table 1). White
colonies originate from E. coli transformants carrying pBS-SK vector with a YMZpB fragment
insert. The negative control, transformed with digested pBS-SK only, and the positive control,
transformed with undigested pBS-SK only, produced 16 and ~800 colonies, respectively.

Table 1: Transformant quantification. White colonies represent successful fragment cloning
into pBS-SK. Blue colonies represent the presence of empty (original) pBS-SK in transformants.

| | Ligation of DNA fragments | Negative ligation control | Positive transformation control |
|---|---|---|---|
| No. white colonies | 11 | 0 | 0 |
| No. blue colonies | 1 | 16 | ~800 |
| Percentage of blue colonies/total | 9 | N/A | N/A |

1-2- Colony PCR and Gel Electrophoresis


Eight PCR reactions were set up: 7 randomly selected white colonies each served as the
template in one PCR tube, and the remaining tube was a no-template negative control (Figure 1).

Figure 1: Gel electrophoresis. DNA fragments produced by PCR using 7 independent colonies
as template. Reactions 1, 3, 6 and 7 contained a band of approximately 2.2 kb while reactions 2, 4
and 5 predominantly produced a band of 3.5 kb.

2-Sequence of a 5kb genomic region
The sequence of YMZpB was assembled from a total of 69 DNA contigs in Sequencher® software,
using a 90% minimum match and a 20-nucleotide minimum overlap, yielding a 5.4 kb stretch of
DNA (Figure 2).
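
The two assembly thresholds can be illustrated with a small overlap test. The sketch below is an assumption-laden simplification, not Sequencher's algorithm: it considers only ungapped suffix-to-prefix overlaps between two reads and accepts the longest one that satisfies both the minimum overlap and the minimum match. All names are hypothetical.

```python
def best_overlap(left, right, min_overlap=20, min_match=0.90):
    """Longest ungapped overlap between the end of `left` and the start of `right`.

    Returns (overlap_length, identity), or None if no overlap of at least
    `min_overlap` bases reaches an identity of `min_match`.
    """
    max_len = min(len(left), len(right))
    for n in range(max_len, min_overlap - 1, -1):
        suffix, prefix = left[-n:], right[:n]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / n
        if identity >= min_match:
            return n, identity
    return None


# Hypothetical reads sharing a 25-base core.
core = "GATTACAGGCTTAACGTCCATGGTA"
read_a = "A" * 10 + core
read_b = core + "C" * 10
print(best_overlap(read_a, read_b))  # (25, 1.0)
```

Raising `min_match` or `min_overlap` makes the test stricter, so fewer reads are joined into a contig.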

Figure 2: Generated 5.4 Kb DNA stretch using 69 overlapping contigs.

3-Genomic organization of the 5kb region


To determine putative ORFs, the consensus sequence of the 5kb region was analysed with NCBI
ORF Finder and the resulting ORFs were blasted at the protein level. Forty-four potential ORFs
were found, nine of which were considered significant based on their E-values and bitscores
(Table 2.A). Next, GeneMark predicted nine ORFs. These putative ORFs were blasted at the
nucleotide level to annotate them (Table 2.B). The gene annotations, order and directions were
used to assemble the physical gene map presented in Figure 3.

Table 2: Putative ORFs using YMZpB consensus sequence. A. NCBI ORF Finder, and B.
GeneMark. Hits were selected based on E-values and bitscores.

A: ORF Finder

| ORF number | Accession number | Range (bp) | Strand/Frame | bp size | aa size | Significant matches | E-value | Bitscore |
|---|---|---|---|---|---|---|---|---|
| 1 | WP_007248472.1 | 757-1764 | 1+ | 1008 | 335 | type III helper protein HrpZ1 [Pseudomonas syringae group genomosp. | 0 | 649 |
| 2 | WP_007248474.1 | 2211-2984 | 3+ | 774 | 257 | type III secretion protein HrcJ [Pseudomonas syringae group genomosp. | 0 | 519 |
| 3 | WP_007248476.1 | 3554-4135 | 2+ | 582 | 193 | type III secretion protein HrpE [Pseudomonas syringae group genomosp. | 2.00E-130 | 376 |
| 4 | AFH66566.1 | 2981-3517 | 2+ | 537 | 178 | HrpD [Pseudomonas cannabina] | 3.00E-123 | 357 |
| 5 | AFH66569.1 | 4421-4861 | 2+ | 441 | 146 | HrpG [Pseudomonas cannabina] | 1.00E-100 | 296 |
| 6 | WP_007248471.1 | 389-703 | 2+ | 315 | 104 | type III helper protein HrpA2 [Pseudomonas syringae group genomosp. | 4.00E-67 | 208 |
| 7 | BAD20870.1 | 35-295 | 2+ | 261 | 86 | hrp regulatory protein HrpS [Pseudomonas syringae pv. maculicola] | 5.00E-55 | 179 |
| 8 | WP_007248477.1 | 4219-4443 | 1+ | 225 | 74 | type III secretion protein HrpF [Pseudomonas syringae group genomosp. | 4.00E-44 | 147 |
| 9 | WP_005760008.1 | 2011-2166 | 1+ | 156 | 51 | type III secretion protein [Pseudomonas amygdali] | 1.00E-25 | 100 |

B: GeneMark

| ORF number | Accession number | Range (bp) | Strand | bp size | aa size | Significant matches | E-value | Bitscore |
|---|---|---|---|---|---|---|---|---|
| 1 | AB795317.1 | 2-295 | + | 294 | 97 | hrp regulatory protein HrpS | 4.00E-151 | 538 |
| 2 | AB795319.1 | 404-703 | + | 300 | 99 | hrp pili protein HrpA | 2.00E-154 | 555 |
| 3 | AB795318.1 | 757-1764 | + | 1008 | 335 | harpin protein HrpZ | 0 | 1862 |
| 4 | JQ517282.1 | 1792-2166 | + | 375 | 124 | hrp associated protein HrpB | 0 | 693 |
| 5 | JQ517282.1 | 2175-2984 | + | 810 | 269 | type III secretion protein HrcJ | 0 | 1496 |
| 6 | JQ517282.1 | 3116-3517 | + | 402 | 133 | type III secretion protein HrpD | 0 | 739 |
| 7 | JQ517282.1 | 3554-4135 | + | 582 | 193 | type III secretion protein HrpE | 0 | 1075 |
| 8 | JQ517282.1 | 4219-4443 | + | 225 | 74 | type III secretion protein HrpF | 7.00E-113 | 416 |
| 9 | JQ517282.1 | 4430-4861 | + | 432 | 143 | type III secretion protein HrpG | 0 | 798 |
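
The size and frame columns of Table 2 follow directly from the ORF coordinates. The sketch below assumes that each range is 1-based and inclusive of the stop codon, and that the forward frame is numbered from the first base of the consensus sequence; under these assumptions it reproduces the tabulated values. The function name is illustrative.

```python
def orf_metrics(start, end):
    """Return (bp size, aa size, forward frame) for a 1-based inclusive ORF range.

    Assumes the range includes the stop codon, so the protein has one
    residue fewer than the number of codons.
    """
    length_bp = end - start + 1
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    aa_size = length_bp // 3 - 1
    frame = (start - 1) % 3 + 1
    return length_bp, aa_size, frame


print(orf_metrics(757, 1764))   # (1008, 335, 1)  HrpZ1
print(orf_metrics(2211, 2984))  # (774, 257, 3)   HrcJ
print(orf_metrics(4421, 4861))  # (441, 146, 2)   HrpG
```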

Figure 3. Physical map of genes encoded by YMZpB. Length of each arrow is proportional to
the size of the gene.

4-Phylogenetic analysis
The second putative ORF from NCBI ORF Finder was selected for phylogenetic analysis. The gene
encodes the type III secretion system protein HrcJ in Pseudomonas syringae and contains a
putative conserved FliF domain. The FliF protein ring is thought to be part of the export
apparatus for flagellar proteins, based on its similarity to YscJ proteins [5, 6]. As shown in
Figure 4, the phylogenetic tree suggests that the HrcJ protein has close orthologs in Serratia
marcescens (strain FGI94) and Pseudomonas parafulva (strain CRS01-1), as well as more distant
orthologs in Pseudomonas fluorescens (strain Pf29Arp) and Pseudomonas brassicacearum (strain
NFM421).

Figure 4. Generated phylogenetic tree for type III secretion protein HrcJ in Pseudomonas
syringae.

Discussion
In this study, an unknown piece of DNA, YMZpB, was analysed at the nucleotide level and found
to encode a set of genes involved in the type III secretion system of Pseudomonas syringae. The
module began with digestion of YMZpB into fragments of different sizes and random cloning of the
fragments into the bacterial expression vector pBS-SK using a conventional ligation method. The
vector carrying YMZpB fragments was then transformed into competent E. coli cells for
propagation, which resulted in 11 white colonies and one blue colony. The blue colony on the
ligation plates indicates that an empty pBS-SK (without a YMZpB fragment) was transformed into
the competent E. coli cells. This might be due to incomplete SmaI digestion, which left uncut
pBS-SK in the transformation solution. Another possibility is incomplete dephosphorylation of
digested pBS-SK, which allowed the two blunt ends to self-ligate and re-form the replicable
structure of the original pBS-SK. The digestion and dephosphorylation reaction times could be
extended to improve reaction efficiency in future experiments. The absence of white colonies on
the positive and negative control plates was as expected (see Materials and methods). The 16
blue colonies on the negative control plate likewise point to the presence of original pBS-SK in
the linearized, dephosphorylated vector solution, as discussed above. That 16 blue colonies grew
on the negative control plate, compared with a single blue colony on the two experimental
plates, is attributable to the volume of cell culture spread on each plate (more than 1 ml for
the negative control vs a total of 200 μL for both experimental plates).

Seven white colonies were selected from the ligation plates and served as templates for
individual PCR reactions. Although a greater variety of YMZpB fragments was expected from 7
randomly selected colonies, the colonies predominantly reflect the presence of only two fragment
sizes, ~2.4 and ~3.5 kb. This may be due to different combinations of successful digestion by the
two enzymes, since YMZpB contains multiple recognition sites for each enzyme used. Digestion can
fail at some locations because DNA tertiary structure makes the recognition sites inaccessible to
the enzyme or, in the case of double digestion, because two restriction sites lie so close
together that the function of either or both enzymes is inhibited. This led to two fragments
whose estimated sizes were not among the sizes theoretically expected from the YMZpB map. One
possibility, however, is that EheI cut at its first (596) and last (4039) recognition sites,
producing the ~3.4 kb band. For the 2.4 kb band to be produced, EheI might instead have cut only
at positions 1048 and 3424. Adding to this complexity, the distribution of ladder bands differed
from the legend provided by the manufacturer, making it difficult to estimate band sizes
accurately.
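
The fragment sizes proposed for the two bands follow from the distance between the cut positions quoted above. A minimal check, assuming the positions are coordinates on the same linear YMZpB map (the function name is illustrative):

```python
def fragment_size(cut_a, cut_b):
    """Length in bp of the fragment between two cut positions on a linear map."""
    return abs(cut_b - cut_a)


print(fragment_size(596, 4039))   # 3443, i.e. ~3.4 kb
print(fragment_size(1048, 3424))  # 2376, i.e. ~2.4 kb
```

Colony PCR products would be somewhat longer than these inserts, because the T3 and T7 primers anneal in the vector on either side of the cloning site.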

Seven sequencing reactions were set up, of which only the first 4 samples gave partially
reliable reads. One reason may be contamination of the reactions with primers and nucleotides
due to an inefficient PCR clean-up reaction. For future experiments, commercial PCR clean-up
kits or conventional DNA precipitation methods, such as ethanol or PEG precipitation, are
recommended to isolate only the desired PCR product. Another possibility is the presence of
non-specific bands in the PCR reaction that might have interfered with sequencing of the main
product. To address this, the annealing temperature of the PCR reactions could have been
optimized, or the main PCR band could have been gel purified to ensure that no DNA other than
the main PCR product interferes with the sequencing reactions.

Sixty-nine overlapping contigs generated from sequenced YMZpB fragments were used to
assemble the full-length sequence of YMZpB. The advantage of overlapping contigs is that each
nucleotide is read multiple times, which reduces errors and strengthens the consensus sequence.
Sequencher generated a contiguous piece of DNA of ~5.4 kb as the YMZpB consensus sequence. The
size of the consensus sequence differed from the expected size of ~4.9 kb by about 10%. However,
as instructed, the difference was not considered significant. One way to reduce the discrepancy
is to adjust the assembly parameters. For instance, increasing the minimum match from 90% to
95% or greater would probably remove the flanking contigs with less similarity to the consensus
sequence at either end of the contiguous YMZpB.

Putative ORFs found by NCBI ORF Finder and GeneMark were analysed to identify the gene
associated with each ORF. Nine putative ORFs were found by GeneMark, in agreement with the 9
ORFs selected from NCBI ORF Finder, whose best hits had highly significant E-values and
bitscores when blasted. The 9 putative ORFs from both tools consistently refer to the same
locations in the YMZpB consensus sequence and to an identical set of genes when blasted at the
protein and nucleotide levels (Table 2), which increases confidence that the ORFs were predicted
accurately. The consensus putative ORFs and their locations were used to generate the physical
map presented in Figure 3. Based on this study, YMZpB encodes an operon composed of a set of
genes involved in type III secretion in Pseudomonas syringae, which function to suppress the
host plant's immune system [3]. The function of this operon highlights its significance for
pathogenicity and for inducing a host response; it is therefore reasonable to expect related
species to encode orthologs of the genes involved in type III secretion in Pseudomonas syringae.
To test this hypothesis, the HrcJ (YscJ/FliF homolog) nucleotide sequence and its top 10 BLAST
hits were used to generate a phylogenetic tree and investigate the evolutionary history of the
protein. The phylogenetic tree (Figure 4) clustered related species into two groups of paralogs,
and HrcJ of Pseudomonas syringae clustered with 4 HrcJ paralogs in different species. This
supports the hypothesis of a conserved function of HrcJ across species.

Appendix 1. Vector map of pBluescript SK(+) used for bacterial expression. The vector
contains sequences compatible with the common T7 and T3 primers. E. coli growth is under Amp
selection. The multiple cloning site (MCS) is designed so that an insertion disrupts
β-galactosidase expression, resulting in white colonies of transformants.
Source: "www.snapgene.com/resources"

References:

1. Anderson, S., Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Research, 1981. 9(13): p. 3015-3027.
2. Staden, R., A strategy of DNA sequencing employing computer programs. Nucleic Acids Research, 1979. 6(7): p. 2601-2610.
3. Hueck, C.J., Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiology and Molecular Biology Reviews, 1998. 62(2): p. 379-433.
4. Horwitz, J.P., et al., Substrates for Cytochemical Demonstration of Enzyme Activity. I. Some Substituted 3-Indolyl-β-D-glycopyranosides. Journal of Medicinal Chemistry, 1964. 7(4): p. 574-575.
5. Allaoui, A., P. Sansonetti, and C. Parsot, MxiJ, a lipoprotein involved in secretion of Shigella Ipa invasins, is homologous to YscJ, a secretion factor of the Yersinia Yop proteins. Journal of Bacteriology, 1992. 174(23): p. 7661-7669.
6. Suzuki, H., et al., A structural feature in the central channel of the bacterial flagellar FliF ring complex is implicated in type III protein export. Journal of Structural Biology, 1998. 124(2): p. 104-114.
