Submitted by
MD Ashfaque Molla
Department of Botany
Dr. A.P.J. Abdul Kalam Govt. College
Kolkata- 700135
CONTENTS
Abstract
Introduction
Conclusion
References
Abstract
This bioinformatics project analyses the PWL2 gene, a gene of interest in the field of molecular
biology. The aim of the study is to gain a better understanding of the structure and evolution of the
PWL2 gene. Bioinformatics software, web servers and databases were used to predict the 3D
structure of the protein encoded by the gene, to validate that structure with a Ramachandran plot,
and to trace the evolution of PWL2 through a phylogenetic tree of closely related organisms.
Nucleotide and protein sequences with maximum homology were retrieved from different organisms
using nucleotide and protein BLAST, and all retrieved sequences were aligned. The results of this
project provide insights into the role of the PWL2 gene in biological processes and may serve as a
basis for further experimental investigations.
INTRODUCTION
Many fungi possess Avirulence (Avr) genes that establish a gene-for-gene relationship with their host
plants. These genes act as unique genetic determinants, preventing the fungi from causing disease in
plants that carry corresponding resistance (R) genes. Interactions between elicitors derived from Avr
genes, both primary and secondary products, and host receptors in resistant plants trigger various
defense responses, often involving a hypersensitive response. PWL1 and PWL2 are Avr genes of
Magnaporthe grisea (Richard Laugé, Pierre J.G.M. De Wit, 1998).
PWL2 (Pathogenicity toward Weeping Lovegrass; Sweigard, James A., et al., 1995) belongs to a
gene family that includes three other putative effectors: PWL1, PWL3, and PWL4. PWL1 has also
been implicated in the incompatible reaction of M. oryzae against weeping lovegrass. The PWL2
gene encodes a protein of 145 amino acids with a molecular weight of 16.17 kDa. An allele of
PWL2, termed a divergent pwl2 allele, was unable to confer avirulence. This allele resulted from a
guanine-to-adenine substitution, causing an amino acid change from aspartic acid to asparagine at
residue 90. The normal PWL2 gene product has the amino acid sequence DKS, while the divergent
allele alters it to NKS, which is a putative signal sequence for glycosylation. Different alleles of
PWL2 associated with virulence towards weeping lovegrass exhibit high polymorphism depending
on the isolate and geographic origin. Although field isolates with spontaneous PWL2 deletions do not
show any known fitness issues, most of the field isolates possess one or two copies of the gene. The
exact role of PWL2 in rice blast disease is unclear, but its high prevalence in rice blast field isolates
suggests a potential function during plant infection (Were, Vincent Mbashira, 2018).
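The effect of the aspartic acid to asparagine change can be illustrated with a short scan for a glycosylation motif. The sketch below assumes the commonly used N-glycosylation consensus N-X-S/T, where X is any residue except proline. The function name `find_sequons` and the two nine-residue peptides are illustrative only and are not taken from the cited studies.

```python
import re

# Assumed consensus sequon for N-linked glycosylation: N-X-S/T, X not proline.
# The lookahead lets overlapping motifs be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the asparagine in each N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Illustrative peptides that differ only at the D/N position.
normal_peptide = "HQEDKSDRQ"     # contains DKS
divergent_peptide = "HQENKSDRQ"  # contains NKS

print(find_sequons(normal_peptide))     # no motif in the DKS peptide
print(find_sequons(divergent_peptide))  # motif at the substituted residue
```

A single residue change is enough to create the motif, which is why the NKS variant is described as a putative glycosylation signal.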
The rice blast fungus (Magnaporthe grisea) was studied to understand its host specificity. Genetic
analysis identified a key gene, PWL2, which significantly affects the fungus’s ability to infect
weeping lovegrass (Eragrostis curvula). The non-pathogenic allele of PWL2 was genetically
unstable, frequently giving rise to spontaneous pathogenic mutants. PWL2 was cloned using its map
position, with guidance from large deletions found in pathogenic mutants. Transformants carrying
the cloned PWL2 gene lost pathogenicity toward weeping lovegrass but remained fully pathogenic
toward other host plants. Therefore, PWL2 functions similarly to classical avirulence genes,
preventing infection of specific cultivars of a host species. The PWL2 gene encodes a hydrophilic,
glycine-rich protein (16 kD) with a putative secretion signal sequence. In the mapping population,
the pathogenic allele PWL2-2 differed from PWL2 by a single base pair substitution that resulted in
loss of function. The PWL2 locus exhibits high polymorphism among rice pathogens in different
geographic locations. (Sweigard, James A., et al, 1995)
The ability of M. oryzae strains to infect weeping lovegrass is controlled by PWL2, initially
identified in the laboratory strain 4360. This strain is the product of a genetic cross between two rice pathogenic
laboratory strains. One parent strain, 4224-7-8, infects weeping lovegrass but lacks PWL2, while the
other strain, 6043, is non-pathogenic and possesses the PWL2 locus. In the genetic cross, each of the
five tetrads produced four ascospore progenies that were pathogenic on weeping lovegrass and four
that were non-pathogenic, indicating single-gene segregation for the ability to infect weeping
lovegrass. Spontaneous mutant strains lacking PWL2 were also capable of infecting weeping
lovegrass, suggesting that PWL2 determines the pathogenicity of M. oryzae strains towards this host.
When PWL2-deficient strains were transformed with the cloned PWL2 gene, their pathogenicity
towards weeping lovegrass was lost, while they retained pathogenicity towards barley and rice
cultivars. This suggests that the M. oryzae strains did not have a general defect in their ability to infect
plants but were avirulent on weeping lovegrass due to the presence of PWL2 (Were, Vincent
Mbashira, 2018).
Bioinformatics analysis plays a crucial role in understanding the structure, function, and
evolutionary aspects of genes. In the case of the PWL2 gene, which is a species-specific Avirulence
(Avr) gene in the pathogenic fungus Pyricularia oryzae, bioinformatics tools and techniques are
instrumental in unravelling its complexities and exploring its significance. (Hernández-Domínguez,
Edna María, et al., 2020)
Firstly, bioinformatics aids in the identification and annotation of the PWL2 gene by analyzing the
genome sequence of Pyricularia oryzae. This involves using specialized algorithms to search for
sequences with homology to known Avr genes. By comparing the PWL2 gene sequence with other
related genes or proteins, bioinformatics tools can help determine its structural features (Chen,
Chenxi, 2013).
Sequence analysis is another important aspect of bioinformatics in studying the PWL2 gene. Multiple
sequence alignment and phylogenetic analysis allow researchers to compare the PWL2 gene across
different strains or related species of Pyricularia. These analyses provide insights into the genetic
variations and evolutionary relationships of the PWL2 gene, shedding light on its origin,
diversification, and potential co-evolution with host resistance genes (Zhong, Zhenhui, et al.,
2016; Peng, Zhao, et al., 2019).
Structural prediction and modelling are additional bioinformatics approaches applied to the study of
the PWL2 gene. These methods employ computational algorithms and databases to predict the three-
dimensional structure of the PWL2 protein based on its amino acid sequence. Structural models can
provide valuable information about the protein's functional sites, ligand binding regions, and
potential protein-protein interactions, aiding in the understanding of its role in pathogenicity and host
recognition. (Jambon, Martin, et al., 2003)
The PWL2 nucleotide sequence of Pyricularia oryzae in FASTA format is given below:
Protein Sequence:
Retrieve Protein Sequence: First, the protein sequence of the complete PWL2 CDS of Pyricularia
oryzae was retrieved from the NCBI database and downloaded in FASTA format.
The protein sequence is 145 aa long.
( https://www.ncbi.nlm.nih.gov/protein/1904992618 )
Performing BLAST: A protein BLAST search was then performed on the NCBI website.
Multiple Sequence Alignment: The 28 protein sequences showing maximum homology were
retrieved, and multiple sequence alignment was performed on all retrieved sequences
using the CLUSTALW web server.
3D Structure Building & Validation: The previously retrieved protein sequence in FASTA format was
uploaded to the SWISS-MODEL web server to build the 3D structure of the PWL2 protein.
The resulting structure was downloaded from SWISS-MODEL in PDB format and uploaded to
PROCHECK on the UCLA-DOE LAB SAVES server (v6.0) for validation with a
Ramachandran plot.
Phylogenetic Tree Building: The previously retrieved protein sequences were loaded into MEGA
software (v.11), which was then used to build the phylogenetic tree.
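As a quick sanity check on the retrieval step, a downloaded FASTA record can be parsed and its length compared with the expected 145 aa. The sketch below is a minimal, dependency-free parser. The function name `read_fasta` and the record header are placeholders, and the residues are the primary sequence listed in Table T1 of the SWISS-MODEL report.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append them to the current record.
            records[header] += line
    return records


# Placeholder header; residues copied from Table T1.
example = """>PWL2_query placeholder header
MKCNNIILPFALFFFSTTVTAGGGWTNKQFYNDKGEREGSISIRKGSEGDFNYGPSYPGGPD
RMVRVHENNGNIRGMPPGYSLGPDHQQDQTDRQYYNRH
GYHVGDGPAEYGNHGGGQWGDGYYGPPGEFTHEHREQREEGCNIM
"""

records = read_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

The same function handles multi-record files, such as the set of BLAST hits saved before alignment.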
The PWL2 protein sequence of Pyricularia oryzae in FASTA format is given below:
Multiple gene sequence alignment analysis is a powerful bioinformatics tool that allows researchers to
compare and analyze the similarities and differences among multiple gene sequences. This analysis
provides valuable insights into the evolutionary relationships, functional domains, and conserved regions of
genes, leading to a deeper understanding of their structure and function.
In this study, I performed multiple sequence alignment of 24 nucleotide and 28 protein homologous
sequences retrieved from NCBI BLAST. The goal was to identify conserved regions
and patterns across these sequences, which can provide important clues about their functional significance.
The alignment was carried out using the CLUSTALW web server, a widely used bioinformatics tool
that implements efficient algorithms for aligning multiple sequences. It aligns the
sequences by identifying similar residues and maximizing the overall sequence similarity.
Upon analyzing the aligned sequences, several interesting observations were made. First, I identified highly
conserved regions, indicated by residues that were identical or showed strong similarity across the
sequences. These conserved regions are likely to play crucial roles in the protein's structure or function, and
their identification can guide future experimental studies.
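The idea of a conserved column can be made concrete with a few lines of code. The sketch below marks columns in which every sequence carries the same residue, in the spirit of the `*` consensus line printed under a CLUSTALW alignment. The function name `conserved_columns` is hypothetical, and the three rows are 12-column excerpts (the last 12 columns of the first protein alignment block) for XP_030987775.1, QBZ53273.1 and KAI6291589.1.

```python
def conserved_columns(aligned: list[str]) -> list[int]:
    """Return 0-based indices of columns where all sequences share one residue."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        index
        for index, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"  # gap-only columns do not count
    ]


rows = [
    "SISIKKGSEGDF",  # XP_030987775.1 excerpt
    "SISIRKGAEGDF",  # QBZ53273.1 excerpt
    "SLSVVKGGSGYI",  # KAI6291589.1 excerpt
]

conserved = conserved_columns(rows)
marks = "".join("*" if i in conserved else " " for i in range(len(rows[0])))

for row in rows:
    print(row)
print(marks)
```

This check only reports strict identity. Columns with similar but non-identical residues, which CLUSTALW marks with `:` or `.`, would need a substitution-group table on top of it.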
Furthermore, I observed variations or gaps in certain regions, indicating sequence divergence among the
analyzed genes. These variations may indicate species-specific adaptations or functional differences among
the gene products. Exploring these variations can provide insights into the evolution and diversification of
the gene family.
The multiple gene sequence alignment analysis also provided a basis for phylogenetic inference. By
comparing the aligned sequences, I constructed a phylogenetic tree to depict the evolutionary relationships
among the genes and the species from which they were derived. This tree revealed patterns of divergence,
speciation events, and possible gene duplication events, shedding light on the evolutionary history of the
gene family.
In summary, the multiple sequence alignment analysis provided insights into the conserved regions,
variations, and evolutionary relationships among the analyzed sequences. It
serves as a foundation for further experimental investigations, such as functional studies and
comparative genomics, that can advance our understanding of gene structure, function, and evolution.
clustalw.dnd
(MN072510.1_1-438:0.00000,
XM_031122648.1_1-438:0.00686):0.00885,
MT669815.1_726-1163:0.00028):0.00260,
XM_031127161.1_1-284:0.15837):0.00410,
XM_003712998.1_1-438:0.00018):0.00000,
U26313.1_590-1027:0.00000):0.00014,
((MG787166.1_1-438:0.00227,
XM_031132118.1_12-447:0.00691):0.00000,
MG787165.1_1-438:0.00227):0.00000):0.00205,
MG787162.1_1-438:0.00211):0.00000,
MG787156.1_1-438:0.00219):0.00000,
MG787155.1_1-438:0.00226):0.00000,
MG787158.1_1-438:0.00456):0.00000,
MG787153.1_1-438:0.00000):0.00000,
MG787163.1_1-438:0.00228):0.00000,
MG787160.1_1-438:0.00228):0.00000,
MG787159.1_1-438:0.00228):0.00000,
MG787157.1_1-438:0.00228,
MG787154.1_1-438:0.00228);
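The guide tree above is written in Newick notation, where each leaf appears as `label:branch_length`. As a rough illustration, the leaf labels and their branch lengths can be pulled out with a regular expression. The sketch below is not a full Newick parser: the function name `leaf_branch_lengths` is hypothetical, and it is applied only to a balanced fragment of the tree (the MG787166.1 / XM_031132118.1 / MG787165.1 clade).

```python
import re

# A leaf is a label that directly follows "(" or "," and is followed by ":length".
# Internal branch lengths follow ")" and are therefore skipped.
LEAF = re.compile(r"(?<=[(,])\s*([^():,\s]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Map each leaf label in a Newick string to its terminal branch length."""
    return {label: float(length) for label, length in LEAF.findall(newick)}


fragment = (
    "((MG787166.1_1-438:0.00227,\n"
    "XM_031132118.1_12-447:0.00691):0.00000,"
    "MG787165.1_1-438:0.00227):0.00000"
)

for label, length in leaf_branch_lengths(fragment).items():
    print(f"{label}\t{length}")
```

Sorting the result by branch length is a quick way to spot the most divergent entries in a guide tree.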
XM_031127161.1:1-284 ------------------------------------------------------------ 0
MN072510.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAGTTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
XM_031122648.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAGTTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
XM_031132118.1:12-447 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 118
MT669815.1:726-1163 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787158.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787154.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787155.1:1-438 GCCGGTGGCGGGTGGACTAACAAGCAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787156.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787157.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787159.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787160.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787162.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787163.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787153.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787165.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
XM_003712998.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
U26313.1:590-1027 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
MG787166.1:1-438 GCCGGTGGCGGGTGGACTAACAAACAATTTTACAACGACAAAGGCGAAAGAGAGGGCTCA 120
XM_031127161.1:1-284 -------------------------------------ATGGCCCTGGTCATCCTGGAGGG 23
MN072510.1:1-438 ATTTCAATTAGAAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
XM_031122648.1:1-438 ATTTCAATTAAAAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
XM_031132118.1:12-447 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 178
MT669815.1:726-1163 ATTTCAATTAGAAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787158.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787154.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787155.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787156.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787157.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787159.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787160.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787162.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787163.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787153.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
MG787165.1:1-438 ATTTCAATTAGGAAGGGCTCGGAAGGCGATTTTAACTATGGCCCCAGTTATCCTGGAGGG 180
XM_031127161.1:1-284 CGCGATGGGATGGTGCGGGTTTATGCGAACAATGGCGACATCCGCGGGATGCCTCCGCGA 83
MN072510.1:1-438 CCTGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
XM_031122648.1:1-438 CCTAATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCACGGGA 240
XM_031132118.1:12-447 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGTAACGTCCGCGGGATGCCCCCGGGA 238
MT669815.1:726-1163 CCTGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787158.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787154.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACAGCAACATCCGCGGGATGCCCCCGGGA 240
MG787155.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787156.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787157.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787159.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCAGGATGCCCCCGGGA 240
MG787160.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787162.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787163.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787153.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787165.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
XM_003712998.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
U26313.1:590-1027 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
MG787166.1:1-438 CCCGATAGGATGGTACGGGTTCATGAAAACAACGGCAACATCCGCGGGATGCCCCCGGGA 240
* ** ******* ****** *** ***** * ** ***** ******* ** **
TLD14703.1:1-138 MKLSTVILPLALGLFSNTATAAPIGYNPFSRPKWTNLKIKNDKGESEGSMSIVKGKDGTI 60
KAI6353944.1:12-134 ----NIIFAGALALFSTTISAG--------RSKWINRQIYKGN-DRQGSLSVARGYEHHC 47
XP_030981816.1:4-131 ----NIIFAGALALFSTTESAG-----ILSLPKWINKQIFIGE-DRQGSLSVARGYEDHC 50
KAI6344236.1:4-131 ----NIIFAGALALFSTTESAG-----ILSLPKWINKQIFIGE-DRQGSLSVARGYEDHC 50
XP_029744010.1:1-69 MKFSTFILPFSLGLFSEPVTA------FWGRKKWTNKNVYNEFGERTDSLSVAKGATGYI 54
XP_030981442.1:56-94 ------------------------------------------------------------ 0
XP_030987775.1:1-145 MKCNNIILPFALFFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIKKGSEGDF 51
QNS36445.1:1-145 MKCNNIILPFALFFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
XP_030977270.1:4-148 TKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79357.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79358.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79356.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79353.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79361.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79365.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79352.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
AYN79364.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
QMU24232.1:1-145 MKCNNIILPFALFFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
XP_003713046.1:1-145 MKCNNIILPFALVFFSTTVT---------AGGGWTNKQFYNDKGEREGSISIRKGSEGDF 51
QBZ53273.1:1-147 MKFNKTIPLYILAFFSTAVIA--------GGRKWTNKVIYNDKGEREGSISIRKGAEGDF 52
prf||2210377A:43-189 MKFNKTIPLYILAFFSTAVIA--------GGRKWTNKVIYNDKGPREGSISIRKGAEGDF 52
AAA80239.2:1-147 MKFNKTIPLYILAFFSTAVIA--------GGRKWTNKVIYNDKGPREGSISIRKGAEGDF 52
TLD19068.1:1-57 ------------------------------------------------------------ 0
KAI6291589.1:1-131 MKINTSILALALALLPGM-AT--------AGRKWFNKKIYDENGESAGSLSVVKGGSGYI 51
AAA80241.1:1-131 MKINTSILALALALLPGM-AT--------AGRKWFNKKIYDENGESAGSLSVVKGGSGYI 51
KAI7908610.1:1-102 MKINTSILALTLALLPGM-AT--------AGRKWLNKKLWDANGQSAGSVSIVKGGQGSI 51
AAA80240.1:1-130 MKINTSILALTLALLPGM-AT--------AGRKWLNKKLWDANGQSAGSVSIVKGGQGSI 51
XP_003711276.1:1-100 MKINTSILALTLALLPGM-AT--------AGRKWLNKKLWDANGQSAGSVSIVKGGQGSI 51
Phylogenetic tree analysis is a powerful tool in evolutionary biology and molecular genetics that allows
researchers to study the evolutionary relationships among different organisms or genes. In this analysis, the
relationships are represented graphically as branching patterns, with closely related organisms or genes
appearing closer together on the tree.
The construction of a phylogenetic tree involves multiple steps: DNA and protein sequence
retrieval, multiple sequence alignment, and tree building using MEGA software. DNA or
protein sequences are retrieved from the NCBI database and used as BLAST queries. The sequences
returned by BLAST are then aligned to identify regions of similarity and difference. Sequence alignment ensures that
comparable positions are aligned, allowing for accurate analysis.
After the sequences are aligned, MEGA software is used to construct the phylogenetic tree. MEGA
considers the patterns of sequence similarity and difference to estimate the most likely evolutionary
relationships.
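One simple way to quantify "similarity and difference" between aligned sequences is the p-distance, the proportion of compared sites at which two sequences differ. The sketch below computes pairwise p-distances for three nucleotide rows, using the first 30 columns of the alignment for MN072510.1, MG787158.1 and MG787155.1. The choice of p-distance is an assumption made for illustration; the distance measure and tree-building method actually selected in MEGA are not stated here, and the function name `p_distance` is hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping columns where either row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


aligned = {
    "MN072510.1": "GCCGGTGGCGGGTGGACTAACAAACAGTTT",
    "MG787158.1": "GCCGGTGGCGGGTGGACTAACAAACAATTT",
    "MG787155.1": "GCCGGTGGCGGGTGGACTAACAAGCAATTT",
}

distances = {
    (a, b): p_distance(aligned[a], aligned[b]) for a, b in combinations(aligned, 2)
}
for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d:.4f}")
```

A matrix of such distances is the usual input to distance-based tree methods such as neighbor joining, where smaller distances place sequences closer together on the tree.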
Phylogenetic tree analysis provides several important insights. Firstly, it reveals the evolutionary
relatedness between different organisms or genes. By examining the branching patterns and the lengths of
branches, researchers can infer the degree of divergence and estimate the time since the common ancestor.
Additionally, the analysis can help identify evolutionary events such as gene duplication, horizontal gene
transfer, or convergent evolution.
Furthermore, phylogenetic trees can assist in the classification and identification of unknown organisms or
genes. By comparing their sequences to those on the tree, researchers can assign them to a specific group or
lineage and determine their evolutionary position. (Brown TA., 2002)
Fig.1: Phylogenetic tree of the aligned nucleotide sequences of the PWL2 gene, built using MEGA
Fig.2: Phylogenetic tree of the aligned protein sequences of the PWL2 gene, built using MEGA
The analysis of protein 3D structures with bioinformatics tools and techniques is crucial for
understanding their function, interactions, and dynamics. These analyses contribute to our knowledge
of protein structure-function relationships and molecular mechanisms, and aid in drug discovery and
design, protein engineering, and the development of novel therapeutics. In addition, molecular
visualization tools are used to explore protein structures in three-dimensional space. These tools
enable researchers to manipulate and interact with a structure, allowing them to better understand
its overall architecture and the spatial arrangement of its functional residues. (Schmidt, Tobias et al, 2014)
This document lists the results for the homology modelling project "Untitled Project" submitted to
SWISS-MODEL workspace on May 14, 2023, 6:59 p.m. The submitted primary amino acid
sequence is given in Table T1.
If you use any results in your research, please cite the relevant publications:
Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T.,
de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T. SWISS-MODEL:
homology modelling of protein structures and complexes. Nucleic Acids Res. 46(W1),
W296-W303 (2018).
Bienert, S., Waterhouse, A., de Beer, T.A.P., Tauriello, G., Studer, G., Bordoli, L., Schwede,
T. The SWISS-MODEL Repository - new features and functionality. Nucleic Acids Res. 45,
D313-D319 (2017).
Studer, G., Tauriello, G., Bienert, S., Biasini, M., Johner, N., Schwede, T. ProMod3 - A
versatile homology modelling toolbox. PLOS Comp. Biol. 17(1), e1008667 (2021).
Studer, G., Rempfer, C., Waterhouse, A.M., Gumienny, G., Haas, J., Schwede, T.
QMEANDisCo - distance constraints applied on model quality estimation. Bioinformatics 36,
1765-1771 (2020).
Bertoni, M., Kiefer, F., Biasini, M., Bordoli, L., Schwede, T. Modeling protein quaternary
structure of homo- and hetero-oligomers beyond binary interactions by homology. Scientific
Reports 7 (2017).
Results
The SWISS-MODEL template library (SMTL version 2023-05-10, PDB release 2023-05-05) was
searched for evolutionarily related structures matching the target sequence in Table T1. For
details on the template search, see Materials and Methods. Overall 10 templates were found (Table
T2).
Models
The following model was built (see Materials and Methods "Model Building"):
Template | Seq Identity | Oligo-state | QSQE | Found by | Method | Resolution | Seq Similarity | Range | Coverage | Description
A0A3G2LZF2.1.A | 96.55 | monomer | - | AFDB search | AlphaFold v2 | - | 0.64 | 1-145 | 1.00 | Pwl2 protein
Target          MKCNNIILPFALFFFSTTVTAGGGWTNKQFYNDKGEREGSISIRKGSEGDFNYGPSYPGGPDRMVRVHENNGNIRGMPPG
A0A3G2LZF2.1.A  MKCNNIILPFALVFFSTTVTAGGGWTNKQFYNDKGEREGSISIRKGSEGDFNYGPSYPGGPDRMVRVHENNGNIRGMPPG

Target          YSLGPDHQQDQTDRQYYNRHGYHVGDGPAEYGNHGGGQWGDGYYGPPGEFTHEHREQREEGCNIM
A0A3G2LZF2.1.A  YSLGPDHQENKSDRQYYNRHGYHVGDGPAEYGNHGGGQWGDGYYGPPGEFTHEHREQREEGCNIM
Template Search
Template search has been performed against the SWISS-MODEL template library (SMTL, last
update: 2023-05-10, last included PDB release: 2023-05-05).
Template Selection
For each identified template, the template's quality has been predicted from features of the target-
template alignment. The templates with the highest quality have then been selected for model
building.
Model Building
Models are built based on the target-template alignment using ProMod3 (Studer et al.). Coordinates
which are conserved between the target and the template are copied from the template to the model.
Insertions and deletions are remodelled using a fragment library. Side chains are then rebuilt. Finally,
the geometry of the resulting model is regularized by using a force field.
The global and per-residue model quality has been assessed using the QMEAN scoring function
(Studer et al.).
Ligand Modelling
Ligands present in the template structure are transferred by homology to the model when the
following criteria are met: (a) The ligands are annotated as biologically relevant in the template
library, (b) the ligand is in contact with the model, (c) the ligand is not clashing with the protein, (d)
the residues in contact with the ligand are conserved between the target and the template. If any of
these four criteria is not satisfied, a certain ligand will not be included in the model. The model
summary includes information on why and which ligand has not been included.
The quaternary structure annotation of the template is used to model the target sequence in its
oligomeric form. The method (Bertoni et al.) is based on a supervised machine learning algorithm,
Support Vector Machines (SVM), which combines interface conservation, structural clustering, and
other template features to provide a quaternary structure quality estimate (QSQE). The QSQE score
is a number between 0 and 1, reflecting the expected accuracy of the interchain contacts for a model
built based on a given alignment and template. Higher numbers indicate higher reliability. This
complements the GMQE score which estimates the accuracy of the tertiary structure of the resulting
model.
References
BLAST
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden,
T.L. BLAST+: architecture and applications. BMC Bioinformatics 10, 421-430 (2009).
HHblits
Steinegger, M., Meier, M., Mirdita, M., Vöhringer, H., Haunsberger, S. J., Söding, J. HH-
suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics
20, 473 (2019).
Table T1:
Primary amino acid sequence for which templates were searched and models were built.
MKCNNIILPFALFFFSTTVTAGGGWTNKQFYNDKGEREGSISIRKGSEGDFNYGPSYPGGPDRMVRVHENNGNIRGMPPGYSLGPDHQQDQTDRQYYNRHGYHVGDGPAEYGNHGGGQWGDGYYGPPGEFTHEHREQREEGCNIM
Table T2:
Template | Seq Identity | Oligo-state | QSQE | Found by | Method | Resolution | Seq Similarity | Coverage | Description
2noc.1.A | 27.27 | monomer | - | HHblits | NMR | NA | 0.30 | 0.30 | Putative periplasmic protein
2jna.1.B | 22.73 | homo-dimer | - | HHblits | NMR | NA | 0.30 | 0.30 | Putative secreted protein
2jna.1.A | 22.73 | homo-dimer | - | HHblits | NMR | NA | 0.30 | 0.30 | Putative secreted protein
6e14.1.E | 14.29 | monomer | - | HHblits | EM | 4.00Å | 0.30 | 0.24 | Chaperone protein FimC
6e15.1.A | 14.29 | monomer | - | HHblits | EM | 6.20Å | 0.30 | 0.24 | Chaperone protein FimC
7lhg.1.B | 13.33 | monomer | - | HHblits | EM | NA | 0.28 | 0.31 | Chaperone protein PapD
Validation by Ramachandran plot analysis is a widely used method in structural biology to assess the
quality and reliability of protein structures. A Ramachandran plot is a graphical representation of the
backbone dihedral angles φ and ψ of the amino acid residues in a protein structure, which provides insights into the
conformational stability and sterically allowed regions of the protein backbone.
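Each point on a Ramachandran plot is a pair of dihedral (torsion) angles, and each angle is defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below shows one standard way to compute a dihedral angle from four 3D points. The function name `dihedral_degrees` and the coordinates are made up for illustration and do not come from the PWL2 model; tools such as PROCHECK perform this calculation internally on the PDB coordinates.

```python
import math

Point = tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Torsion angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, leaving two in-plane vectors.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the last atom is rotated a quarter turn about the central bond.
print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Residues whose (φ, ψ) pair falls outside the sterically allowed regions are the ones a validation report flags for attention.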
In this study, I performed Ramachandran plot analysis to validate the three-dimensional structure of
the PWL2 protein model. In my work, however, the Ramachandran plot analysis does not provide strong
evidence supporting the validity and reliability of the predicted protein structure.
Fig.4: Validation of the 3D structure of the PWL2 protein by Ramachandran plot using PROCHECK
NOTE: Based on the Ramachandran plot, the 3D structure of the PWL2 protein needs further structural
refinement around residues HIS 123 (A), ARG 138 (A), and ASN 113 (A).
CONCLUSION:
The PWL2 gene of Pyricularia oryzae, the pathogenic fungus that causes rice blast disease, has been the
subject of extensive research and investigation. This gene, along with other Avirulence (Avr) genes,
plays a crucial role in the gene-for-gene interaction between the fungus and its host plant, rice.
Understanding the function and mechanisms of the PWL2 gene is of great importance for developing
effective strategies to control rice blast disease and improve crop yield. Knowledge of this recognition and
response mechanism is essential for breeding resistant rice varieties and implementing disease
management practices.
The PWL2 gene of Pyricularia oryzae is a key player in the gene-for-gene interaction between the
fungus and rice plants. Its characterization and understanding of its avirulence function have
provided valuable insights into the mechanisms of rice immunity and opened avenues for developing
resistant rice varieties. Further research in this area will continue to contribute to the development of
sustainable strategies for managing rice blast disease and ensuring food security.
In conclusion, bioinformatics is a powerful toolset for studying the PWL2 gene in Pyricularia
oryzae. Through genome analysis, sequence alignment, phylogenetic analysis, protein structure
prediction, and functional annotation, bioinformatics provides valuable insights into the structure,
function, and evolutionary aspects of the PWL2 gene. The knowledge gained from bioinformatics
analysis enhances our understanding of the molecular mechanisms underlying pathogenicity and host
specificity in Pyricularia oryzae, contributing to the development of strategies for disease
management and crop improvement.
REFERENCES:
Laugé, Richard, and Pierre J.G.M. De Wit. "Fungal avirulence genes: structure and possible functions." Fungal Genetics and
Biology 24.3 (1998): 285-297. ISSN 1087-1845.
https://doi.org/10.1006/fgbi.1998.1076 (https://www.sciencedirect.com/science/article/pii/S1087184598910763)
Sweigard, James A., et al. "Identification, cloning, and characterization of PWL2, a gene for host species specificity in the rice
blast fungus." The Plant Cell 7.8 (1995): 1221-1233.
https://www.researchgate.net/publication/15648597_Identification_Cloning_and_Characterization_of_PWL2_a_Gene_for_Host_Species_Specificity_in_the_Rice_Blast_Fungus
Were, Vincent Mbashira. "Investigating the role of effector proteins in the rice blast fungus Magnaporthe oryzae." (2018).
https://ore.exeter.ac.uk/repository/handle/10871/33115
Hernández-Domínguez, Edna María, et al. "Bioinformatics as a Tool for the Structural and Evolutionary Analysis of Proteins."
Computational Biology and Chemistry; IntechOpen: London, UK (2020): 37-64.
https://books.google.com/books?hl=en&lr=&id=EpYtEAAAQBAJ&oi=fnd&pg=PA37&dq=Hern%C3%A1ndez-Dom%C3%ADnguez,+Edna+Mar%C3%ADa,+et+al.+%22Bioinformatics+as+a+Tool+for+the+Structural+and+Evolutionary+Analysis+of+Proteins.%22+Computational+Biology+and+Chemistry%3B+IntechOpen:+London,+UK+(2020):+37-64.&ots=TZNJmaerFa&sig=_OSzx97Cu8En9OK9RUH5wYG1SIY
Chen, Chenxi. Analysis of the molecular basis of virulence in pathogenic fungi. The Ohio State University, 2013.
https://search.proquest.com/openview/6d64b434112fe3defb5839df53b8740a/1?pq-origsite=gscholar&cbl=18750
Zhong, Zhenhui, et al. "Directional selection from host plants is a major force driving host specificity in Magnaporthe species."
Scientific reports 6.1 (2016): 25591. https://www.nature.com/articles/srep25591
Peng, Zhao, et al. "Effector gene reshuffling involves dispensable mini-chromosomes in the wheat blast fungus." PLoS genetics
15.9 (2019): e1008272. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008272
Jambon, Martin, et al. "A new bioinformatic approach to detect common 3D sites in protein structures." Proteins: Structure,
Function, and Bioinformatics 52.2 (2003): 137-145. https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.10339
Schmidt, Tobias, et al. "Modelling three-dimensional protein structures for applications in drug design." Drug Discovery Today
19.7 (2014): 890-897. doi:10.1016/j.drudis.2013.10.027 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112578/